On Thu 16 July 2009 12:20:41, plyn wrote:
> On Tue Jan 27 12:13:58 2009, gnudist wrote:
> > Fixed test case. Looks like "—" thing is the reason of this bug!
>
> I am still seeing "broken" UTF-8. Or, more specifically Double Encoded
> UTF-8.
>
> In the attached example, there are two UTF-8 3 byte characters, and they
> both turn into 6 byte characters on return.
>
> Original: E2 80 99 (RIGHT SINGLE QUOTATION MARK)
> Returns as: C3 A2 C2 80 C2 99
>
> Original: E2 80 9D (RIGHT DOUBLE QUOTATION MARK)
> Returns as: C3 A2 C2 80 C2 9D
>
Easily confirmed:
$ perl -wle 'use utf8; use HTML::Strip; my $str = "←↓→"; print "utf8_flag: " . utf8::is_utf8($str); my $str2 = HTML::Strip->new()->parse($str); print "utf8_flag: " . utf8::is_utf8($str2);'
utf8_flag: 1
utf8_flag:
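The byte sequences in the report are consistent with this: a string that holds UTF-8 bytes but has lost its flag gets encoded a second time on output. A minimal sketch using only the core Encode module (no HTML::Strip involved; the hex_dump helper is made up for illustration):

```perl
use strict;
use warnings;
use Encode qw(encode);

# Illustrative helper: render a byte string as space-separated hex.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $s;
}

# U+2019 RIGHT SINGLE QUOTATION MARK as a character string.
my $char = "\x{2019}";

# Correct encoding: three UTF-8 bytes.
my $bytes = encode('UTF-8', $char);

# Encoding the bytes again treats each byte as a separate
# character (U+00E2, U+0080, U+0099), giving six bytes.
my $double = encode('UTF-8', $bytes);

print hex_dump($bytes),  "\n";   # E2 80 99
print hex_dump($double), "\n";   # C3 A2 C2 80 C2 99
```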
Workaround for real code:
use Encode;
use utf8;
use HTML::Strip;
my $str = "←↓→";
my $utf8_was_on = Encode::is_utf8($str);
my $str2 = HTML::Strip->new()->parse($str);
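# Affected versions (the check below assumes up to 1.06) return the result
# with the UTF-8 flag off; turn it back on if the input had it.
$utf8_was_on && ($HTML::Strip::VERSION <= 1.06) && Encode::_utf8_on($str2);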
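A possible variation, in case relying on the internal Encode::_utf8_on or on a version check is undesirable: decode the result with utf8::decode, which only turns the flag on when the bytes are well-formed UTF-8. This is only a sketch. restore_utf8 is a hypothetical helper, not part of HTML::Strip, and the affected output is simulated here with utf8::encode so the snippet runs without the module installed.

```perl
use strict;
use warnings;

# Hypothetical helper: given the original input and the stripped
# result, return the result as a character string where possible.
sub restore_utf8 {
    my ($original, $stripped) = @_;

    # Leave the result alone if the input was a byte string or
    # the result already carries the UTF-8 flag.
    return $stripped
        if !utf8::is_utf8($original) || utf8::is_utf8($stripped);

    # utf8::decode works in place and returns false when the
    # bytes are not well-formed UTF-8, so decode a copy.
    my $copy = $stripped;
    return utf8::decode($copy) ? $copy : $stripped;
}

# Simulated affected output (assumption): the UTF-8 bytes of
# the input, with the flag off.
my $input     = "\x{2190}\x{2193}\x{2192}";   # the arrows from the test above
my $simulated = $input;
utf8::encode($simulated);

my $fixed = restore_utf8($input, $simulated);
print "utf8_flag: ", (utf8::is_utf8($fixed) ? 1 : 0), "\n";
print "round trip ok\n" if $fixed eq $input;
```

In real code the second argument would be the return value of HTML::Strip->new()->parse($input) instead of the simulated bytes.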