Subject: HTML-Tree improperly tagging strings as UTF8
Hi,
I pass the module an ASCII string via the parse method. I then call
"as_HTML" and receive a string flagged as UTF8, even though it contains
no UTF8 characters. This is very counterintuitive: I would expect the
output string to be encoded identically to the input string, especially
when the output content is identical to the input content.
Thanks,
Kevin Kamel
MailerMailer LLC
Subject: test.pl
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use HTML::TreeBuilder;
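# No "use utf8" here, so this literal is a byte string; the Arabic letter
# is held as raw bytes (UTF-8 encoded, assuming the file is saved as UTF-8).
my $badstring = '<html><head></head><body><span>Text: ف</span></body></html>';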
my $parser = HTML::TreeBuilder->new();
$parser->store_comments(1);
$parser->parse($badstring);
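# as_HTML arguments: undef = default entity encoding, " " = indent string,
# {} = treat no end tags as optional.
my $string = $parser->as_HTML(undef," ",{});
print $string . "\n";
# Report whether Perl's internal UTF8 flag is set on the output.
if (utf8::is_utf8($string)) {
    print "I AM BROKEN!\n";
}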
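
For reference, the flag tested by utf8::is_utf8 in the script above can be illustrated without HTML::Tree at all. The following is only a minimal sketch using Perl's built-in utf8:: functions. The helper name without_utf8_flag is hypothetical (it is not part of HTML::Tree), and it is offered as a possible caller-side workaround, not as the module's intended behavior.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Tree): return a copy of $str
# with Perl's internal UTF8 flag cleared when every character fits in
# a single byte; otherwise return the string unchanged.
sub without_utf8_flag {
    my ($str) = @_;
    my $copy = $str;
    # utf8::downgrade works in place; with a true second argument it
    # returns false instead of dying on characters above 0xFF.
    utf8::downgrade($copy, 1);
    return $copy;
}

my $ascii   = '<span>Text</span>';
my $flagged = $ascii;
utf8::upgrade($flagged);    # same characters, but the UTF8 flag is now set

print "flag set before: ", (utf8::is_utf8($flagged) ? 1 : 0), "\n";
my $plain = without_utf8_flag($flagged);
print "flag set after:  ", (utf8::is_utf8($plain) ? 1 : 0), "\n";
print "same content:    ", ($plain eq $ascii ? 1 : 0), "\n";
```

The point of the sketch is that the flag describes Perl's internal storage, not the content: the flagged and unflagged strings compare equal with eq, so a caller who needs an unflagged result for pure single-byte content could downgrade it after calling as_HTML.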