Subject: Attributes of tags get entity-decoded (and, even worse, wrongly) when parsed
Running Debian stable with Perl 5.8.4
I'm parsing this content from a string: <a href="page.pl?id=10&sub=20">
When I print it with as_HTML(), I get <a href="page.pl?id=10&sub;=20">
A semicolon is mistakenly added after 'sub', so the query parameter becomes the entity &sub; (U+2282, the subset sign ⊂).
Stepping through with the Perl debugger shows that the problem is not in the printing stage but in parsing. I parse with HTML::TreeBuilder->new_from_content($string).
Here's my program:
---------------------------
#!/usr/bin/perl -w
use strict;
use HTML::TreeBuilder;

my $page = '<a href="page.pl?id=10&sub=20">';
my $p = HTML::TreeBuilder->new_from_content( $page );
# [debug at this stage shows that $p contains the Unicode character U+2282 instead of '&sub']
print $p->as_HTML();
$p->delete;    # free the tree explicitly (HTML::Element trees hold circular references)
---------------------------
Until this is fixed, is there a way to disable entity decoding of attribute values when parsing?
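One possible workaround, sketched here rather than offered as a fix to HTML::TreeBuilder itself, is to escape every bare & before handing the string to the parser. A literal & in an attribute value should be written as &amp; anyway, so this only makes the input say what was meant. The helper name escape_bare_ampersands is hypothetical, and the sketch assumes that any "&name;", "&#NNN;" or "&#xHH;" already in the input is an intended reference and should be left alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TreeBuilder): rewrite every '&'
# that does not already begin a semicolon-terminated reference
# ("&name;", "&#NNN;" or "&#xHH;") as "&amp;", so the parser has nothing
# to misread as an entity such as "&sub".
sub escape_bare_ampersands {
    my ($html) = @_;
    $html =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/&amp;/g;
    return $html;
}

my $page = '<a href="page.pl?id=10&sub=20">';
print escape_bare_ampersands($page), "\n";
# prints: <a href="page.pl?id=10&amp;sub=20">

# In the program above this would become:
#   my $p = HTML::TreeBuilder->new_from_content( escape_bare_ampersands($page) );
```

Because it works on the raw string, the regex also touches & inside text content, and inside <script> or <style> blocks, which may not be wanted. HTML::Parser, which HTML::TreeBuilder is built on, also documents an attr_encoded option that leaves entities in attribute values undecoded. If your installed version has it, calling $tree->attr_encoded(1) on a tree created with HTML::TreeBuilder->new, before parse() and eof(), may be a more direct answer. Input that is already correctly escaped (&amp;) would then probably be escaped a second time by as_HTML().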