On Thu Apr 06 12:55:24 2006, guest wrote:
> The content output from this program is not the same as the input. The
> input contains "&#x17f;" and "&#383;". The output has erroneously
> translated this to "&amp;#x17f;" and "&amp;#383;".
There are two things going on here.
One is that HTML::TreeBuilder was erroneously re-encoding entities such
as &#x17f; by escaping the leading & as &amp;. This has been fixed in
3.22, which will be released on CPAN this weekend as part of the Chicago
Hackathon.
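
To illustrate the first problem, here is a minimal sketch in Python. It is
not the HTML::TreeBuilder code or the 3.22 fix; the names BARE_AMP,
naive_escape and entity_aware_escape are hypothetical. It only shows how
escaping every & mangles an existing character reference, and one possible
way to leave such references alone:

```python
import re

# Matches an "&" that does NOT already begin a decimal, hexadecimal,
# or named reference (hypothetical helper for this sketch).
BARE_AMP = re.compile(
    r"&(?!#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)"
)


def naive_escape(text):
    """Escape every "&", including those that start a reference."""
    return text.replace("&", "&amp;")


def entity_aware_escape(text):
    """Escape only the "&" characters that are not part of a reference."""
    return BARE_AMP.sub("&amp;", text)


if __name__ == "__main__":
    sample = "&#x17f; & &#383;"
    print(naive_escape(sample))
    print(entity_aware_escape(sample))
```

The naive version turns a reference into visible text starting with &amp;,
which matches the symptom in the report; the second version escapes only
the bare ampersand.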
The other, unfixable in HTML::TreeBuilder, is that HTML::Parser
re-encodes both of the above to ſ instead of their original forms.
Since HTML::TreeBuilder's parse method comes from HTML::Parser, this
would have to be changed in the XS for HTML::Parser. However, I'm not
convinced it's a bug, since they're the same entity when decoded.
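
As a rough check of that last point, the sketch below uses Python's
standard html module as a stand-in for HTML::Parser's decoding. It assumes
the two forms in the report were the hexadecimal and decimal references to
the same character; the variable names are illustrative only:

```python
import html

# Assumed forms from the report: hexadecimal and decimal references.
hex_form = "&#x17f;"
dec_form = "&#383;"

# Decode each reference to the character it names.
decoded_hex = html.unescape(hex_form)
decoded_dec = html.unescape(dec_form)

# 0x17f and 383 are the same number, so both name U+017F (long s).
same_code_point = int("17f", 16) == 383
same_character = decoded_hex == decoded_dec

if __name__ == "__main__":
    print(same_code_point, same_character, hex(ord(decoded_hex)))
```

Once decoded, nothing records which spelling the input used, so a
serializer can only emit one form for both.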
Will mark as resolved when 3.22 hits CPAN.