
This queue is for tickets about the HTML-Tree CPAN distribution.

Report information
The Basics
Id: 18571
Status: resolved
Priority: 0/
Queue: HTML-Tree

People
Owner: Nobody in particular
Requestors: mjd [...] plover.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 3.1901
Fixed in: (no value)



Subject: More encoding bizarrities
    use HTML::TreeBuilder;
    use Data::Dumper;

    my $TB = HTML::TreeBuilder->new();
    my $html = $TB->parse("This &sim; is a twiddle")->eof->elementify();
    for my $pack qw(HTML::Parser HTML::TreeBuilder HTML::Element) {
      print qq{$pack: ${"$pack\::VERSION"}\n};
    }

    print $html->as_HTML("\x0");
    print $html->as_HTML("");

The output is:

    HTML::Parser: 3.51
    HTML::TreeBuilder: 3.13
    HTML::Element: 3.16
    Wide character in print at /tmp/tb2 line 10.
    <html><head></head><body>This \342? is a twiddle</body></html>
    <html><head></head><body>This &sim; is a twiddle</body></html>

This can't be right.
On Thu Apr 06 13:10:58 2006, guest wrote:
> <html><head></head><body>This \342? is a twiddle</body></html>
> <html><head></head><body>This &sim; is a twiddle</body></html>
>
> This can't be right.
The difference between the two calls is the length of the argument: "\0" is a non-empty string and "" is an empty one. With "\0", HTML::Entities is asked to escape only nulls, so it escapes nothing (assuming you have no nulls in your document). With "", HTML::Entities falls back to its default list of things to escape, and the character behind &sim; is in that list, assuming a Perl of 5.7 or newer.

I'm not sure what I can do about this.
Subject: Re: [rt.cpan.org #18571] More encoding bizarrities
Date: Sat, 11 Nov 2006 23:20:19 -0500
To: bug-HTML-Tree [...] rt.cpan.org
From: Mark Jason Dominus <mjd [...] plover.com>
>
> <URL: http://rt.cpan.org/Ticket/Display.html?id=18571 >
>
> On Thu Apr 06 13:10:58 2006, guest wrote:
> > <html><head></head><body>This \342? is a twiddle</body></html>
> > <html><head></head><body>This &sim; is a twiddle</body></html>
> >
> > This can't be right.
>
> The difference in the two is that "\0" and "" have differences in
> length, so HTML::Entities escapes nothing (assuming you have no nulls in
> your document) and "" uses HTML::Entities' default list of things to
> escape, of which &sim; is in that included list, assuming a Perl of 5.7
> or newer.
>
> I'm not sure what I can do about this.
What is the problem? Why not something like this:

    my @escape_chars;
    if (defined $_[0]) {
      @escape_chars = split //, shift();
    } else {
      @escape_chars = @default_escape_chars;
    }

And now @escape_chars contains a list of characters to escape. "" now results in an empty list of characters, as it should, but a missing or undefined argument results in the default, as documented.
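For readers following along, here is a minimal, self-contained sketch of the distinction being discussed. It is not the HTML::Element or HTML::Entities code: the subroutine names, the four-character default list, and the numeric-reference escaping are all assumptions made up for illustration, and it uses only core Perl. It contrasts a length-based check (which models the behaviour described in the earlier reply, where "" is treated like a missing argument) with the definedness-based check proposed above.

```perl
use strict;
use warnings;

# Hypothetical default list, for illustration only. The real default
# used by HTML::Entities is larger and is not reproduced here.
my @default_escape_chars = ('<', '>', '&', '"');

# Length-based selection: "" falls through to the default list.
# This models the behaviour the ticket complains about.
sub escape_chars_by_length {
    my ($arg) = @_;
    return split //, $arg if defined $arg && length $arg;
    return @default_escape_chars;
}

# Definedness-based selection, as proposed in the message above:
# "" yields an empty list, while undef yields the default list.
sub escape_chars_by_definedness {
    my ($arg) = @_;
    return @default_escape_chars unless defined $arg;
    return split //, $arg;
}

# Escape the listed characters as numeric character references.
# With an empty list the text is returned untouched, i.e. the
# encoding step is skipped entirely.
sub escape_text {
    my ($text, @chars) = @_;
    return $text unless @chars;
    my %escape = map { $_ => 1 } @chars;
    return join '',
        map { $escape{$_} ? sprintf('&#%d;', ord $_) : $_ }
        split //, $text;
}

my $text = 'a < b & c';
print escape_text($text, escape_chars_by_length('')), "\n";          # a &#60; b &#38; c
print escape_text($text, escape_chars_by_definedness('')), "\n";     # a < b & c
print escape_text($text, escape_chars_by_definedness(undef)), "\n";  # a &#60; b &#38; c
print escape_text($text, escape_chars_by_definedness('<')), "\n";    # a &#60; b & c
```

With the definedness check, callers no longer need a workaround such as passing "\0" to mean "escape nothing"; an empty string says so directly.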
On Sat Nov 11 23:20:35 2006, mjd@plover.com wrote:
> What is the problem?
The problem was that I wasn't thinking about simply NOT calling the encoding functionality when nothing should be encoded, and was blaming the whole problem on HTML::Entities. I've updated as_HTML to do the Right Thing, as opposed to codifying the wrong thing. 3.23 is on its way to CPAN with this fix. Thank you very much.