This queue is for tickets about the HTML-Tree CPAN distribution.

Report information
The Basics
Id: 86062
Status: open
Priority: 0/
Queue: HTML-Tree

People
Owner: Nobody in particular
Requestors: michael.jemmeson [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Duplicate <html> tags with implicit_tags(0) set
Not sure if this is a bug or not, but if it isn't, it seems to be undocumented.

    use HTML::TreeBuilder;

    my $html = HTML::TreeBuilder->new();
    $html->implicit_tags(0);
    $html->parse_content( "<html> ... </html>" );
    print $html->as_HTML;
    # prints: <html><html> ... </html></html>

(where ... is anything from an empty string to a full webpage)
At first I was inclined to agree that this is a bug, but after starting to try to fix it, I think it should just be documented. The problem is that I was having difficulty pinning down exactly when the explicit <html> tag should be merged with the implicit one.

In your example of '<html>...</html>', it seems obvious it should be merged, but what if you have something before the <html> tag?

If you have '<!DOCTYPE html><html>...' it should probably still be merged.

If you have '<!--hi--><html>' it should probably be merged, but then you can no longer tell the difference between that and '<html><!--hi-->'.

I think it's simpler to just document that implicit_tags(0) always gives you an implicit <html> tag containing whatever explicit content you had (possibly starting with an explicit <html> tag).
On Sun Jul 07 03:06:06 2013, CJM wrote:
> At first I was inclined to agree that this is a bug, but after
> starting to try to fix it, I think it should just be documented.
> The problem is that I was having difficulty pinning down exactly
> when the explicit <html> tag should be merged with the implicit
> one.
>
> In your example of '<html>...</html>', it seems obvious it should be
> merged, but what if you have something before the <html> tag?
>
> If you have '<!DOCTYPE html><html>...' it should probably still be
> merged.
>
> If you have '<!--hi--><html>' it should probably be merged, but then
> you can no longer tell the difference between that and
> '<html><!--hi-->'.
>
> I think it's simpler to just document that implicit_tags(0) always
> gives you an implicit <html> tag containing whatever explicit
> content you had (possibly starting with an explicit <html> tag).
I think a reasonable interpretation is that implicit_tags(0) would disable all implicit tags, and you wouldn't do any merging. That is, the bug is that when you set implicit_tags(0) you still get implicit tags, and you should get none. If you don't supply everything you need, then bad luck.

Documenting the limitation is of course also a reasonable option if fixing it is more effort than it's worth :)

Fixing it may be useful in the long run, though, since I think one of the bad things about the current as_HTML is that it doesn't allow you to generate HTML fragments, and there are lots of use cases where that would be a cool feature.

It would also be super awesome if you could just turn off the implicit html and body tags, but still pop in all those implicit tags lower down that make your HTML sane. Ponies ;)

Cheers, Jeff.
On Mon, Jul 8, 2013 5:03:53 AM, jfearn wrote:
> the bug is that when you set implicit_tags(0) you still get
> implicit tags, and you should get none.
The problem is that you need the implicit <html> tag to contain the possibly multiple tags from the input (e.g. if you parse "<p>1</p><p>2</p>"). That's why I suggest we just document that implicit_tags(0) gives you exactly 1 implicit tag (the root <html> node) and $html->content_list is whatever you gave us to parse (which might include an explicit <html> tag). That's the way it works now, and it avoids special cases like "you get an implicit <html> unless the first tag was <html>".
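To make the point above concrete, here is a minimal sketch (not part of the original ticket) that parses the two-paragraph input with implicit_tags(0) and lists what ends up under the root. It relies only on `parse_content`, `content_list`, and `tag`; the variable names are illustrative, and the exact tree may differ between HTML-Tree versions.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->implicit_tags(0);                  # ask for no implied structure
$tree->parse_content('<p>1</p><p>2</p>'); # two top-level elements

# The tree object itself is the root element that holds the input.
print 'root: <', $tree->tag, ">\n";

# Children are either HTML::Element objects or plain text strings.
for my $node ( $tree->content_list ) {
    if ( ref($node) ) {
        print 'child element: <', $node->tag, ">\n";
    }
    else {
        print "child text: $node\n";
    }
}

$tree->delete;    # free the tree explicitly (needed on older releases)
```

If the behaviour is as described in this thread, both <p> elements show up as children of a single root <html> node, which is why some container is needed even with implicit tags turned off.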
I was trying to use this module to rewrite certain bits of an HTML page, leaving the rest intact, hence the use of no implicit tags.

A thought: could an attribute (e.g. class="_implicit") be added to the root <html>, which could then be ignored by as_HTML if implicit tags are turned off, or is that getting a bit messy?

Anyway, thanks for the replies - I have to confess that this issue only came up whilst working on a stupid Acme type module!
On Tue, Aug 20, 2013 4:40:21 AM, mjemmeson wrote:
> A thought: could an attribute (e.g. class="_implicit") be added to the
> root <html>, which could then be ignored by as_HTML if implicit tags
> are turned off, or is that getting a bit messy?
I don't like that idea at all, but I think content_as_HTML might be a useful method. (Like the DOM's innerHTML, but taking the same parameters as as_HTML.)
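Until something like that exists, a rough approximation can be written in user code. The sketch below is only an illustration: `content_as_html_sketch` is a hypothetical helper name, not an HTML::Element method, and it assumes that serializing each child with `as_HTML` and escaping direct text children with `HTML::Entities::encode_entities` is close enough. It ignores special cases such as text directly inside <script> or <style>, and an empty entity string is not treated the way as_HTML treats it.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Entities qw(encode_entities);

# Hypothetical helper: serialize only the children of $element,
# forwarding the same arguments that as_HTML accepts.
sub content_as_html_sketch {
    my ( $element, @as_html_args ) = @_;

    # Assumption: escape direct text children with the caller's entity
    # set, falling back to '<>&' when none is given.
    my $entities = defined $as_html_args[0] ? $as_html_args[0] : '<>&';

    return join '', map {
        ref($_)
            ? $_->as_HTML(@as_html_args)         # child element
            : encode_entities( $_, $entities )   # plain text child
    } $element->content_list;
}

my $tree = HTML::TreeBuilder->new;
$tree->implicit_tags(0);
$tree->parse_content('<p>1</p><p>2</p>');

# Prints the fragment without the wrapping root <html> element.
print content_as_html_sketch($tree), "\n";

$tree->delete;
```

Applied to the root of a tree parsed with implicit_tags(0), this skips the implicit <html> wrapper, so an explicit <html> tag in the input would be emitted only once.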