
This queue is for tickets about the HTML-Tree CPAN distribution.

Report information
The Basics
Id: 33250
Status: resolved
Priority: 0/
Queue: HTML-Tree

People
Owner: Nobody in particular
Requestors: grandpa [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 3.23
Fixed in: (no value)



Subject: Nested div elements cause contained p elements to migrate outside the div
HTML of the form:

<p><div><p>foo</p></div></p>

is parsed into the tree as:

<p><div></div></p> <p>foo</p>

Versions prior to 3.23 not checked.
Subject: noname.pl
use strict;
use warnings;
use lib '..';
use HTML::TreeBuilder;

print "$HTML::TreeBuilder::VERSION\n";

my $html = <<'HTML';
<html>
<head>
</head>
<body>
<p><div><p>foo</p></div></p>
</body>
</html>
HTML

my $root = HTML::TreeBuilder->new;
$root->parse_content($html);
$root->elementify();

my @renderOptions = (undef, ' ', {});
print $root->as_HTML(@renderOptions);

# Sample output follows
__DATA__
<html>
<head>
</head>
<body>
<p>
<div>
</div>
</p>
<p>foo</p>
</body>
</html>
On Thu Feb 14 04:58:19 2008, GRANDPA wrote:
> HTML of the form:
>
> <p><div><p>foo</p></div></p>
>
> is parsed into the tree as:
>
> <p><div></div></p> <p>foo</p>
>
> Versions prior to 3.23 not checked.
The issue can be fixed by adding 'div' to @HTML::Tagset::p_closure_barriers. In the HTML::TreeBuilder constructor this could be done with:

push @HTML::Tagset::p_closure_barriers, 'div'
    unless grep { $_ eq 'div' } @HTML::Tagset::p_closure_barriers;

It may be more appropriate, however, to make a suitable change in HTML::Tagset itself.
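The barrier idea can be illustrated with a small, self-contained toy model. This is a hypothetical sketch, not HTML::TreeBuilder's actual code: the `parse` and `render` subs are made up for illustration, and the barrier lists passed in are illustrative subsets, not the real contents of @HTML::Tagset::p_closure_barriers. It shows why a new <p> closes the outer one unless a barrier element (here div) sits between them on the stack of open elements.

```perl
use strict;
use warnings;

# Toy model only, NOT HTML::TreeBuilder's implementation.
# When a <p> start tag arrives, look down the stack of open elements for an
# open <p> to close, but stop searching at any tag in the barrier list.
sub parse {
    my ($html, @barriers) = @_;    # @barriers: hypothetical, illustrative list
    my %is_barrier = map { $_ => 1 } @barriers;
    my $root  = { tag => 'root', kids => [] };
    my @stack = ($root);

    for my $tok (grep { length } split /(<\/?\w+>)/, $html) {
        my ($is_end, $tag) = $tok =~ m{^<(/?)(\w+)>$};
        if (!defined $tag) {                          # text node
            push @{ $stack[-1]{kids} }, $tok;
            next;
        }
        $tag = lc $tag;
        if ($is_end) {
            # Close the nearest matching open element; ignore stray end tags.
            my ($i) = grep { $stack[$_]{tag} eq $tag } reverse 1 .. $#stack;
            splice @stack, $i if defined $i;
            next;
        }
        if ($tag eq 'p') {
            for my $i (reverse 1 .. $#stack) {
                last if $is_barrier{ $stack[$i]{tag} };    # barrier: stop looking
                if ($stack[$i]{tag} eq 'p') { splice @stack, $i; last }
            }
        }
        my $node = { tag => $tag, kids => [] };
        push @{ $stack[-1]{kids} }, $node;
        push @stack, $node;
    }
    return join '', map { render($_) } @{ $root->{kids} };
}

# Serialize a node (hashref) or a text string back to HTML.
sub render {
    my ($n) = @_;
    return $n unless ref $n;
    return "<$n->{tag}>" . join('', map { render($_) } @{ $n->{kids} })
         . "</$n->{tag}>";
}

my $html = '<p><div><p>foo</p></div></p>';
print parse($html, qw(li td th table)), "\n";        # div is not a barrier
print parse($html, qw(li td th table div)), "\n";    # div added as a barrier
```

Run as-is, the first line printed is `<p><div></div></p><p>foo</p>`, the shape reported above, and the second is `<p><div><p>foo</p></div></p>`, where the inner p stays inside the div.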
Subject: Re: [rt.cpan.org #33250] Nested div elements cause contained p elements to migrate outside the div
Date: Thu, 14 Feb 2008 15:14:41 -0600
To: bug-HTML-Tree [...] rt.cpan.org
From: Andy Lester <andy [...] petdance.com>
On Feb 14, 2008, at 1:27 PM, Peter Jaquiery via RT wrote:
> The issue can be fixed by adding 'div' to
> HTML::Tagset::p_closure_barriers.
>
> In the HTML::TreeBuilder constructor this could be done by:
>
> push @HTML::Tagset::p_closure_barriers, 'div' unless grep {$_ eq
> 'div'} @HTML::Tagset::p_closure_barriers;
I can do that. It'll be pretty trivial.

--
Andy Lester => andy@petdance.com => www.petdance.com => AIM:petdance
Subject: Re: [rt.cpan.org #33250] AutoReply: Nested div elements cause contained p elements to migrate outside the div
Date: Wed, 20 Feb 2008 23:17:01 +1300
To: <bug-HTML-Tree [...] rt.cpan.org>
From: "Peter Jaquiery" <peter.jaquiery [...] ihug.co.nz>
On further consideration, and following discussion with others, it seems likely that the fundamental problem is that div elements ought not to nest inside p elements.

<p><div><p>foo</p></div></p>

should parse as:

<p></p><div><p>foo</p></div>

with the trailing </p> in the input ignored, because the HTML is fundamentally broken.

This parse issue can be fixed in the 'ALL HOPE ...' section by adding 'div' to the list of tags that implicitly end an open p element. Changing line 414 from

if ($tag eq 'p' or

to

if ($tag eq 'p' or $tag eq 'div' or

would do the trick.

Cheers,
Peter Jaquiery
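The alternative above, where a div start tag implicitly ends an open p so that the trailing </p> becomes a stray end tag, can be sketched with another toy model. This is a hypothetical illustration, not the code around line 414: `%ends_p`, `parse` and `render` are made-up names, and the model only checks the innermost open element instead of scanning further up the stack.

```perl
use strict;
use warnings;

# Toy model only, NOT HTML::TreeBuilder's implementation.
# Start tags listed in %ends_p implicitly close an open <p> when it is the
# innermost open element; end tags with no matching open element are dropped.
my %ends_p = map { $_ => 1 } qw(p div);    # 'div' is the proposed addition

sub parse {
    my ($html) = @_;
    my $root  = { tag => 'root', kids => [] };
    my @stack = ($root);

    for my $tok (grep { length } split /(<\/?\w+>)/, $html) {
        my ($is_end, $tag) = $tok =~ m{^<(/?)(\w+)>$};
        if (!defined $tag) {                       # text node
            push @{ $stack[-1]{kids} }, $tok;
            next;
        }
        $tag = lc $tag;
        if ($is_end) {
            # Close the nearest matching element, or ignore a stray end tag.
            my ($i) = grep { $stack[$_]{tag} eq $tag } reverse 1 .. $#stack;
            splice @stack, $i if defined $i;
            next;
        }
        # Implicitly end an open <p> before starting a p or div.
        pop @stack if $ends_p{$tag} && $stack[-1]{tag} eq 'p';
        my $node = { tag => $tag, kids => [] };
        push @{ $stack[-1]{kids} }, $node;
        push @stack, $node;
    }
    return join '', map { render($_) } @{ $root->{kids} };
}

# Serialize a node (hashref) or a text string back to HTML.
sub render {
    my ($n) = @_;
    return $n unless ref $n;
    return "<$n->{tag}>" . join('', map { render($_) } @{ $n->{kids} })
         . "</$n->{tag}>";
}

print parse('<p><div><p>foo</p></div></p>'), "\n";
```

This prints `<p></p><div><p>foo</p></div>`: the div closes the empty outer p, and the final </p> has nothing left to match, so it is dropped.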
I've uploaded HTML::Tagset 3.20.