
This queue is for tickets about the HTML-TreeBuilder-XPath CPAN distribution.

Report information
The Basics
Id: 67298
Status: open
Priority: 0/
Queue: HTML-TreeBuilder-XPath

People
Owner: Nobody in particular
Requestors: JEB [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.12
Fixed in: (no value)



Subject: HTML "section" tag cannot be found
I am looking at a document (http://www.makeuseof.com/tag/roundup-15-free-must-install-programs-for-your-new-pc/) which contains an HTML5 element, "section". I wanted a path that would extract one of these, so I used the XPath expression

//section[contains(@class, "post-full")]

However, this doesn't seem to match anything with HTML::TreeBuilder::XPath.

I suspect it could be due to an underlying library (HTML::Tree, HTML::Element?) not knowing about "section"?

Many thanks
James
Subject: Re: [rt.cpan.org #67298] HTML "section" tag cannot be found
Date: Thu, 07 Apr 2011 13:37:04 +0200
To: bug-HTML-TreeBuilder-XPath [...] rt.cpan.org
From: mirod <xmltwig [...] gmail.com>
On 04/07/2011 11:54 AM, James Bromberger via RT wrote:
The problem is in HTML::Tagset. As "section" is not defined there as a known HTML element, it is not recognized by the parser and is silently discarded.

I submitted a ticket on HTML::Tagset: https://rt.cpan.org/Ticket/Display.html?id=67299

If I get an answer from the maintainer of that module, I will look into a patch, as this seems quite important to me. I know there is an HTML::HTML5 namespace on CPAN, but I would think that most people expect the HTML::* modules to work with whatever HTML is thrown at them.

Thanks

--
mirod
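The diagnosis above can be checked directly. The sketch below is not from the thread; it is one possible workaround, assuming the ignore_unknown attribute that HTML::TreeBuilder::XPath inherits from HTML::TreeBuilder. That attribute is true by default, which drops tags HTML::Tagset does not list as known; setting it to false keeps them as elements. The sample document and the count_sections helper are hypothetical, and whether the default call reproduces the bug depends on the installed HTML::Tagset version.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample page: a minimal HTML5 document with one <section>.
my $html = <<'HTML';
<html>
<body>
<section class="post-full"><p>Post body</p></section>
</body>
</html>
HTML

# Hypothetical helper: parse $html and count nodes matching the XPath from
# the original report. $ignore_unknown is passed to HTML::TreeBuilder's
# ignore_unknown attribute: true (the default) drops tags that HTML::Tagset
# does not know, false keeps them as ordinary elements.
sub count_sections {
    my ($ignore_unknown) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->ignore_unknown($ignore_unknown);
    $tree->parse($html);
    $tree->eof;
    my @nodes = $tree->findnodes('//section[contains(@class, "post-full")]');
    my $count = scalar @nodes;
    $tree->delete;    # HTML::Element trees should be freed explicitly
    return $count;
}

printf "ignore_unknown(1), the default: %d match(es)\n", count_sections(1);
printf "ignore_unknown(0):              %d match(es)\n", count_sections(0);
```

With an HTML::Tagset that does not list "section", the first line should report no matches and the second one match. If the default call already matches, the installed HTML::Tagset most likely knows the tag and no workaround is needed.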
Subject: Re: [rt.cpan.org #67298] HTML "section" tag cannot be found
Date: Thu, 07 Apr 2011 22:20:18 +0800
To: bug-HTML-TreeBuilder-XPath [...] rt.cpan.org
From: James Bromberger <james [...] rcpt.to>
On 7/04/2011 7:35 PM, xmltwig@gmail.com via RT wrote:
Brilliant. Many thanks for looking at this.

James
--
Mobile: +61 422 166 708, Email: james_AT_rcpt.to