On 04/07/2011 11:54 AM, James Bromberger via RT wrote:
> Thu Apr 07 05:54:13 2011: Request 67298 was acted upon.
> Transaction: Ticket created by JEB
> Queue: HTML-TreeBuilder-XPath
> Subject: HTML "section" tag cannot be found
> Broken in: 0.12
> Severity: Normal
> Owner: Nobody
> Requestors: JEB@cpan.org
> Status: new
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=67298>
>
>
> I am looking at a document
> (http://www.makeuseof.com/tag/roundup-15-free-must-install-programs-for-your-new-pc/)
> which contains an HTML 5 element "section". I wanted a path that would
> extract one of these, so I used the XPath expression
>
> //section[contains(@class, "post-full")]
>
> However, this doesn't seem to match with HTML::TreeBuilder::XPath.
>
> I suspect it could be due to an underlying library (HTML::Tree,
> HTML::Element?) not knowing about "section".
The problem is in HTML::Tagset. Because "section" is not defined there as
a known HTML element, the parser does not recognize it and silently
discards it.
I submitted a ticket on HTML::Tagset:
https://rt.cpan.org/Ticket/Display.html?id=67299
If I get an answer from the maintainer of the module, I will look into a
patch, as this seems quite important to me. I know there is an
HTML::HTML5 namespace on CPAN, but I would think that most people expect
the HTML::* modules to work with whatever HTML is thrown at them.
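
In the meantime, a possible workaround is to tell the parser to keep
unknown elements instead of dropping them, using the ignore_unknown
attribute that HTML::TreeBuilder provides. The sketch below is untested
against the page from the ticket: the inline HTML snippet is made up for
illustration, and how the parser nests content inside an element it does
not know about may differ from what you expect.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Made-up sample document standing in for the real page (assumption).
my $html = <<'HTML';
<html><body>
<section class="post-full single">Hello</section>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;

# Must be set before parsing: keep elements that HTML::Tagset does not
# list, rather than silently skipping them (the default is to ignore them).
$tree->ignore_unknown(0);

$tree->parse($html);
$tree->eof;

# The XPath expression from the ticket.
my @sections = $tree->findnodes('//section[contains(@class, "post-full")]');
print scalar(@sections), " section element(s) found\n";

$tree->delete;
```

This only sidesteps the problem in the calling code. It does not make
HTML::Tagset aware of "section", so the fix still belongs there.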
Thanks
--
mirod