
This queue is for tickets about the HTML-HTML5-Parser CPAN distribution.

Report information
The Basics
Id: 88041
Status: open
Priority: 0/
Queue: HTML-HTML5-Parser

People
Owner: Nobody in particular
Requestors: NGLENN [...] cpan.org
Cc: garfieldnate [...] gmail.com
AdminCc:

Bug Information
Severity: Wishlist
Broken in: (no value)
Fixed in: (no value)



CC: garfieldnate [...] gmail.com
Subject: make default namespace optional
In HTML5, the http://www.w3.org/1999/xhtml namespace is completely optional. In the browser, document.evaluate() does not require XPath expressions to use this namespace explicitly, so relying on the default namespace is fine. By setting the default namespace on the returned document, this module forces any Perl-side XPath query to declare and use the XHTML namespace explicitly, which requires creating an XPathContext object.

Since I'm working with XPath both in browsers and in Perl, I'd like to use the same expressions in both, which isn't possible when the default namespace is added. Removing the namespace afterwards is complicated because the document must be traversed to remove it from every element.

Therefore, I'd like the user to be able to specify "namespace => 0" in the parse methods (parse_string, parse_file, parse_fh) to prevent the parser from setting the default namespace.
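To illustrate the problem described above, here is a minimal sketch using Python's standard xml.etree.ElementTree rather than the Perl modules discussed in this ticket. The sample markup and the prefix "x" are assumptions chosen for illustration. It shows that once elements sit in the XHTML namespace, an unprefixed path matches nothing and a prefix has to be bound explicitly.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Assumed sample input: a tiny document carrying the default XHTML namespace.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>Hello</p></body></html>"
)

# An unprefixed path looks for elements in no namespace, so it finds nothing.
unprefixed = doc.findall(".//p")

# Binding a prefix (here the arbitrary "x") to the XHTML namespace makes it match.
prefixed = doc.findall(".//x:p", {"x": XHTML_NS})

print(len(unprefixed), len(prefixed))
```

The same expression therefore cannot be shared between a namespace-aware query like this and a browser query that uses bare element names.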
As per the HTML spec, HTML elements are always in the XHTML namespace, whether or not an xmlns attribute is present in the HTML. The reason document.evaluate() works without any prefixes is that it doesn't strictly follow XPath 1.0: it uses a variation that interprets ":" differently. See http://www.w3.org/html/wg/drafts/html/master/dom.html#interactions-with-xpath-and-xslt

So the ideal place to patch things would be the XPath implementation you're using, to add an option to implement HTML5's variation of XPath 1.0.

That said, I'm not going to hold my breath on that one happening, so I'll happily add this option. However, in the interests of simplicity, it will probably just be implemented internally by parsing the document as normal and then afterwards crawling the tree to adjust namespaces.
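The post-parse crawl mentioned above can be sketched as follows. This is an illustration using Python's standard xml.etree.ElementTree, not the module's actual implementation; strip_namespace is a hypothetical helper name and the sample markup is assumed.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def strip_namespace(root, uri=XHTML_NS):
    """Walk the whole tree and drop `uri` from every element name in place."""
    prefix = "{" + uri + "}"
    for el in root.iter():  # iter() yields the root itself and all descendants
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(el.tag, str) and el.tag.startswith(prefix):
            el.tag = el.tag[len(prefix):]
    return root


# Assumed sample input: parse as normal, then adjust namespaces afterwards.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>Hello</p></body></html>"
)
strip_namespace(doc)

# Unprefixed paths now match, as they would in a browser.
paragraphs = doc.findall(".//p")
print(doc.tag, len(paragraphs))
```

The cost is one extra pass over every element, which is the traversal the original request hoped to avoid doing by hand.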
RT-Send-CC: garfieldnate [...] gmail.com
Thanks! The pointer to the documentation on it was very useful, too.
> That said, I'm not going to hold my breath on that one happening,
Me neither (XML::LibXML and LibXML).
> However, in the interests of simplicity, it will probably just be implemented internally by parsing the document as normal and then afterwards crawling the tree to adjust namespaces.
That's what I am doing right now as a workaround.