Bug #88041 for HTML-HTML5-Parser: make default namespace optional

Thu Aug 22 19:51:28 2013 NGLENN [...] cpan.org - Ticket created

CC:	garfieldnate [...] gmail.com
Subject:	make default namespace optional

In HTML5, the http://www.w3.org/1999/xhtml namespace is completely optional. In the browser, document.evaluate() does not require XPaths to explicitly use this namespace, so using default namespaces is fine. By setting the default namespace on the returned document, this module requires any Perl-side XPath query to explicitly define and use the xhtml namespace, which in turn requires creating an XPathContext object. Since I'm working with XPath both in browsers and in Perl, I'd like to be able to use the same expressions in both, which isn't possible when the default namespace is added. Removing the namespace afterwards is complicated because the document must be traversed to remove it from every element. Therefore, I'd like the user to be able to pass "namespace => 0" to the parse methods (parse_string, parse_file, parse_fh) to prevent the parser from setting the default namespace.
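To make the mismatch concrete, here is a minimal sketch of the same behaviour using Python's standard-library xml.etree.ElementTree rather than the Perl module itself. The inline document is an assumed stand-in for what a namespace-setting HTML parser returns, and the prefix "x" is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Assumed stand-in for a parsed HTML document with the default namespace set.
doc = ET.fromstring(
    f'<html xmlns="{XHTML_NS}"><body><p>one</p><p>two</p></body></html>'
)

# The unprefixed path that works in a browser matches nothing here,
# because every element name is qualified by the XHTML namespace.
plain = doc.findall(".//p")

# The same query succeeds once a prefix is bound to the namespace.
prefixed = doc.findall(".//x:p", {"x": XHTML_NS})

print(len(plain), len(prefixed))
```

The second call plays the role of the XPathContext setup described above: the expression itself has to change, which is what prevents sharing it with browser code.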

Fri Aug 23 10:14:43 2013 perl [...] toby.ink - Correspondence added

As per the HTML spec, HTML elements are always in the XHTML namespace, whether or not an xmlns attribute is present in the HTML. The reason document.evaluate() works without any prefixes is that it doesn't strictly follow XPath 1.0: it uses a variation that resolves unprefixed element names against the XHTML namespace. See http://www.w3.org/html/wg/drafts/html/master/dom.html#interactions-with-xpath-and-xslt for details. So the ideal place to patch things would be the XPath implementation you're using, to add an option to implement HTML5's variation of XPath 1.0. That said, I'm not going to hold my breath on that one happening, so I'll happily add this option. However, in the interests of simplicity, it will probably just be implemented internally by parsing the document as normal and then afterwards crawling the tree to adjust namespaces.
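The "parse as normal, then crawl the tree" approach might look roughly like the sketch below. It uses Python's standard-library xml.etree.ElementTree as an illustration only; strip_namespace is a hypothetical helper, not part of HTML::HTML5::Parser, and the inline document is an assumed example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def strip_namespace(root, ns=XHTML_NS):
    """Hypothetical helper: drop `ns` from every element tag, in place."""
    qualified = "{" + ns + "}"
    for el in root.iter():
        # Only element tags are strings; skip anything else defensively.
        if isinstance(el.tag, str) and el.tag.startswith(qualified):
            el.tag = el.tag[len(qualified):]
    return root


# Parse as normal (namespace present), then crawl the tree to remove it.
doc = ET.fromstring(
    f'<html xmlns="{XHTML_NS}"><body><p>one</p><p>two</p></body></html>'
)
strip_namespace(doc)

# Unprefixed queries now match, as they would in a browser.
texts = [p.text for p in doc.findall(".//p")]
print(texts)
```

The cost is one extra pass over every element after parsing, which is the trade-off accepted above in exchange for leaving the parser itself untouched.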

Fri Aug 23 10:14:43 2013 The RT System itself - Status changed from 'new' to 'open'

Fri Aug 23 12:37:22 2013 NGLENN [...] cpan.org - Correspondence added

RT-Send-CC:

garfieldnate [...] gmail.com

Thanks! The pointer to the documentation on it was very useful, too.

> That said, I'm not going to hold my breath on that one happening,

Me neither (XML::LibXML and LibXML).

> However, in the interests of simplicity,
> it will probably just be implemented internally by parsing the
> document as normal and then afterwards crawling the tree to adjust
> namespaces.

That's what I am doing right now as a workaround.