This queue is for tickets about the HTML-HTML5-Parser CPAN distribution.

Report information
The Basics
Id: 88602
Status: rejected
Priority: 0/
Queue: HTML-HTML5-Parser

People
Owner: Nobody in particular
Requestors: NGLENN [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 0.301
Fixed in: (no value)



Subject: resulting document prints with stray end tags
The following results in an invalid document with stray end tags:

use HTML::HTML5::Parser;
my $parser = HTML::HTML5::Parser->new;
my $doc = $parser->parse_string(<<'EOT');
<!DOCTYPE html>
<html>
<head>
<title>Thing</title>
<meta charset="utf-8">
<link rel="its-rules" href="blah.html">
</head>
<body></body>
</html>
EOT
print $doc->toStringHTML;

Result:

<html xmlns="http://www.w3.org/1999/xhtml"><head>
<title>Thing</title>
<meta charset="utf-8"></meta>
<link rel="its-rules" href="blah.html"></link>
</head>
<body>

Whereas just with LibXML it does the right thing:

use XML::LibXML;
my $doc = XML::LibXML->load_html(string => <<'EOT');
<!DOCTYPE html>
<html>
<head>
<title>Thing</title>
<meta charset="utf-8">
<link rel="its-rules" href="blah.html">
</head>
<body></body>
</html>
EOT
print $doc->toStringHTML;

Result:

<!DOCTYPE html>
<html>
<head>
<title>Thing</title>
<meta charset="utf-8">
<link rel="its-rules" href="blah.html">
</head>
<body></body>
</html>

The thing that's printed by the document returned by HTML::HTML5::Parser does not validate, having stray </meta> and </link> end tags, while the document returned by XML::LibXML does validate.
toStringHTML is an XML::LibXML method and I doubt there's much that HTML::HTML5::Parser can do to change its output. Have you looked at HTML::HTML5::Writer, which is a companion module for HTML::HTML5::Parser? (Combine it with XML::LibXML::PrettyPrint too, and you can end up with some pretty gorgeous looking output.)
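For reference, the suggestion above might look roughly like the following. This is a minimal sketch, assuming HTML::HTML5::Writer is installed and its default options are acceptable. It was not part of the original exchange, and the exact whitespace and optional tags in the output depend on the Writer's settings.

```perl
use strict;
use warnings;
use HTML::HTML5::Parser;
use HTML::HTML5::Writer;

# Parse the same input as in the report.
my $parser = HTML::HTML5::Parser->new;
my $doc = $parser->parse_string(<<'EOT');
<!DOCTYPE html>
<html>
<head>
<title>Thing</title>
<meta charset="utf-8">
<link rel="its-rules" href="blah.html">
</head>
<body></body>
</html>
EOT

# Serialize with HTML::HTML5::Writer instead of calling toStringHTML
# on the XML::LibXML document. Default constructor options are assumed.
my $writer = HTML::HTML5::Writer->new;
my $html   = $writer->document($doc);
print $html;
```

The point of the sketch is only that serialization is handed to the Writer, which knows about HTML void elements such as meta and link, so the stray end tags from the report should not appear.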
Thanks! The HTML::HTML5::Writer module is very useful. I hope this ticket will be useful just for reference in the future. Sometimes the behavior of XML::LibXML differs rather magically (to me).
I don't think there is actually a bug here, so I'm closing this issue to help tidy up RT.