Bug #43212 for XML-Atom: XML vs. [X]HTML parsing

Subject:	XML vs. [X]HTML parsing
Date:	Wed, 11 Feb 2009 09:35:05 -0800 (PST)
To:	bug-XML-Atom [...] rt.cpan.org
From:	Kevin Vargo <vargok [...] yahoo.com>

Hi,

We're using v0.33 of XML::Atom, and noticed that XHTML fragments are sometimes downgraded to escaped <content type="text">. This appears to be the result of LibXML failing to parse the content because of &nbsp; -- valid in XHTML, but not valid in plain XML. I note that LibXML has a parse_html_string mode that appears to do The Right Thing here, but I have not verified it in the code.

The relevant area of code seems to be in Content.pm, around where the eval { ... } and the check for LIBXML occur; $node comes back empty from the parse attempt. Substituting &#160; for &nbsp; lets the content parse as valid XHTML.

Basically, if $node comes back empty from the eval, I try the parse again, but via the HTML method, and the content comes in as XHTML, apparently correctly.

Something along the lines of the following should work -- once proper error handling has been added:

--- /usr/lib/perl5/site_perl/5.8.8/XML/Atom/Content.pm  2009-02-11 12:32:36.000000000 -0500
+++ /home/vargo/tmp/Content.pm-vargo  2010-02-11 12:32:58.000000000 -0500
@@ -63,6 +63,13 @@
                     if $xp;
             }
         };
+
+        if (! $node) {
+            my $parser = XML::LibXML->new;
+            my $tree = $parser->parse_html_string($copy);
+            $node = $tree->getDocumentElement;
+        }
+
         if (!$@ && $node) {
             $elem->appendChild($node);
             if ($content->version == 0.3) {

Thanks,
Kevin
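
For anyone wanting to reproduce this outside XML::Atom, here is a minimal standalone sketch of the same fallback idea. It is an illustration only, not the module's actual code: the helper name parse_xhtml_fragment and the sample markup are made up for this example, and it assumes XML::LibXML is installed.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not part of XML::Atom): try a strict XML parse first,
# then fall back to libxml2's HTML parser if the XML parse fails.
sub parse_xhtml_fragment {
    my ($markup) = @_;
    my $parser = XML::LibXML->new;

    # Strict XML parse. parse_string dies on well-formedness errors such as
    # an undeclared entity, so the eval leaves $node undefined in that case.
    my $node = eval { $parser->parse_string($markup)->getDocumentElement };
    return ($node, 'xml') if $node;

    # Fallback: the HTML parser knows the HTML named entities.
    my $tree = eval { $parser->parse_html_string($markup) };
    return unless $tree;

    $node = $tree->getDocumentElement;
    return unless $node;
    return ($node, 'html');
}

# Sample fragment (an assumption for this example) with an HTML-only entity.
my $fragment = '<div xmlns="http://www.w3.org/1999/xhtml">a&nbsp;b</div>';
my ($node, $mode) = parse_xhtml_fragment($fragment);
print defined $mode ? "parsed via $mode parser\n" : "could not parse\n";
```

One thing to watch for: as far as I can tell, the HTML parser wraps a bare fragment in html and body elements, so the document element returned by the fallback is the wrapper rather than the original element. A real fix would probably need to pull the inner node back out before appending it.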