
This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 36576
Status: resolved
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: fujimura [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 1.63
Fixed in: (no value)



Subject: XML::LibXML->parse_* does not detect BOM.
XML::LibXML->parse_* detects encodings (the XML encoding declaration, or meta charset= in parse_html_*), but it does not take a BOM into account.

For example:

    # $s is garbled
    my $doc = $parser->parse_html_string($bommed);
    my $s = $doc->findvalue('//title');

    # $s is a correct string, but this only handles UTF-8, not UTF-16LE/BE
    (my $unbommed = $bommed) =~ s/^\xEF\xBB\xBF//s;
    $doc = $parser->parse_html_string($unbommed);
    $s = $doc->findvalue('//title');

Could you fix it?
Upgrade your libxml2 to 2.7.2 and the problem will go away. In SVN, I added a regression test for this that runs when libxml2 >= 2.7.0, so that if the problem reappears in the future, we will know.

-- Petr
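
Where upgrading libxml2 is not immediately possible, the reporter's UTF-8-only workaround could be extended to UTF-16. The following is a minimal sketch, not part of XML::LibXML: the helper name strip_bom_to_utf8 is hypothetical, it uses only the core Encode module, and it assumes that handing BOM-less UTF-8 bytes to the parser behaves as in the report above. It does not rewrite any meta charset= declaration inside the document, and it does not handle UTF-32.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (an assumption for illustration, not an XML::LibXML API).
# Looks for a UTF-8 or UTF-16 BOM at the start of a byte string, drops it,
# and returns the content re-encoded as UTF-8 bytes without a BOM.
# Input without a recognised BOM is returned unchanged.
# Limitation: a UTF-32LE BOM (FF FE 00 00) also starts with FF FE and would
# be misread as UTF-16LE here.
sub strip_bom_to_utf8 {
    my ($bytes) = @_;
    my @boms = (
        [ "\xEF\xBB\xBF", 'UTF-8'    ],
        [ "\xFF\xFE",     'UTF-16LE' ],
        [ "\xFE\xFF",     'UTF-16BE' ],
    );
    for my $bom (@boms) {
        my ($mark, $encoding) = @$bom;
        next unless index($bytes, $mark) == 0;
        my $body = substr($bytes, length $mark);
        return encode('UTF-8', decode($encoding, $body));
    }
    return $bytes;
}
```

The result would then be passed to the parser in place of $bommed, for example $parser->parse_html_string(strip_bom_to_utf8($bommed)). Whether an existing meta charset= declaration still interferes after re-encoding depends on the libxml2 version in use and should be checked.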