Subject: | XML::SAX::PurePerl is unable to handle seemingly valid Unicode character |
XML::SAX::PurePerl chokes on a UTF-8 character that every other parser I'm testing passes through
without issue. Specifically, the following warning is generated only by XML::SAX::PurePerl and not
by any other tested SAX driver or XML parser:
running test_cases//XML-SAX-PurePerl.t datastore/10-cvwiki-20091027-pages-articles.xml:
utf8 "\xBF" does not map to Unicode at
/opt/local/lib/perl5/site_perl/5.8.9/XML/SAX/PurePerl/Reader/Stream.pm line 37.
As a result, the output of XML::SAX::PurePerl differs from that of the other parsers and is invalid.
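
For context, here is a minimal sketch of one way a message of this form can arise. It is not a confirmed diagnosis of Stream.pm. The byte 0xBF is a UTF-8 continuation byte; for example, it is the second byte of the Cyrillic letter "п" (U+043F, encoded as 0xD0 0xBF). If a reader decodes its input in fixed-size chunks and a chunk boundary falls inside such a character, the trailing byte is decoded on its own and rejected. The chunk boundary below is hypothetical, and only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK);

# "п" (U+043F, CYRILLIC SMALL LETTER PE) is two bytes in UTF-8: 0xD0 0xBF.
my $bytes = encode('UTF-8', "\x{043F}");

# Hypothetical chunk boundary that falls in the middle of the character.
my ($head, $tail) = (substr($bytes, 0, 1), substr($bytes, 1));

# Decoding the trailing byte on its own fails: 0xBF is a continuation byte
# and cannot start a character.
my $copy  = $tail;    # FB_CROAK may modify its input, so decode a copy
my $ok    = eval { decode('UTF-8', $copy, FB_CROAK); 1 };
my $error = $ok ? '' : $@;

# Decoding the rejoined bytes succeeds and yields the original character.
my $joined = $head . $tail;
my $char   = decode('UTF-8', $joined, FB_CROAK);

printf "tail byte: 0x%s\n", uc unpack('H*', $tail);
print  "decode of tail alone: ", ($error ? "failed: $error" : "ok\n");
printf "decode of joined bytes: U+%04X\n", ord $char;
```

If this is what happens here, the input file itself would be valid UTF-8, which would be consistent with the other parsers accepting it.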
Steps to reproduce: attempt to parse the attached bzipped XML dump file from the Chuvash
language Wikipedia.
Expected results: the "\xBF" byte is decoded properly, without generating a warning, and the
parser generates proper output.
Subject: | 10-cvwiki-20091027-pages-articles.xml.bz2 |