
This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 58848
Status: resolved
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: dwheeler [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Error Processing Broken Context
Date: Fri, 25 Jun 2010 15:22:31 -0700
To: bug-xml-libxml [...] rt.cpan.org
From: "David E. Wheeler" <dwheeler [...] cpan.org>
Parsing the attached feed (Perl 5.12.1, XML::LibXML 1.70, libxml2 2.7.7) with this line:

    perl -MXML::LibXML -e 'XML::LibXML->new->parse_file(shift)' ~/Desktop/thedieline.rss

I get this error:

    Malformed UTF-8 character (fatal) at /usr/local/lib/perl5/site_perl/5.12.1/darwin-thread-multi-2level/XML/LibXML/Error.pm line 217.

Line 217 is:

    $context=~s/[^\t]/ /g;

If I comment it out, I get the full error, albeit uglily formatted:

    thedieline.rss:26: parser error : Input is not proper UTF-8, indicate encoding !
    Bytes: 0xC3 0x26 0x6C 0x64
    <snip />

So clearly there's something up with the string that the regex doesn't like, but the downside is that I'm losing the actual error. I'm not exactly sure what to do about that. In further testing, it appears that the utf8 flag is set on $context, yet it contains invalid UTF-8. Is XML::LibXML improperly turning on this flag before it is certain that the text is UTF-8?

If I turn off the utf8 flag, I get a much better error message. So maybe the code should be updated to catch that exception, turn off the utf8 flag, and try again?

Thanks,

David
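To make the suggestion above concrete, here is a minimal Perl sketch of the idea. It is not the fix that went into XML::LibXML. The helper name `blank_context` is hypothetical, and instead of catching the exception and retrying, this variant checks the string up front with `utf8::valid` and clears the flag when the bytes are not well-formed UTF-8. The test string reuses the bytes from the parser error (0xC3 0x26 0x6C 0x64) with the flag forced on to mimic the inconsistent state.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of XML::LibXML: replace every non-tab
# character of a context string with a space, as the regex on line 217 does.
sub blank_context {
    my ($context) = @_;

    # If the UTF-8 flag is on but the bytes are not well-formed UTF-8,
    # clear the flag so the regex treats the string as raw bytes.
    if (utf8::is_utf8($context) && !utf8::valid($context)) {
        Encode::_utf8_off($context);
    }

    $context =~ s/[^\t]/ /g;
    return $context;
}

# Bytes from the report, with the flag forced on (assumption: this mimics
# the state $context was in when the regex died).
my $bad = "\xC3\x26\x6C\x64";
Encode::_utf8_on($bad);

my $blanked = blank_context($bad);
printf "%d characters blanked\n", length $blanked;
```

Checking validity first avoids depending on whether a given Perl version treats the malformed sequence as fatal inside the regex engine. The cost is that the string is handled as bytes, so column alignment in the error context may be off for multi-byte input.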
Download thedieline.rss
application/rss+xml 6.3k


Hi David,
Thanks for the report. This is now fixed in:

https://bitbucket.org/shlomif/perl-xml-libxml

- Fix https://rt.cpan.org/Ticket/Display.html?id=58848 :
    - "Malformed UTF-8 character (fatal) at" exception thrown on invalid UTF-8.
    - Thanks to David E. Wheeler (DWHEELER) for the report.

I'll upload it to CPAN soon.

Regards,

-- Shlomi Fish