This queue is for tickets about the XML-SAX CPAN distribution.

Report information
The Basics
Id: 19367
Status: open
Priority: 0/
Queue: XML-SAX

People
Owner: Nobody in particular
Requestors: jmf [...] liblime.com
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 0.14
Fixed in: (no value)



Subject: parse_string() crashes when handed UTF-8 combining characters
When XML::SAX is handed a string to parse that contains UTF-8 combining characters, it dies with this error:

    Cannot decode string with wide characters at /usr/local/lib/perl/5.8.4/Encode.pm line 188.

I've attached a short script that demonstrates the problem on my Linux box (Debian sarge) running kernel 2.6.12-1.1372_FC3, Perl v5.8.5.

The application I'm working with, Koha (http://koha.org), is an open-source integrated library automation system (library as in public library). It uses the MARC::File::XML module, which in turn uses XML::SAX, to handle bibliographic records in the MARCXML format. This bug is a major problem for us because many of our users have records containing combining characters.

I'm sorry I don't have a patch; I'm still pretty new to SAX and encoding issues in general.

Thanks!
Subject: parsercrash.pl
#!/usr/bin/perl
use XML::SAX;

my $parser = XML::SAX::ParserFactory->parser(
    Handler => MySAXHandler->new
);

binmode STDOUT, ":utf8";
print "\x{65}\x{301}\n";
$parser->parse_string("<xml>\xEF\xBB\xBF\x{65}\x{301}</xml>");

package MySAXHandler;
use base qw(XML::SAX::Base);

sub start_document {
    my ($self, $doc) = @_;
    # process document start event
}

sub start_element {
    my ($self, $el) = @_;
    # process element start event
}
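The string passed to parse_string() in the script above contains U+0301, so Perl holds it as a character string rather than as UTF-8 bytes. The sketch below uses only the core Encode module to show that decoding such a string fails, while encoding it to bytes first round-trips cleanly. It is an illustration of the likely mechanism, not a confirmed diagnosis of the PurePerl parser. The variable names are hypothetical, and the exact error text may differ between Encode versions.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Same character data as the test script: "e" followed by U+0301
# (COMBINING ACUTE ACCENT). The BOM bytes are left out for clarity.
my $chars = "<xml>\x{65}\x{301}</xml>";

# decode() expects bytes. A string that already holds a character
# above 0xFF cannot be treated as bytes, so the call dies.
my $decoded = eval { decode('UTF-8', $chars) };
my $decode_failed = defined $decoded ? 0 : 1;
print "decode on a character string failed: $@" if $decode_failed;

# Encoding to UTF-8 first yields a plain byte string with no wide
# characters, which decode() accepts.
my $bytes      = encode('UTF-8', $chars);
my $round_trip = decode('UTF-8', $bytes);
print "round trip ok\n" if $round_trip eq $chars;
```

Whether XML::SAX::PurePerl 0.14 accepts the encoded bytes from parse_string() is not confirmed anywhere in this ticket.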
From: jmf [...] liblime.com
I installed XML::SAX::Expat and the problem went away. Previously I was using the PurePerl parser, so I'm assuming that's where the problem lies.

Cheers,
Joshua
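For anyone who needs the same workaround without relying on which parser the factory happens to pick, here is a minimal sketch that pins the parser through the documented $XML::SAX::ParserPackage variable. It assumes XML::SAX::Expat is installed. MySAXHandler stands in for the handler class from the attached script and is reduced here to an empty subclass.

```perl
use strict;
use warnings;
use XML::SAX;
use XML::SAX::ParserFactory;

{
    # Stand-in for the handler in the attached script (assumption:
    # an empty subclass is enough to show parser selection).
    package MySAXHandler;
    use base qw(XML::SAX::Base);
}

# Ask the factory for XML::SAX::Expat explicitly instead of letting
# it choose from the installed parsers.
my $parser = do {
    local $XML::SAX::ParserPackage = 'XML::SAX::Expat';
    XML::SAX::ParserFactory->parser(Handler => MySAXHandler->new);
};

print ref($parser), "\n";
```

Setting the variable with local keeps the override limited to this one factory call, so other code in the same process still gets the default parser.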