
This queue is for tickets about the XML-SAX CPAN distribution.

Report information
The Basics
Id: 42896
Status: new
Priority: 0/
Queue: XML-SAX

People
Owner: Nobody in particular
Requestors: IKEGAMI [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 0.96
Fixed in: (no value)



Subject: Fails for UTF-16 files with encoding="UTF-16"
    $ od -c sample.xml | head -n 5
    0000000 377 376   <  \0   ?  \0   x  \0   m  \0   l  \0      \0   v  \0
    0000020   e  \0   r  \0   s  \0   i  \0   o  \0   n  \0   =  \0   "  \0
    0000040   1  \0   .  \0   0  \0   "  \0      \0   e  \0   n  \0   c  \0
    0000060   o  \0   d  \0   i  \0   n  \0   g  \0   =  \0   "  \0   U  \0
    0000100   T  \0   F  \0   -  \0   1  \0   6  \0   "  \0   ?  \0   >  \0

Parsing the above file (via XML::Simple) results in:

    UTF-16:Unrecognised BOM 3f3e at /usr/lib/perl/5.8/Encode.pm line 166.

3F is "?" and 3E is ">", i.e. the "?>" that closes the XML declaration.

Two problems:

- It shouldn't look for a BOM at that location.
- It's trying to decode what's already been decoded. (If it were still looking at the raw UTF-16LE bytes, the error message would say 3f00, not 3f3e.)
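The hex-value reasoning above can be checked with Perl's core Encode module alone. This is a minimal sketch, not XML::SAX::PurePerl's actual code path: it only shows which 16-bit value Encode's "UTF-16" decoder ends up with for the raw file bytes, for the raw "?>" bytes without a BOM, and for an already-decoded "?>". The helper first_unit is a hypothetical name made up for this example. It assumes that older Encode (as in the report) croaks with "Unrecognised BOM" when no BOM is present, while newer Encode may instead fall back to big-endian, so the helper accepts either outcome.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return, as 4 hex digits, the first 16-bit value the
# "UTF-16" decoder produces from $octets or, when it croaks, the value it
# rejected as a BOM. Older Encode croaks "UTF-16:Unrecognised BOM xxxx" when
# no BOM is present; newer Encode may decode as big-endian instead. Both
# behaviours expose the same value.
sub first_unit {
    my ($octets) = @_;
    my $text = eval { decode('UTF-16', $octets) };
    return sprintf('%04x', ord $text) if defined $text;
    return lc $1 if $@ =~ /Unrecognised BOM ([0-9a-f]+)/i;
    die $@;    # some other failure: re-throw it
}

my @cases = (
    # Start of sample.xml: BOM FF FE, then "<?" as UTF-16LE.
    [ 'raw file bytes with BOM'      => "\xFF\xFE<\0?\0" ],
    # "?>" still as raw UTF-16LE bytes, with no BOM in front.
    [ 'raw "?>" bytes (3F 00 3E 00)' => "?\0>\0" ],
    # "?>" after it has already been decoded to characters.
    [ 'already-decoded "?>" (3F 3E)' => "?>" ],
);

for my $case (@cases) {
    my ($label, $octets) = @$case;
    printf "%-30s -> %s\n", $label, first_unit($octets);
}
```

Under either Encode behaviour the three cases yield 003c, 3f00 and 3f3e. Only the already-decoded input reproduces the 3f3e from the report, which is consistent with the decoder being handed characters rather than the file's bytes.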
From: f1i3d0001ic3 [...] gmail.com
This fails for me too, using XML::Simple 2.18 and XML::SAX 0.96 with ActiveState Perl on Windows 7 64-bit. I found a workaround for XML::Simple: just remove XML::SAX with ppm. It was somewhat of an annoyance to track this down, and it's an ongoing annoyance to keep this module *not* installed.
On Wed Dec 01 20:00:05 2010, michaelj wrote:
> This fails for me too, using XML::Simple at 2.18 and XML::SAX at 0.96.
> Found a workaround for using XML::Simple - just remove XML::SAX in the
> ppm.

Workaround: set $XML::Simple::PREFERRED_PARSER. For example, the following switches XML::Simple to XML::Parser, the fastest of its existing backends:

    local $XML::Simple::PREFERRED_PARSER = 'XML::Parser';

XML::Parser doesn't support namespaces. If you need them, use one of the XML::SAX parsers instead, such as XML::SAX::ExpatXS.

The parser giving you the problem is XML::SAX::PurePerl, so just don't use that one. You can configure which parsers XML::SAX will consider; I suggest removing XML::SAX::PurePerl from that list.