
This queue is for tickets about the XML-Fast CPAN distribution.

Report information
The Basics
Id: 71534
Status: stalled
Priority: 0/
Queue: XML-Fast

People
Owner: Nobody in particular
Requestors: IKEGAMI [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Doesn't support UTF-16
Doesn't support UTF-16le and UTF-16be.

    $ perl -MEncode -MXML::Fast -e'die if xml2hash(encode("UTF-16le", qq{\x{FEFF}<?xml version="1.0" encoding="UTF-16"?><root>abc</root>}))->{root} ne "abc";'
    Bad document end, state = LT_OPEN at .../XML/Fast.pm line 28.

    $ perl -MEncode -MXML::Fast -e'die if xml2hash(encode("UTF-16le", qq{\x{FEFF}<?xml version="1.0" encoding="UTF-16le"?><root>abc</root>}))->{root} ne "abc";'
    Bad document end, state = LT_OPEN at .../XML/Fast.pm line 28.

    $ perl -MEncode -MXML::Fast -e'die if xml2hash(encode("UTF-16be", qq{\x{FEFF}<?xml version="1.0" encoding="UTF-16"?><root>abc</root>}))->{root} ne "abc";'
    Died at -e line 1.

    $ perl -MEncode -MXML::Fast -e'die if xml2hash(encode("UTF-16be", qq{\x{FEFF}<?xml version="1.0" encoding="UTF-16be"?><root>abc</root>}))->{root} ne "abc";'
    Died at -e line 1.

It surely doesn't support UTF-32 either.
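All four failures involve a document that starts with a byte order mark. As a minimal sketch of how a pre-processing step in front of xml2hash might recognise those marks, the helper below checks the leading bytes in the spirit of the (non-normative) autodetection table in Appendix F of the XML 1.0 specification. `sniff_bom` is a hypothetical name, not part of XML::Fast, and it ignores the unusual UCS-4 octet orders that the appendix also lists.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of XML::Fast): guess a document's encoding
# from its byte order mark. The UTF-32 marks are tested first because the
# UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE mark (FF FE); a real
# UTF-16LE XML document cannot continue with U+0000, since NUL is not an
# allowed XML character.
sub sniff_bom {
    my ($bytes) = @_;
    return 'UTF-32BE' if $bytes =~ /^\x00\x00\xFE\xFF/;
    return 'UTF-32LE' if $bytes =~ /^\xFF\xFE\x00\x00/;
    return 'UTF-16BE' if $bytes =~ /^\xFE\xFF/;
    return 'UTF-16LE' if $bytes =~ /^\xFF\xFE/;
    return 'UTF-8'    if $bytes =~ /^\xEF\xBB\xBF/;
    return undef;    # no BOM: fall back to the XML declaration or UTF-8
}

# Encode the same document (with a leading U+FEFF) in each encoding and
# check that the BOM is identified correctly.
my $doc = qq{\x{FEFF}<?xml version="1.0"?><root>abc</root>};
for my $enc (qw(UTF-8 UTF-16LE UTF-16BE UTF-32LE UTF-32BE)) {
    printf "%-8s -> %s\n", $enc, sniff_bom(encode($enc, $doc));
}
```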
The workaround is to pass the XML through the following function before passing it to xml2hash:

    use Encode qw( encode decode );

    # If the document starts with a UTF-16 byte order mark, re-encode it as
    # UTF-8 and update the encoding="..." pseudo-attribute of the XML
    # declaration to match. Any other input is returned unchanged.
    sub recode_utf16 {
        my ($bytes) = @_;

        my $enc;
        if    ($bytes =~ /^\xFF\xFE/) { $enc = 'UTF-16le' }
        elsif ($bytes =~ /^\xFE\xFF/) { $enc = 'UTF-16be' }
        else                          { return $bytes }

        # FB_CROAK dies on malformed input; LEAVE_SRC keeps $bytes intact.
        my $xml = encode('UTF-8',
            decode($enc, $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC));

        # The XML declaration, if any, sits at the very start of the document.
        substr($xml, 0, 100) =~ s/^[^>]* encoding="\K[^"]+(?=")/UTF-8/;
        return $xml;
    }
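For anyone who also needs UTF-32, the workaround above could be generalised along the following lines. This is a sketch under stated assumptions, not the XML::Fast API: `recode_to_utf8` is a hypothetical name. It also checks the UTF-32 byte order marks (before the UTF-16 ones, since FF FE 00 00 begins with FF FE), drops the BOM instead of carrying it over as a UTF-8 BOM, and accepts either quote style and optional spaces around `=` in the encoding pseudo-attribute, as the XML grammar allows.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical generalisation of the workaround (not part of XML::Fast):
# re-encode a BOM-marked UTF-16 or UTF-32 document as UTF-8, drop the BOM
# and rewrite the encoding pseudo-attribute of the XML declaration.
# Input without a recognised BOM is returned untouched.
sub recode_to_utf8 {
    my ($bytes) = @_;
    my $enc = $bytes =~ /^\x00\x00\xFE\xFF/ ? 'UTF-32BE'
            : $bytes =~ /^\xFF\xFE\x00\x00/ ? 'UTF-32LE'   # before UTF-16LE
            : $bytes =~ /^\xFE\xFF/         ? 'UTF-16BE'
            : $bytes =~ /^\xFF\xFE/         ? 'UTF-16LE'
            :                                 undef;
    return $bytes unless defined $enc;

    # FB_CROAK dies on malformed input instead of substituting U+FFFD.
    # $bytes is a private copy, so letting decode consume it is harmless.
    my $text = decode($enc, $bytes, Encode::FB_CROAK);

    # The endian-specific decoders keep the BOM as U+FEFF; drop it.
    $text =~ s/^\x{FEFF}//;

    # Group 1 ends with the opening quote; the lookahead requires the
    # matching closing quote, so both "..." and '...' are handled.
    $text =~ s/\A(<\?xml\b[^>]*?\bencoding\s*=\s*(["']))[^"']*(?=\2)/${1}UTF-8/;

    return encode('UTF-8', $text);
}

my $src = encode('UTF-32LE',
    qq{\x{FEFF}<?xml version="1.0" encoding='UTF-32'?><root>abc</root>});
print recode_to_utf8($src), "\n";
# prints: <?xml version="1.0" encoding='UTF-8'?><root>abc</root>
```

The result can then be handed to xml2hash like any other UTF-8 document.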
On Fri Oct 07 15:54:12 2011, ikegami wrote:
> Doesn't support UTF-16le and UTF-16be.
> It surely doesn't support UTF-32 either.
OK, I see. For now I'll add BOM handling for UTF-8. Parsing UTF-16 or UTF-32 would need a completely different parsing engine. I'll think about a quick fix for the next release, and I'll add proper UTF-16/32 parsers to the development plan.
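A minimal sketch of what UTF-8 BOM handling could look like as a pre-parse step, followed by a short demonstration of why a byte-oriented parser cannot simply be reused for UTF-16. `strip_utf8_bom` is a hypothetical helper, not XML::Fast's actual code; the assumption is only that the parser expects the first byte of the document to be `<`.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical pre-parse step (not XML::Fast's implementation): remove a
# leading UTF-8 byte order mark (EF BB BF) so that a byte-oriented parser
# sees '<' as the first byte of the document.
sub strip_utf8_bom {
    my ($bytes) = @_;
    $bytes =~ s/\A\xEF\xBB\xBF//;
    return $bytes;
}

# The same trick does not help with UTF-16: every ASCII character becomes
# two bytes, so a scanner that matches single-byte markup such as '<'
# (0x3C) also meets an extra NUL byte after each character.
my $utf16 = encode('UTF-16LE', '<a>');
print join(' ', map { sprintf '%02X', ord } split //, $utf16), "\n";
# prints: 3C 00 61 00 3E 00
```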
On Sun Oct 16 12:06:03 2011, MONS wrote:
> Parsing UTF-16 or UTF-32 would need a completely different parsing engine.
Or you could simply document that it's not supported.
Noted in the documentation. I have no time for improvements; patches are welcome.