Subject: UTF-8 handling severely broken
When $JSON::UTF8 is enabled, string handling fails if a string
contains both a non-ASCII character that can be encoded in Latin-1
and a non-ASCII character that cannot be encoded in Latin-1.
See the example below. \x{f6} is a German umlaut "o" and \x{20ac}
is the Euro currency sign.
After the round trip through JSON, the third string below contains a
Latin-1-encoded umlaut followed by a UTF-8-encoded Euro sign. As such
the output is absolutely unusable.
dst@host:~$ perl -MJSON -e 'print $JSON::VERSION."\n"'
1.14
dst@host:~$ perl -v | grep built
This is perl, v5.8.4 built for i386-linux-thread-multi
dst@host:~$ perl -MJSON -MData::Dumper -e '$JSON::UTF8=1; $h =
[ "\x{f6}", "\x{20ac}", "\x{f6}\x{20ac}" ]; $i =
jsonToObj(objToJson($h)); print Dumper($i)'
$VAR1 = [
'ö',
"\x{20ac}",
'öâ¬'
];
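The mixed third string can be reproduced without JSON at all. The snippet below is only a minimal sketch: append_escape() and codes() are hypothetical helpers written for this report, not part of JSON::Parser, and the sketch assumes the parser builds its result by appending chr(hex(...)) to an existing string, optionally after utf8::encode().

```perl
use strict;
use warnings;

# Hypothetical helper (not part of JSON::Parser): append the character
# named by a hex code point to $s, optionally UTF-8-encoding it first.
sub append_escape {
    my ($s, $hex, $encode) = @_;
    my $f = chr(hex($hex));
    utf8::encode($f) if $encode;    # turns $f into raw UTF-8 bytes
    return $s . $f;
}

# Hypothetical helper: list the code of every character in a string as hex.
sub codes {
    my ($str) = @_;
    return join " ", map { sprintf("%02x", ord($_)) } split(//, $str);
}

# With the encode step: one character (f6) followed by three raw bytes.
print codes(append_escape("\x{f6}", "20ac", 1)), "\n";    # f6 e2 82 ac

# Without the encode step: two characters, as intended.
print codes(append_escape("\x{f6}", "20ac", 0)), "\n";    # f6 20ac
```

The first result matches what the third string shows above: the umlaut stays a single Latin-1-range character while the Euro sign arrives as its three UTF-8 bytes.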
The following patch seems to fix it, but I'm not 100% sure whether
there are side effects. Encoding the character string $f into a
UTF-8-encoded byte string and appending it to a Latin-1 string is
definitely the wrong thing to do here.
--- /usr/share/perl5/JSON/Parser.pm 2007-05-06 06:51:55.000000000 +0200
+++ /home/dst/src/JSON-1.14/lib/JSON/Parser.pm 2007-07-23 16:50:20.000000000 +0200
@@ -122,7 +123,7 @@
$u .= $ch;
}
my $f = chr(hex($u));
- utf8::encode( $f ) if($USE_UTF8 || $USE_UnicodeString);
$s .= $f;
}
else{