Subject: Question on JSON::PP handling of \u00xx
Date: Thu, 19 Feb 2009 11:23:16 +0000
To: bug-JSON [...] rt.cpan.org, makamaka [...] cpan.org
From: Mika Raento <mikie [...] google.com>

Hiya Makamaka
I'm trying to use JSON::PP with non-ASCII characters and I find the
behaviour a bit odd. \u00xx escapes in the range 128-255 are _not_ made
into utf-8 / perl characters on decode; instead they come out as bytes
with those values. This makes it difficult to handle strings with
those characters, as I'd need to go through the results and
utf8::upgrade() everything.
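For reference, the workaround I mean would look roughly like the sketch below. The helper name upgrade_strings is made up for illustration and is not part of JSON::PP. It assumes only values need upgrading (hash keys are left alone), and note that utf8::upgrade() stringifies numeric values, which may change how they are re-encoded later.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper (not part of JSON::PP): walk a decoded structure and
# utf8::upgrade() every plain scalar value in place.
# It works on $_[0] directly so the caller's data is modified, not a copy.
sub upgrade_strings {
    my $type = ref $_[0];
    if ($type eq 'HASH') {
        # values() yields aliases inside a for loop, so in-place changes stick
        upgrade_strings($_) for values %{ $_[0] };
    }
    elsif ($type eq 'ARRAY') {
        upgrade_strings($_) for @{ $_[0] };
    }
    elsif (!$type && defined $_[0]) {
        # Caveat: this also stringifies numbers
        utf8::upgrade($_[0]);
    }
    # Blessed values such as JSON::PP::Boolean are skipped
    return;
}

# Single quotes keep the \u00e9 escape literal, so JSON::PP sees it as-is
my $data = JSON::PP->new->decode('{"name":"caf\u00e9","tags":["\u00fc"]}');
upgrade_strings($data);
print utf8::is_utf8($data->{name}) ? "character string\n" : "byte string\n";
```

Having to do this after every decode is what I'd like to avoid.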
It's simple to fix: in JSON/PP.pm, around line 804, just replace

    if ((my $hex = hex( $u )) > 255) {
        $is_utf8 = 1;
        $s .= JSON_PP_decode_unicode($u) || next;
    }

with

    if ((my $hex = hex( $u )) > 127) {
        $is_utf8 = 1;
        $s .= JSON_PP_decode_unicode($u) || next;
    }

However, there are a number of tests that check that we can get bytes
128-255 out as-is, so it looks like this behaviour was intended, at
least on some level.
Thoughts?
Thanks,
Mika Raento