Subject: Re: Question on JSON::PP handling of \u00xx
Date: Thu, 19 Feb 2009 11:51:58 +0000
To: bug-JSON [...] rt.cpan.org, makamaka [...] cpan.org
From: Mika Raento <mikie [...] google.com>

Ah, that doesn't solve it completely. Setting $is_utf8 makes the
decoder call utf8::decode() on the result, which turns the nice
character string into bytes. This seems more complicated than I
thought :-(
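
For what it's worth, here is a minimal sketch of how the core utf8::
functions each treat a character in that range (U+00E9). It is plain core
Perl, not JSON::PP's actual code, and the variable names are made up for
illustration:

```perl
use strict;
use warnings;

# U+00E9 as a single Perl character. Code points below 256 are stored
# as one byte internally unless the string has been upgraded.
my $char = "\x{e9}";

# utf8::upgrade changes only the internal representation; the string
# still holds the same single character.
my $upgraded = $char;
utf8::upgrade($upgraded);

# utf8::encode replaces the character with its UTF-8 octets (0xC3 0xA9).
my $octets = $char;
utf8::encode($octets);

# utf8::decode goes the other way: it expects UTF-8 octets as input.
my $decoded = $octets;
my $ok = utf8::decode($decoded);

# A lone 0xE9 is not well-formed UTF-8, so utf8::decode returns false
# and leaves the string untouched.
my $lone = $char;
my $lone_ok = utf8::decode($lone);

printf "upgraded: length=%d flagged=%s equal=%s\n",
    length($upgraded),
    (utf8::is_utf8($upgraded) ? "yes" : "no"),
    ($upgraded eq $char ? "yes" : "no");
printf "encoded:  length=%d\n", length($octets);
printf "decoded:  ok=%s equal=%s\n",
    ($ok ? "yes" : "no"), ($decoded eq $char ? "yes" : "no");
printf "lone:     ok=%s unchanged=%s\n",
    ($lone_ok ? "yes" : "no"), ($lone eq $char ? "yes" : "no");
```

So whether utf8::decode() helps or hurts depends on whether the string
handed to it really holds UTF-8 octets at that point.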
Mika
On Thu, Feb 19, 2009 at 11:23 AM, Mika Raento <mikie@google.com> wrote:
> Hiya Makamaka
>
> I'm trying to use JSON::PP with non-ASCII characters and I find the
> behaviour a bit odd. Characters in the range 128-255 are _not_ made
> into UTF-8 / Perl characters on decode; instead they come out as bytes
> with those values. This makes it difficult to handle strings with
> those characters, as I'd need to go through the results and
> utf8::upgrade() everything.
>
> It's simple to fix - just replace
>     if ((my $hex = hex( $u )) > 255) {
>         $is_utf8 = 1;
>         $s .= JSON_PP_decode_unicode($u) || next;
>     }
> with
>     if ((my $hex = hex( $u )) > 127) {
>         $is_utf8 = 1;
>         $s .= JSON_PP_decode_unicode($u) || next;
>     }
> JSON/PP.pm around line 804.
>
> However, there are a number of tests that check that we can get bytes
> 128-255 out as-is so it looks like this behaviour was intended, at
> least on some level.
>
> Thoughts?
>
> Thanks,
> Mika Raento
>
> --
> Google UK Limited
> Registered Office: Belgrave House, 76 Buckingham Palace Road, London SW1 9TQ
> Registered in England Number: 3977902
>