
This queue is for tickets about the JSON CPAN distribution.

Report information
The Basics
Id: 36623
Status: resolved
Priority: 0/
Queue: JSON

People
Owner: Nobody in particular
Requestors: morten.bjornsvik [...] experian-da.no
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: utf8 conversion does not work
Date: Tue, 10 Jun 2008 16:18:46 +0200
To: <bug-JSON [...] rt.cpan.org>
From: Morten Bjørnsvik <morten.bjornsvik [...] experian-da.no>
Hi

I'm unable to get reliable transformation of norwegian characters into JSON and then back into perl.
I used a Unicode utf8 font for testing.

    #!/opt/perl/bin/perl -w
    use Data::Dumper;
    use JSON;

    # just a perl test structure
    my $orig = {
        desc => 'norwegian characters:',
        c1 => ['Æ','æ','Ø','ø','Å','å'],
        c2 => ["ÆæØøÅå"],
    };
    print "Original perl hashref: ", Dumper($orig);
    my $json1 = to_json($orig, {ascii=>1});
    print "json text:", $json1, "\n";
    my $perl1 = from_json($json1, {ascii=>1});
    print "back to perl hashref:", Dumper($perl1);

With ascii we get correct back converting, but the json is broken

    Original perl hashref: $VAR1 = {
      'desc' => 'norwegian characters:',
      'c2' => [ 'ÆæØøÅå' ],
      'c1' => [ 'Æ', 'æ', 'Ø', 'ø', 'Å', 'å' ]
    };
    json text:{"desc":"norwegian characters:","c2":["\u00c3\u0086\u00c3\u00a6\u00c3\u0098\u00c3\u00b8\u00c3\u0085\u00c3\u00a5"],"c1":["\u00c3\u0086","\u00c3\u00a6","\u00c3\u0098","\u00c3\u00b8","\u00c3\u0085","\u00c3\u00a5"]}
    back to perl hashref:$VAR1 = {
      'desc' => 'norwegian characters:',
      'c2' => [ 'ÆæØøÅå' ],
      'c1' => [ 'Æ', 'æ', 'Ø', 'ø', 'Å', 'å' ]
    };

With utf8=>1 everything is broken:

    Original perl hashref: $VAR1 = {
      'desc' => 'norwegian characters:',
      'c2' => [ 'ÆæØøÅå' ],
      'c1' => [ 'Æ', 'æ', 'Ø', 'ø', 'Å', 'å' ]
    };
    json text:{"desc":"norwegian characters:","c2":["ÃæÃøÃÃ¥"],"c1":["Ã","æ","Ã","ø","Ã","Ã¥"]}
    back to perl hashref:$VAR1 = {
      'desc' => 'norwegian characters:',
      'c2' => [ "\x{c3}\x{86}\x{c3}\x{a6}\x{c3}\x{98}\x{c3}\x{b8}\x{c3}\x{85}\x{c3}\x{a5}" ],
      'c1' => [ "\x{c3}\x{86}", "\x{c3}\x{a6}", "\x{c3}\x{98}", "\x{c3}\x{b8}", "\x{c3}\x{85}", "\x{c3}\x{a5}" ]
    };

--
Morten Bjørnsvik
Experian Decision Analytics AS
PB 121, 0102 Oslo, Norway
Morten.bjornsvik@experian-da.no

>With ascii we get correct back converting, but the json is broken
>With utf8=>1 everything is broken:
It is not broken. Those are the expected results.
I think you want your strings to be treated as Unicode.
But you don't use the 'utf8' pragma, so they are treated as bytes.

With ascii, the json is not broken, as the byte strings are properly escaped.

And if you want to convert a Unicode string to json, you should use the
utf8 pragma and call to_json with the utf8 option.

    use Data::Dumper;
    use JSON;
    use utf8;
    ....

To choose the proper options, see also:
http://search.cpan.org/~mlehmann/JSON-XS-2.21/XS.pm#ENCODING/CODESET_FLAG_NOTES
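The difference between byte strings and character strings described in the reply above can be tried out with a small script. This is only a sketch under a few assumptions: it uses JSON::PP (bundled with Perl since 5.14) as a stand-in for JSON, it uses the object interface, where `->utf8` and `->ascii` should correspond to the `{utf8=>1}` and `{ascii=>1}` options passed to to_json/from_json, and the file itself must be saved as UTF-8. The variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;        # literals below are decoded into character strings
use JSON::PP;    # assumption: stand-in for JSON; same option names

my $orig = { c1 => ['Æ', 'æ', 'Ø', 'ø', 'Å', 'å'], c2 => ['ÆæØøÅå'] };

# utf8: encode() returns UTF-8 bytes, decode() expects UTF-8 bytes.
my $utf8_codec = JSON::PP->new->utf8->canonical;
my $json_utf8  = $utf8_codec->encode($orig);
my $back_utf8  = $utf8_codec->decode($json_utf8);

# ascii: every non-ASCII character is written as a \uXXXX escape.
my $ascii_codec = JSON::PP->new->ascii->canonical;
my $json_ascii  = $ascii_codec->encode($orig);
my $back_ascii  = $ascii_codec->decode($json_ascii);

# Without "use utf8", the literal 'Æ' in a UTF-8 file is the two bytes
# C3 86. The same byte string is spelled out here to show what the
# encoder sees in that case: each byte is escaped on its own.
my $bytes      = "\xc3\x86";
my $json_bytes = $ascii_codec->encode([$bytes]);

binmode STDOUT, ':encoding(UTF-8)';
print "ascii json (characters): $json_ascii\n";
print "ascii json (bytes):      $json_bytes\n";
print "utf8 round trip ok:  ", ($back_utf8->{c2}[0]  eq 'ÆæØøÅå' ? 'yes' : 'no'), "\n";
print "ascii round trip ok: ", ($back_ascii->{c2}[0] eq 'ÆæØøÅå' ? 'yes' : 'no'), "\n";
```

With `use utf8` in place, each letter is a single character on both sides of the round trip. The byte-string case is where pairs of escapes like the `\u00c3\u0086` from the original report come from.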
Subject: RE: [rt.cpan.org #36623] utf8 conversion does not work
Date: Wed, 11 Jun 2008 14:48:15 +0200
To: <bug-JSON [...] rt.cpan.org>
From: Morten Bjørnsvik <morten.bjornsvik [...] experian-da.no>
From: Makamaka Hannyaharamitu via RT [mailto:bug-JSON@rt.cpan.org]
|<URL: http://rt.cpan.org/Ticket/Display.html?id=36623 >
|
|>With ascii we get correct back converting, but the json is broken
|>With utf8=>1 everything is broken:
|
|It is not broken. Those are expected results.
|I think you want your strings to be treated as Unicode.
|But you don't use 'utf8' pragma, so they are treated as bytes.
|
|With ascii, json is not broken as byte strings are properly escaped.
|
|And if you want to convert Unicode string to json, you should use
|utf8 pragma and using to_json with utf8.
|
| use Data::Dumper;
| use JSON;
| use utf8;

Hi

Thanks for the information, I see they are correct, sorry.

I see you use ascii 7bit, norwegian characters are within the 8bit ascii range.

But the conversion without parameters (latin?) results in escaped utf8 for data dumper
even if I use 'no utf8;'. This was what I first tried, I assumed that was ascii=>1

    Original perl hashref: $VAR1 = {
      'desc' => 'norwegian characters:',
      'c2' => [ 'ÆæØøÅå' ],
      'c1' => [ 'Æ', 'æ', 'Ø', 'ø', 'Å', 'å' ]
    };
    json text:{"desc":"norwegian characters:","c2":["ÆæØøÅå"],"c1":["Æ","æ","Ø","ø","Å","å"]}
    back to perl hashref:$VAR1 = {
      'desc' => 'norwegian characters:',
      'c2' => [ "\x{c6}\x{e6}\x{d8}\x{f8}\x{c5}\x{e5}" ],
      'c1' => [ "\x{c6}", "\x{e6}", "\x{d8}", "\x{f8}", "\x{c5}", "\x{e5}" ]
    };

If I use JSON::Syck Dump/Load it works just excellent:

    Original perl hashref: $VAR1 = {
      'desc' => 'norwegian characters:',
      'c2' => [ 'ÆæØøÅå' ],
      'c1' => [ 'Æ', 'æ', 'Ø', 'ø', 'Å', 'å' ]
    };
    json text:{"desc":"norwegian characters:","c2":["ÆæØøÅå"],"c1":["Æ","æ","Ø","ø","Å","å"]}
    back to perl hashref:$VAR1 = {
      'desc' => 'norwegian characters:',
      'c2' => [ 'ÆæØøÅå' ],
      'c1' => [ 'Æ', 'æ', 'Ø', 'ø', 'Å', 'å' ]
    };

--
MortenB
>I see you use ascii 7bit, norwegian characters are within
>the 8bit ascii range.
>
>But the conversion without parameters (latin?) results in
>escaped utf8 for data dumper even if I use 'no utf8;',
>this was what I first tried, I assumed that was ascii=>1
If you want to know how to use these options, please see, again,
http://search.cpan.org/~mlehmann/JSON-XS-2.21/XS.pm#ENCODING/CODESET_FLAG_NOTES
and the other JSON/JSON::XS doc sections.
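As a rough illustration of the "escaped" Data::Dumper output quoted above: a `\x{c6}`-style dump need not mean the data was damaged. The sketch below, which assumes JSON::PP as a stand-in for JSON and uses made-up variable names, decodes an ASCII-only JSON text and checks that each letter comes back as one character with the expected code point. The string only has to be encoded explicitly when it is written out.

```perl
use strict;
use warnings;
use Encode qw(encode);
use JSON::PP;    # assumption: stand-in for JSON; core since Perl 5.14

# ASCII-only JSON text, so the encoding of this source file plays no role.
my $decoded = JSON::PP->new->decode('["\u00c6\u00e6\u00d8"]');
my $chars   = $decoded->[0];

# Three characters with code points c6, e6, d8: the data is intact.
my @code_points = map { sprintf '%02x', ord($_) } split //, $chars;

# Encode explicitly when writing to a UTF-8 terminal or file.
my $utf8_bytes = encode('UTF-8', $chars);

print "code points: @code_points\n";
print "characters: ", length($chars), ", utf-8 bytes: ", length($utf8_bytes), "\n";
```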
>If I use JSON:Syck Dump/Load it works just excellent:
If you are satisfied with JSON::Syck, I have nothing to say. Regards,
Closed.