Subject: utf8 flag wrong
Date: Tue, 18 Jun 2013 22:24:57 +0200
To: bug-JSON [...] rt.cpan.org
From: Adolf Szabo <adolf.szabo [...] gmail.com>
Hi,
My problem is that JSON->new()->decode($str) always sets the utf8 flag to ON
for each string in the hash, no matter what I specify (ascii, latin1,
utf8(0) or utf8(1)). This is not only an annoyance, but I think a bug too.
Let me give you an example:
Here is a sample JSON file, with $h->{TITL} containing the string őa. We
will focus on the second character, the ASCII 'a', for now:
aszabo@mepc:/tmp$ hexdump -C test.txt
00000000  7b 22 54 49 54 4c 22 3a  22 c5 91 61 22 7d 0a     |{"TITL":"..a"}.|
0000000f
aszabo@mepc:/tmp$ cat a.pl
use strict;
use warnings;
use Encode;
use JSON;
local $/=undef;
my $str=<STDIN>;
my $h=JSON->new()->utf8(1)->decode($str);
#my $h=JSON->new()->utf8(0)->decode($str);
my $c=substr($h->{TITL},1,1);
printf("%s [%d], utf8 flag is %s\n",$c,ord($c),Encode::is_utf8($c)?'ON':'OFF');
exit;
aszabo@mepc:/tmp$ cat test.txt | perl a.pl
a [97], utf8 flag is ON
This is as expected so far. Now I enable the utf8(0) line in place of the
utf8(1) line, and repeat:
aszabo@mepc:/tmp$ cat test.txt | perl a.pl
� [145], utf8 flag is ON
This is wrong: the utf8 flag is ON, yet $h->{TITL} is not in Perl's
internal encoding, because the second character should be 'a', not the
second byte of the first character. The utf8 flag becomes a problem later
on, for example when I apply regexps to the strings in the hash.
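The same comparison can be run without test.txt. The following is a minimal
sketch, assuming JSON::PP (the pure-Perl module bundled with Perl, which
accepts the same utf8() and decode() calls) stands in for JSON. The helper
name title_for is made up for illustration. It prints every code point of
$h->{TITL} in both modes, together with the utf8 flag:

```perl
use strict;
use warnings;
use Encode ();
use JSON::PP;    # assumption: stands in for JSON, same utf8()/decode() calls

# Same bytes as test.txt minus the trailing newline; c5 91 is UTF-8 for U+0151.
my $bytes = qq({"TITL":"\x{c5}\x{91}a"});

# Hypothetical helper: decode with utf8($mode) and return the TITL string.
sub title_for {
    my ($mode) = @_;
    return JSON::PP->new->utf8($mode)->decode($bytes)->{TITL};
}

for my $mode (1, 0) {
    my $s = title_for($mode);
    printf "utf8(%d): ords=[%s], utf8 flag is %s\n",
        $mode,
        join(',', map { ord } split //, $s),
        Encode::is_utf8($s) ? 'ON' : 'OFF';
}
```

With utf8(1) the code points should be 337,97 and with utf8(0) they should be
197,145,97, which matches the 97 and 145 that a.pl reports at index 1.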
Please let me know what you think.
Thx, Adolf