Subject: Best practice: Generate JSON that's valid JS
Hi,
Did you know JSON is not a subset of JS?
According to the ECMAScript specs[1,2], string literals can't contain the delimiter or line terminators (U+000A, U+000D, U+2028 and U+2029).
According to the JSON spec[3], string literals[4] can contain any Unicode character except the delimiter or a control character.
U+000A and U+000D are control characters, but U+2028 and U+2029 are not.
$ perl -E'
say sprintf "U+%04X %s", $_, chr($_) =~ /\pC/ ? 1 :0
for 0x000A, 0x000D, 0x2028, 0x2029
'
U+000A 1
U+000D 1
U+2028 0
U+2029 0
That means JSON is not a subset of JS! This will come as a surprise to many.
Best practices are those that avoid surprise, so JSON encoders should always escape U+2028 and U+2029 (like you already do for U+0008 and other control characters). This will allow people to continue using the common practice of
var data = [% data | json %];
when they should be doing
var data = JSON.parse([% data | json | jslit %]);
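To illustrate what escaped output looks like, here is a minimal sketch using the ascii option of JSON::PP, which escapes every non-ASCII character as \uXXXX and therefore covers U+2028 and U+2029 as a side effect. The sample data is made up, and this is a broader workaround than the targeted escaping I'm asking for, not a substitute for it.

```perl
use strict;
use warnings;
use JSON::PP;

# Made-up sample data containing a raw LINE SEPARATOR (U+2028).
my $data = { note => "line one\x{2028}line two" };

# Default settings, shown for comparison with the ascii() output below.
my $raw = JSON::PP->new->canonical->encode($data);

# ascii() escapes every character above U+007F as \uXXXX, U+2028 included.
my $safe = JSON::PP->new->canonical->ascii->encode($data);

print "raw contains U+2028:  ", ($raw  =~ /\x{2028}/ ? "yes" : "no"), "\n";
print "safe contains U+2028: ", ($safe =~ /\x{2028}/ ? "yes" : "no"), "\n";
print "$safe\n";
```

The escaped form is still valid JSON, so parsers that accept the raw character accept this output as well.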
I propose the following patch to JSON::XS:
-$arg =~ s/([\x00-\x08\x0b\x0e-\x1f])/'\\u00' . unpack('H2', $1)/eg;
+$arg =~ s/([\x00-\x08\x0b\x0e-\x1f\x{2028}\x{2029}])/sprintf('\\u%04x', ord($1))/eg;
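The substitution can be tried in isolation with a sketch like the following. The helper name escape_for_js is hypothetical and not part of any JSON module, and the input is assumed to be a decoded character string. Note that the code points are written with braces here: Perl reads \x2028 as \x20 followed by the literal characters "28", which would also match spaces and digits.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; it is not part of any JSON module.
sub escape_for_js {
    my ($arg) = @_;
    # Escape the control characters handled today plus U+2028 and U+2029.
    # The braces matter: \x2028 would mean \x20 followed by "28".
    $arg =~ s/([\x00-\x08\x0b\x0e-\x1f\x{2028}\x{2029}])/sprintf('\\u%04x', ord($1))/eg;
    return $arg;
}

my $in  = "a\x{2028}b\x{2029}c\x01 d 289";
my $out = escape_for_js($in);
print "$out\n";
```

Spaces and the digits 2, 8 and 9 pass through untouched, which is a quick way to check that the character class is written correctly.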
Thanks,
Eric
References:
1. http://bclary.com/2004/11/07/#a-7.8.4
2. http://bclary.com/2004/11/07/#a-7.3
3. http://json.org/
4. https://tools.ietf.org/html/rfc4627#section-2.5