
This queue is for tickets about the JSON-PP CPAN distribution.

Report information
The Basics
Id: 75755
Status: open
Priority: 0/
Queue: JSON-PP

People
Owner: Nobody in particular
Requestors: jstevenson [...] bepress.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Unicode line and paragraph separators (\u2028 and \u2029) are not correctly JavaScript escaped
Date: Tue, 13 Mar 2012 16:48:22 -0700
To: bug-JSON [...] rt.cpan.org
From: Joel Stevenson <jstevenson [...] bepress.com>
Version: JSON-2.53
Perl: 5.10.1
Platform: Linux jstevenson1.bepress.com 2.6.26-2-xen-amd64 #1 SMP Wed Jan 13 00:12:41 UTC 2010 x86_64 GNU/Linux

JSON::PP is passing these characters through to the final JSON string but should be converting them to \u escapes.

    perl -MJSON::PP -E 'say JSON::PP::encode_json( [ "line sep here>\x{2028}<" ] )'
    ["line sep here>
    <"]

Should be:

    ["line sep here>\u2028<"]

Looks like the ECMA 262 spec says (section 7.8.4) that line separator and paragraph separator characters may not appear un-escaped in a string literal. IE doesn't seem to care but FF and Chrome do and will throw an "unterminated string literal" error upon encountering them in a string literal.
JSON::PP (and JSON::XS) follows rfc4627.
So it does not escape \u2028 or \u2029 basically.
If you want \u escaped characters, please try

    JSON::PP->new->ascii->encode(["\x{2028}"]);

Regards,
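To make the suggested workaround concrete, here is a minimal sketch of what the ascii option does, assuming a Perl with JSON::PP installed. The sample data is made up for illustration. Note that ascii escapes every non-ASCII character, not only U+2028 and U+2029, so the accented character in the second element is escaped as well.

```perl
use strict;
use warnings;
use JSON::PP;

# Illustrative data: one string with a line separator, one with an
# ordinary non-ASCII character.
my $data = [ "line sep here>\x{2028}<", "caf\x{e9}" ];

# Default encoder: U+2028 is passed through unescaped.
my $default = JSON::PP->new->encode($data);

# ascii encoder: every character above 0x7F becomes a \u escape.
my $ascii = JSON::PP->new->ascii->encode($data);

# Safe to print without a "Wide character" warning, since the
# ascii output contains only ASCII characters.
print "$ascii\n";
```

The trade-off is output size: text that is mostly non-ASCII grows considerably, because each such character is written as a six-character escape (or twelve for characters outside the Basic Multilingual Plane).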
Subject: Re: [rt.cpan.org #75755] Unicode line and paragraph separators (\u2028 and \u2029) are not correctly JavaScript escaped
Date: Wed, 14 Mar 2012 09:15:59 -0700
To: bug-JSON [...] rt.cpan.org
From: Joel Stevenson <jstevenson [...] bepress.com>
Hi, thanks for the response.

Yes, that's what I'm looking for. I believe that should be the default behavior though, given the wording of the ECMA specification. Since these characters are invalid within string literals, it seems that the most appropriate thing is for the JSON conversion to properly \u encode them when converting perl string(ish) scalars to JSON string literals. If this is not done in the module, then all uses of the module that may include data from an external source (database, message server, etc.) must manually request this each time.

"7.8.4 String Literals

A string literal is zero or more characters enclosed in single or double quotes. Each character may be represented by an escape sequence. All characters may appear literally in a string literal except for the closing quote character, backslash, carriage return, line separator, paragraph separator, and line feed. Any character may appear in the form of an escape sequence."

Best,
Joel

On Tue, Mar 13, 2012 at 7:45 PM, Makamaka Hannyaharamitu via RT <bug-JSON@rt.cpan.org> wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=75755 >
>
> JSON::PP (and JSON::XS) follows rfc4627.
> So it does not escape \u2028 or \u2029 basically.
> If you want \u escaped characters, please try
>
>  JSON::PP->new->ascii->encode(["\x{2028}"]);
>
> Regards,
On Tue Mar 13 22:45:37 2012, MAKAMAKA wrote:
> JSON::PP (and JSON::XS) follows rfc4627.
> So it does not escape \u2028 or \u2029 basically.
RFC 4627 actually contradicts itself. It presents a syntax that is not a subset of JavaScript. But later on it says that JSON *is* a subset of JavaScript. So either the syntax presented is wrong, or the later statement is wrong. So the RFC can be understood two ways. If you make JSON::PP always escape U+2028 and U+2029, it will be valid according to both interpretations of the ambiguous RFC.
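As a rough illustration of escaping only these two characters, the sketch below post-processes the encoder's output instead of changing JSON::PP itself. The helper name encode_json_js_safe is hypothetical and not part of JSON::PP. It assumes that a raw U+2028 or U+2029 can only occur inside string literals (including hash keys) in the encoded text, so a plain substitution is enough.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper, not part of JSON::PP: encode a data structure and
# escape only U+2028 and U+2029, leaving other non-ASCII text untouched.
sub encode_json_js_safe {
    my ($data) = @_;

    # Without ->utf8 the encoder returns a character string, so the
    # substitutions below match whole characters rather than bytes.
    my $json = JSON::PP->new->encode($data);

    $json =~ s/\x{2028}/\\u2028/g;   # line separator
    $json =~ s/\x{2029}/\\u2029/g;   # paragraph separator

    # Convert to UTF-8 bytes, as encode_json would return.
    utf8::encode($json);
    return $json;
}

my $json = encode_json_js_safe([ "a\x{2028}b\x{2029}c\x{e9}" ]);

# The escapes decode back to the original characters.
my $round_trip = decode_json($json)->[0];
```

Because \u2028 and \u2029 are ordinary JSON escape sequences, the result remains valid JSON under RFC 4627 and also avoids the raw separators that the JavaScript string literal rule quoted earlier in the ticket rejects.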
I agree with SPROUT. Currently, if your data contains one of those characters, JSON.pm will produce malformed and unusable output according to the major web browsers' parsers.