
This queue is for tickets about the JSON-PP CPAN distribution.

Report information
The Basics
Id: 115927
Status: rejected
Priority: 0/
Queue: JSON-PP

People
Owner: Nobody in particular
Requestors: dolmen [...] cpan.org
Cc: ether [...] cpan.org
AdminCc:

Bug Information
Severity: (no value)
Broken in: 2.27400
Fixed in: (no value)



Subject: Reject U+D800 to U+DFFF in strings for encoding

JSON::PP allows creating invalid JSON if a string contains code points that are forbidden in Unicode because they are reserved for UTF-16 surrogate pairs: U+D800 to U+DFFF.


$ perl -MJSON::PP= -E '$J=JSON::PP->new->ascii->allow_nonref; say $J->decode($J->encode("\N{U+D800}"))'
missing low surrogate character in surrogate pair, at character offset 8 (before "(end of string)") at -e line 1.
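The error suggests that, under ->ascii, the lone U+D800 is written out as a single \ud800 escape with no low surrogate after it, which the decoder then refuses. Below is a minimal Go sketch of the check requested in the subject. It assumes an ASCII-only encoder that escapes non-ASCII code points as \uXXXX, using a UTF-16 pair above U+FFFF. The name escapeASCII is hypothetical and is not JSON::PP's API. Go strings cannot carry surrogates as valid UTF-8, so the sketch takes a []rune.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
	"unicode/utf8"
)

// escapeASCII is a hypothetical ASCII-only JSON string encoder (not
// JSON::PP's API). Non-ASCII code points become \uXXXX escapes, and
// code points above U+FFFF become a UTF-16 surrogate pair. Surrogate
// code points themselves (U+D800..U+DFFF) are rejected, because a lone
// \ud800 escape cannot be paired up by a strict decoder.
func escapeASCII(runes []rune) (string, error) {
	var b strings.Builder
	b.WriteByte('"')
	for _, r := range runes {
		switch {
		case utf16.IsSurrogate(r):
			return "", fmt.Errorf("code point U+%04X is a UTF-16 surrogate", r)
		case !utf8.ValidRune(r):
			return "", fmt.Errorf("invalid code point %d", r)
		case r == '"' || r == '\\':
			b.WriteByte('\\')
			b.WriteRune(r)
		case r < 0x20 || r > 0x7E:
			if r > 0xFFFF {
				// Supplementary planes: emit a high/low surrogate pair.
				hi, lo := utf16.EncodeRune(r)
				fmt.Fprintf(&b, `\u%04x\u%04x`, hi, lo)
			} else {
				fmt.Fprintf(&b, `\u%04x`, r)
			}
		default:
			b.WriteRune(r)
		}
	}
	b.WriteByte('"')
	return b.String(), nil
}

func main() {
	s, err := escapeASCII([]rune{0xE9, 0x1F600}) // U+00E9 and U+1F600
	fmt.Println(s, err)                          // "\u00e9\ud83d\ude00" <nil>

	_, err = escapeASCII([]rune{0xD800})
	fmt.Println(err) // code point U+D800 is a UTF-16 surrogate
}
```

Rejecting at encode time keeps the encoder and decoder symmetric: anything the encoder emits, a decoder that insists on paired surrogates can read back.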

-- 
Olivier Mengué - http://perlresume.org/DOLMEN

For comparison, Go rejects U+D800 completely: when the byte array "\xed\xa0\x80" (the would-be UTF-8 encoding of U+D800) is decoded as UTF-8, each *byte* is replaced with U+FFFD, and the JSON decoder decodes "\uD800" as U+FFFD.

https://play.golang.org/p/2aCCCTluVZ



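package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    // A lone high surrogate written as a JSON \u escape.
    fmt.Print("Decoding \"\\uD800\": ")
    var s string
    if err := json.Unmarshal([]byte(`"\uD800"`), &s); err != nil {
        fmt.Println("error:", err)
        return
    }
    for _, c := range s {
        fmt.Printf("U+%X ", c)
    }

    // The same surrogate as raw (invalid) UTF-8 bytes inside a JSON string.
    fmt.Print("\nDecoding \\xed\\xa0\\x80: ")
    if err := json.Unmarshal([]byte{'"', 0xed, 0xa0, 0x80, '"'}, &s); err != nil {
        fmt.Println("error:", err)
        return
    }
    for _, c := range s {
        fmt.Printf("U+%X ", c)
    }
    fmt.Println()

    // No JSON involved: ranging over a string decodes UTF-8 and yields
    // U+FFFD for each byte of an invalid sequence.
    fmt.Print("Converting bytes to UTF-8: ")
    invalidSurrogate := []byte{0xed, 0xa0, 0x80}
    s = string(invalidSurrogate)
    for _, c := range s {
        fmt.Printf("U+%X ", c)
    }
    fmt.Println()
}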





On Thu Jul 07 01:52:09 2016, DOLMEN wrote:
> JSON::PP allows to create invalid JSON if a string contains codepoint
> that are forbidden in Unicode because they are reserved for UTF-16 surrogate
> pairs: U+D800 to U+DFFF
>
> $ perl -MJSON::PP= -E '$J=JSON::PP->new->ascii->allow_nonref; say $J->decode($J->encode("\N{U+D800}"))'
> missing low surrogate character in surrogate pair, at character offset 8
> (before "(end of string)") at -e line 1.
>
> --
> Olivier Mengué - http://perlresume.org/DOLMEN
I understand your point, but: 1) "JSON::PP allows to create invalid JSON" is completely wrong (remove $J->decode and you'll see no error); 2) passing JSON that contains non-ASCII to an ->ascii decoder is your fault. JSON::PP correctly spits out the error message. Marked as rejected. Thanks.