This queue is for tickets about the YAML-Syck CPAN distribution.

Report information
The Basics
Id: 20830
Status: resolved
Priority: 0/
Queue: YAML-Syck

People
Owner: cpan [...] audreyt.org
Requestors: MLEHMANN [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: (no value)



Subject: JSON::Syck cannot handle utf-8/unicode correctly.
JSON is defined to be an octet stream (by default encoded in UTF-8), while strings within JSON are defined to be unicode. There seems to be no way to handle this correctly with JSON::Syck, which makes it impossible to properly exchange data using non-ascii characters. The module offers $JSON::Syck::ImplicitUnicode, but neither 0 nor 1 results in correct behaviour. If it is 0, then JSON::Syck does not correctly encode perl strings into json objects. If it is 1, it sometimes returns json objects with "bytes" >255.
The problem is likely the wrong mental model. E.g. the documentation incorrectly says:

    Perl (UTF-8 bytes) => JSON (Unicode flagged)

Perl has no such thing as a "unicode flag". Perl has a utf-8 flag, but that doesn't flag a scalar as unicode, it's just a different encoding of the scalar data. Perl only knows about binary strings (octet strings) and text strings (unicode character strings), both of which can have the UTF-8 flag set or cleared.

The correct handling is to always encode the resulting json object correctly (preferably in UTF-8), and always create perl text strings (in either UTF-8 or latin1 encoding), as json strings are defined to be text.
Subject: Re: [rt.cpan.org #20830] JSON::Syck cannot handle utf-8/unicode correctly.
Date: Thu, 3 Aug 2006 22:24:43 +0900
To: bug-JSON-Syck [...] rt.cpan.org
From: "Tatsuhiko Miyagawa" <miyagawa [...] gmail.com>
To make a long reply short: Looks like you want to use JSON::Syck::Dump() to convert Perl (Unicode string) => JSON (UTF-8 bytes), right? If so,

    use Encode;
    $JSON::Syck::ImplicitUnicode = 1;
    $json = Encode::encode_utf8( JSON::Syck::Dump( $unicode_string ) );

and you'll get the utf-8 JSON object. If you like \uXXXX style escaped JSON data, try Encode::JavaScript::UCS as well.

Longer reply starts here :)

On 8/3/06, Marc_Lehmann via RT <bug-JSON-Syck@rt.cpan.org> wrote:
> > There seems to be no way to handle this correctly with JSON::Syck, which > makes it impossible to properly exchange data using non-ascii characters.
We use JSON::Syck to transfer non-ascii (Japanese) characters without any issues.
> The module offers $JSON::Syck::ImplicitUnicode, but neither 0 or 1
> result in correct behaviour.
> If it is 0, then JSON::Syck does not correctly encode perl strings into
> json objects. If it is 1, it sometimes returns json objects with "bytes"
> >255.
This looks confusing. If it is 1, Dump()ed json objects are always UTF-8 flagged, which could obviously be > 255 (since Unicode characters could be).
> The problem is likely the wrong mental model. E.g. the documentation > incorrectly says: > > > Perl (UTF-8 bytes) => JSON (Unicode flagged) > > Perl has no such thing as a "unicode flag". Perl has a utf-8 flag, but > that doesn't flag a scalar as unicode,
By "Unicode flagged" I mean the UTF8 flag in Perl 5. It's true that the official term is "UTF-8 flag", but to me the two are totally equivalent.
> its jsut a different encoding of > the scalar data. Perl only knows about binary strings (octet strings) > and text strings (unicode character strings), both of which can have the > UTF-8 flag set or cleared.
You're right, but utf-8 flagged strings are treated as "Unicode strings" since they can be encode()d to any encoding, ord($str) returns the Unicode codepoint correctly, etc.
> The correct handling is to always encode the resulting json object > correctly (preferably in UTF-8), and always create perl text strings (in > either UTF-8 or latin1 encoding), as json strings are defined to be text.
If you really think JSON::Syck is doing something wrong (which we hope it isn't), please file a failing test case.

-- Tatsuhiko Miyagawa
CC: MLEHMANN [...] cpan.org
Subject: Re: [rt.cpan.org #20830] JSON::Syck cannot handle utf-8/unicode correctly.
Date: Thu, 3 Aug 2006 16:36:57 +0200
To: "miyagawa [...] gmail.com via RT" <bug-JSON-Syck [...] rt.cpan.org>
From: Marc Lehmann <schmorp [...] schmorp.de>
On Thu, Aug 03, 2006 at 09:25:00AM -0400, "miyagawa@gmail.com via RT" <bug-JSON-Syck@rt.cpan.org> wrote:
> > If it is 0, then JSON::Syck does not correctly encode perl strings into
> > json objects. If it is 1, it sometimes returns json objects with "bytes"
> > >255.
> > This looks confusing. If it is 1, Dump()ed json objects are always > UTF-8 flagged, which could obviously be > 255 (since Unicode > characters could be).
No, they are not. As soon as you set the utf-8 _flag_ on the scalar, it no longer is utf-8, it is now text consisting of unicode characters. Encode in your example above makes utf-8 out of it. That's the "wrong mental model" I wrote about in my original mail. You wrongly assume that setting the _internal_ UTF-8 bit makes a scalar utf-8. This is logically wrong. Clearing the bit on a scalar that is encoded in utf-8 internally makes it valid UTF-8.
> > Perl (UTF-8 bytes) => JSON (Unicode flagged) > > > > Perl has no such thing as a "unicode flag". Perl has a utf-8 flag, but > > that doesn't flag a scalar as unicode,
> > by "Unicode flagged" I mean UTF8 flag in Perl 5. It's true that the > official term is UTF-8 flag but to me it's totally equivalent to say.
Which is the problem. Perl doesn't work that way. Let me explain it differently:

Perl can handle binary octet strings and unicode character strings. The difference is that an octet string contains no character values > 255, while unicode character strings can. The difference is in the way you treat the scalar - perl does not make a difference. You can have an octet string encoded as utf-8 internally, or as latin1/bytes. Regardless of this _internal_ encoding, a byte string will always be a byte string. Likewise, you can have a unicode string encoded as utf-8 _internally_, but also as latin1, _iff_ the string contains only characters < 256.

The model you assume is that the utf-8 flag that you can set/clear on scalars somehow makes a string unicode or not. This is a broken assumption. See for example the utf8 manpage and the utf8::encode and utf8::decode functions. You will see that utf8::encode _clears_ the utf-8 bit. Clearing the utf-8 bit makes a scalar utf-8 (when it actually contains utf-8). utf8::encode takes a character string and converts it into utf-8. Likewise, utf8::decode might or might not set the utf-8 _flag_ on the scalar. It nevertheless converts a utf-8 octet string into a unicode character string. It will be unicode regardless of whether the resulting string has the utf-8 bit set or not, the utf-8 bit has nothing to do with unicode-ness.

If you deviate from this model into the broken notion that utf-8-bit == unicode flag then you are bound to run into problems. For example, when I feed JSON::Syck a valid json object/string, encoded in utf-8 octets, I get a datastructure with utf-8 in it, not perl strings. JSON, however, describes unicode strings, which perl can handle. Worse, the outcome depends on whether the octet string is internally encoded in UTF-8 or not: JSON::Syck will give different results in this case, although the input string is identical.
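A minimal sketch of the distinction drawn above, using only perl's built-in utf8:: functions (JSON::Syck is not involved, and the variable names are purely illustrative). It tries to show that upgrading or downgrading a scalar changes only its internal storage, while utf8::encode and utf8::decode change the string itself:

```perl
use strict;
use warnings;

# A character string holding the single character U+00FC.
my $text = "\x{fc}";

# Same string, two internal representations.
my $upgraded = $text;
utf8::upgrade($upgraded);        # stored as UTF-8 internally, flag set
my $downgraded = $text;
utf8::downgrade($downgraded);    # stored as bytes internally, flag clear

# At the perl level the two are the same one-character string.
print "same string\n" if $upgraded eq $downgraded;
printf "flags: %d %d\n",
    utf8::is_utf8($upgraded) ? 1 : 0,
    utf8::is_utf8($downgraded) ? 1 : 0;

# utf8::encode converts characters to UTF-8 octets (and clears the flag).
my $octets = $text;
utf8::encode($octets);
printf "octets: %d, flag: %d\n", length($octets), utf8::is_utf8($octets) ? 1 : 0;

# utf8::decode converts UTF-8 octets back into a character string.
my $decoded = $octets;
utf8::decode($decoded);
print "round trip ok\n" if $decoded eq $text;
```

The flag reported by utf8::is_utf8 differs between the first two scalars even though they compare equal, which is why it is a poor proxy for "this is a unicode string".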
The only correct solution is to treat the utf-8 bit in perl as just a way of representing integer character indices in strings: If it is cleared, the string only stores <256 character indices, if it is set, it might store >255 indices. It has *nothing* whatsoever to do with unicode.

JSON, on the other hand, clearly defines that a json object/string is *encoded* in unicode, i.e. an octet string (all unicode encodings deliver an octet string), and that the structure it represents is unicode strings. JSON::Syck breaks this by creating unencoded json objects (which is not defined by rfc4627) when serialising, or by not correctly decoding strings stored in a json object to perl strings.

As an explicit example of what goes wrong, look at this:

    my $hash = JSON::Syck::Load "some-octet-string-containing-utf-8-encoded-json-object";

The $hash will now contain utf-8 encoded octet strings (all indices < 256), NOT the strings that are actually stored in the json object/string. Likewise:

    print STREAM JSON::Syck::Dump $hash;

will elicit a warning when STREAM is in binmode, because it might contain indices > 255 which are not valid. JSON objects/strings, on the other hand, are always encoded in some unicode encoding, and thus never can have indices >255.
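One way to avoid the warning described above, sketched with the core Encode module rather than JSON::Syck itself: encode the character string to UTF-8 octets before writing it to a binary handle. The $json_text value is a hand-written stand-in for serializer output (an assumption, not something JSON::Syck produced), and the in-memory handle is only there to keep the example self-contained:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Stand-in for serializer output: a character string containing U+263A,
# i.e. an index > 255 that cannot be written to a binary stream as-is.
my $json_text = qq({"a":"\x{263a}"});

# Encode to UTF-8 octets; every index in $octets is now < 256.
my $octets = encode_utf8($json_text);

# Write the octets to a binary (raw) handle; here an in-memory buffer.
open my $out, '>:raw', \my $buffer or die "cannot open in-memory handle: $!";
print {$out} $octets;
close $out;

# A reader reverses the step: octets in, character string out.
my $round_trip = decode_utf8($buffer);
print "round trip ok\n" if $round_trip eq $json_text;
```

Doing the encode step explicitly keeps the handle in binary mode, so nothing else written to it gets re-encoded by an I/O layer.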
Obviously, the above examples depend on specific settings of ImplicitUnicode. However, no setting of ImplicitUnicode works correctly. Either you get broken JSON objects (indices >255), or your decoded strings are broken (utf-8, not perl text strings). This is not helped by the fact that the documentation does not specify at all what JSON::Syck does, as it talks about a unicode flag that perl does not have. If the unicode flag is the utf-8 flag, it is simply broken.

I hope this was clearer than my initial mail. If something is unclear still, do not hesitate to ask for clarification. Encoding issues are not easy, and I _really_ want the JSON::Syck module to work correctly, as soon as possible, before too many people have to work around its encoding bugs (as I have to do).
> You're right, but utf-8 flagged strings are treated as "Unicode > string"
Not by perl, which is the problem. JSON::Syck indeed treats it incorrectly as "unicode string", but as this clashes with the perl programming language, this results in bugs.
> > The correct handling is to always encode the resulting json object > > correctly (preferably in UTF-8), and always create perl text strings (in > > either UTF-8 or latin1 encoding), as json strings are defined to be text.
> > If you really think JSON::Syck is doing somthing wrong (which we don't > hope), please file a failing test case.
    echo '{"a":"ü"}' | perl -MJSON::Syck -e 'binmode STDIN; $hash = JSON::Syck::Load <>'

(All in UTF-8). This results not in a "ü" character, but in two characters, \xc3\xbc. RFC4627, however, states (section 3) that the above json-object is encoded in utf-8 (because there are no 0 bytes in the initial 4 bytes). JSON::Syck, however, incorrectly interprets it as latin1, which is not even mentioned as a valid encoding in rfc4627, and is incapable of transferring characters >255.

Similarly, when dumping the perl hash '{ a=> "ü" }', we do not get a correctly encoded json string, but instead a perl string with characters >255, i.e. unencoded, which again clashes with section 3 of rfc4627.
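The section 3 rule cited above can be sketched as follows: RFC 4627 determines the encoding from the pattern of zero octets in the first four octets of the JSON text. json_encoding() is a hypothetical helper written for this illustration, not part of JSON::Syck, and treating inputs shorter than four octets as UTF-8 is an assumption of the sketch:

```perl
use strict;
use warnings;

# Guess the unicode encoding of a JSON octet string from the pattern of
# zero octets in its first four octets (RFC 4627, section 3).
# Hypothetical helper for illustration only.
sub json_encoding {
    my ($octets) = @_;

    # Assumption: inputs too short to inspect are treated as UTF-8.
    return 'UTF-8' if length($octets) < 4;

    # Build a pattern such as "0101": 1 marks a zero octet.
    my $pattern = join '', map { $_ == 0 ? 1 : 0 } unpack 'C4', $octets;

    return 'UTF-32BE' if $pattern eq '1110';    # 00 00 00 xx
    return 'UTF-16BE' if $pattern eq '1010';    # 00 xx 00 xx
    return 'UTF-32LE' if $pattern eq '0111';    # xx 00 00 00
    return 'UTF-16LE' if $pattern eq '0101';    # xx 00 xx 00
    return 'UTF-8';                             # xx xx xx xx
}

# The octets from the test case above: no zero octets, hence UTF-8.
print json_encoding(qq({"a":"\xc3\xbc"})), "\n";
```

A decoder following this rule would then decode the octets with the detected encoding before parsing, so the strings in the resulting structure are character strings.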
Changing ImplicitUnicode changes the test cases and the outcomes, but doesn't fix the problem, as no setting of ImplicitUnicode both correctly decodes json objects encoded in utf-8 and correctly encodes json objects in utf-8. Nor in any other unicode encoding.

-- Marc Lehmann <pcg@goof.com>, http://schmorp.de/
Subject: Re: [rt.cpan.org #20830] JSON::Syck cannot handle utf-8/unicode correctly.
Date: Fri, 4 Aug 2006 00:04:53 +0900
To: bug-JSON-Syck [...] rt.cpan.org
From: "Tatsuhiko Miyagawa" <miyagawa [...] gmail.com>
On 8/3/06, Marc Lehmann via RT <bug-JSON-Syck@rt.cpan.org> wrote:
> > > > This looks confusing. If it is 1, Dump()ed json objects are always > > UTF-8 flagged, which could obviously be > 255 (since Unicode > > characters could be).
> > No, they are not. As soon as you ste the utf-8 _flag_ on the scalar, it no > longer is utf-8, it is now text consisting of unicode characters.
Yes, I know. Thanks for your explanation. But please, please don't spend time telling me how Perl Unicode/UTF-8 works. I'm one of the AUTHORs of Encode.pm and other CPAN Encode:: modules, and am pretty sure I understand the difference between encode_utf8, decode_utf8, utf8::encode, utf8::decode, _utf8_on, _utf8_off and whatever other Perl encoding magic there is. The same goes for Audrey, needless to say. So let's break it down to the actual example:
> echo '{"a":"ü"}' | perl -MJSON::Syck -e 'binmode STDIN; $hash = JSON::Syck::Load <>' > > (All in UTF-8). This results not in a "ü" character, but in two > characters, \xc3\xbc.
Wrong. \xc3\xbc is the UTF-8 byte representation of "ü" (U+00FC). http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=00fc

decode_utf8($hash->{a}) and you'll get the Unicode string U+00FC.
> RFC4627, however, states (section 3) that the > above json-object is encoded in utf-8 (because there are no 0 bytes in > the initial 4 bytes). JSON::Syck, however, incorrectly interprets it as > latin1, which is not even mentioned as a valid encoding in rfc4627, and is > incapable of transfering characters >255.
By default ($ImplicitUnicode = 0), JSON::Syck doesn't care if the encoding is latin-1, utf-8, euc-jp or whatever. If the JSON is UTF-8 bytes, the resulting Perl data is UTF-8 bytes as well (see above again). With ImplicitUnicode,
> echo '{"a":"ü"}' | perl -MJSON::Syck -e '$JSON::Syck::ImplicitUnicode = 1; binmode STDIN; $hash = JSON::Syck::Load <>;'
and you'll get $hash->{a} = "\x{fc}", the utf8 flagged string of U+00FC.
> Similarly, when dumping the perl hash '{ a=> "ü" }', we do not get a
> correctly encoded json string, but instead a perl string with characters
> >255, i.e. unencoded, which again clashes with section 3 of rfc4627.
No.

    $hash = { a => 'ü' };
    print JSON::Syck::Dump($hash);

prints {"a":"ü"} as utf-8 bytes.

With ImplicitUnicode set to 1, you'll emit Unicode strings, so you have to binmode before printing to a filehandle, as in:

    binmode STDOUT, ":utf8";
    print JSON::Syck::Dump($hash);

I'm thinking that there is some terminology miscommunication based on the (probably poor) documentation of JSON::Syck. I'll work on an improvement but a doc patch is also welcome.

-- Tatsuhiko Miyagawa
CC: MLEHMANN [...] cpan.org
Subject: Re: [rt.cpan.org #20830] JSON::Syck cannot handle utf-8/unicode correctly.
Date: Thu, 3 Aug 2006 21:26:12 +0200
To: "miyagawa [...] gmail.com via RT" <bug-JSON-Syck [...] rt.cpan.org>
From: Marc Lehmann <schmorp [...] schmorp.de>
On Thu, Aug 03, 2006 at 11:05:12AM -0400, "miyagawa@gmail.com via RT" <bug-JSON-Syck@rt.cpan.org> wrote:
> > No, they are not. As soon as you ste the utf-8 _flag_ on the scalar, it no > > longer is utf-8, it is now text consisting of unicode characters.
> > Yes, I know.
Good :)
> Thanks for your explanation. But please, please don't spend time > telling me how Perl Unicode/UTF-8 works. I'm one of the AUTHORs of
OK, then all that needs to be done is to apply this to JSON::Syck so it works, and to fix the documentation.

Regarding the "mine is longer than yours", I also reported and fixed many bugs in the perl core and modules regarding unicode handling. I don't put any stock in arguments such as this, I'd much rather let facts speak. And in this case, if you are so knowledgeable, then I don't see the problem in fixing JSON::Syck to a) generate correct perl data structures when Load'ing and b) generate correct JSON objects when Dump'ing, without having to resort to additional modules, hackery, and having to set magic global variables on every dump/load.

It's really a disservice to users to force them to do extra work when they just want an rfc-compliant serialiser/deserialiser.
> So let's break down to the actual example: >
> > echo '{"a":"ü"}' | perl -MJSON::Syck -e 'binmode STDIN; $hash = JSON::Syck::Load <>' > > > > (All in UTF-8). This results not in a "ü" character, but in two > > characters, \xc3\xbc.
> > Wrong. \xc3\xbc is an UTF-8 bytes representation for "ü" (U+00FC). > http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=00fc
That doesn't matter. The json object encodes U+00FC, which in perl is represented as chr(0xfc). This is the whole point.
> decode_utf8($hash->{a}) and you'll get the Unicode string U+00FC.
Uhm... what's the point of JSON::Syck if it doesn't decode json to perl datastructures but stops halfway at decoding strings?
> > latin1, which is not even mentioned as a valid encoding in rfc4627, and is > > incapable of transfering characters >255.
> > By default ($ImplicitUnicode = 0), JSON::Syck doesn't care if the > encoding is latin-1, utf-8, euc-jp or whatever. If JSON is UTF-8 > bytes, result Perl is UTF-8 bytes as well (see above again).
Indeed, and that's the bug, as perl doesn't represent unicode as utf-8 on the perl level, but as unicode character indices.
> > echo '{"a":"ü"}' | perl -MJSON::Syck -e '$JSON::Syck::ImplicitUnicode = 1; binmode STDIN; $hash = JSON::Syck::Load <>;'
> > and you'll get $hash->{a} = "\x{fc}", the utf8 flagged string of U+00FC.
Right, but with ImplicitUnicode set to one, JSON::Syck::Dump no longer works, as it suddenly doesn't output a valid json string (which needs to be encoded). That's what I meant: neither setting results in correct behaviour.
> > Similarly, when dumping the perl hash '{ a=> "ü" }', we do not get a
> > correctly encoded json string, but instead a perl string with characters
> > >255, i.e. unencoded, which again clashes with section 3 of rfc4627.
> > No. > > $hash = { a => 'ü' }; > print JSON::Syck::Dump($hash); > > prints {"a":"ü"} as utf-8 bytes.
Not with ImplicitUnicode set to 1.
> With ImplicitUnicode set 1, you'll emit Unicode strings, so you have > to binmode before printing to filehandle, as in: > > binmode STDOUT, ":utf8"; > print JSON::Syck::Dump($hash)->{a};
If Dump output a correctly encoded JSON string, then using utf8 as the encoding on the filehandle would be a bug.
> I'm thinking that there would be some terminlogy miscommunication > based on (probably poor) documentation of JSON::Syck. I'll work on an > improvement but doc patch is also welcome.
The documentation is certainly confusing, as it doesn't explain what ImplicitUnicode does.

However, do you really think that having to set ImplicitUnicode to one value for dumping and to another for loading to get correct behaviour is desirable? Right now, they are not even symmetrical, so dumping some datastructure and loading it again will corrupt the data in it. Do you really think having asymmetric operations is the way to proceed? If yes, then your understanding of unicode issues in general and unicode issues in perl isn't up to the task.

It would really be nice if perl had a good and working JSON module that "just works".

-- Marc Lehmann <pcg@goof.com>, http://schmorp.de/
Subject: Re: [rt.cpan.org #20830] JSON::Syck cannot handle utf-8/unicode correctly.
Date: Fri, 4 Aug 2006 05:16:51 +0900
To: bug-JSON-Syck [...] rt.cpan.org
From: "Tatsuhiko Miyagawa" <miyagawa [...] gmail.com>
On 8/4/06, Marc Lehmann via RT <bug-JSON-Syck@rt.cpan.org> wrote:
> > decode_utf8($hash->{a}) and you'll get the Unicode string U+00FC.
> > Uhm... whats the point of JSON::Syck if it doesn't decode json to perl > datastructures but stops halfway at decoding strings?
I don't think it "stops halfway." JSON::Syck just works as simply as "if ImplicitUnicode is 0, don't care about encodings. If set to 1, treat the JSON encoding as utf-8 and do the right thing."

The reason why we made ImplicitUnicode = 0 the default was (IIRC):

1. JSON.pm does it so
2. At the time we coded (in January 2006 I guess), the JSON spec didn't clearly say that the encoding should be UTF-8. We had an email conversation with _why (the author of libsyck) and Crockford (JSON spec author) about it and got a consensus on UTF-8, though.

So my answer is, ImplicitUnicode = 1 is what you want. If that breaks Dump() that's a separate problem to fix. (More on that later)
> > > latin1, which is not even mentioned as a valid encoding in rfc4627, and is > > > incapable of transfering characters >255.
> > > > By default ($ImplicitUnicode = 0), JSON::Syck doesn't care if the > > encoding is latin-1, utf-8, euc-jp or whatever. If JSON is UTF-8 > > bytes, result Perl is UTF-8 bytes as well (see above again).
> > Indeed, and thats the bug, as perl doesn't represent unicode as utf-8 on > the perl level, but as unicode character indices.
It's not a bug. Say, CGI.pm's param() method, or Apache::Request or whatever, returns the incoming parameter value as UTF-8 bytes (if the query parameter is encoded in UTF-8), not as a Unicode string. (With the default, ImplicitUnicode = 0) JSON::Syck returns the data structure in UTF-8 if the input data is UTF-8, while it returns Unicode strings if the input data is decoded as a Unicode string in your app. That allows more flexibility to adapt JSON::Syck to your web app framework.

Actually we use JSON::Syck on Catalyst (see my Catalyst::View::JSON and Catalyst::Plugin::JSONRPC) to serve blog writes from US to Japan to France to Taiwan on Vox.com. Audrey uses JSON::Syck on Jifty to serve a Taiwan bank intranet system.
> > > echo '{"a":"ü"}' | perl -MJSON::Syck -e '$JSON::Syck::ImplicitUnicode = 1; binmode STDIN; $hash = JSON::Syck::Load <>;'
> > > > and you'll get $hash->{a} = "\x{fc}", the utf8 flagged string of U+00FC.
> > Right, but with ImplciitUnicode set to one, JSON::Syck::Dump no longer > works, as it suddenly doesn't output a valid json string (which needs to > be encoded).
So okay. For now, as (I think clearly) documented in the POD, there's no way to do:

    Perl (Unicode string) -> JSON (utf-8 bytes)

and it's up to users. If you call it "a bug", I'd say "No it's not a bug, but patches and suggestions are welcome :)"

Read the doc again. If "Unicode flagged" confuses you, replace that with "Unicode string."

    JSON (UTF-8 bytes) => Perl (Unicode flagged)
    JSON (Unicode flagged) => Perl (Unicode flagged)
    Perl (UTF-8 bytes) => JSON (Unicode flagged)
    Perl (Unicode flagged) => JSON (Unicode flagged)

Actually, the nicest thing about emitting the JSON result as a Unicode string (rather than utf-8 encoded) is that you can easily encode the result into a separate encoding, like 'JavaScript-UCS' (as in Encode::JavaScript::UCS) to get "a": "\uXXXX". Of course I know that you can use Encode::from_to($json, 'utf-8' => 'JavaScript-UCS') to do the same thing (if Dump() encodes the result in utf-8), though.
> > With ImplicitUnicode set 1, you'll emit Unicode strings, so you have > > to binmode before printing to filehandle, as in: > > > > binmode STDOUT, ":utf8"; > > print JSON::Syck::Dump($hash)->{a};
> > If Dump would output a correctly encoded JSON string, then using utf8 as > encoding would be a bug.
Sure. Since Dump outputs the JSON string as a Perl Unicode string, binmode()ing is what your app is supposed to do.
> The documentation is certainly confusing, as it doesn't explain what > ImplicitUnicode does.
Suggestions and improvements are welcome.
> However, do you really think that having to set implicitunicode to one value > for dumping and to another for loading to get correct behaviour is desirable? > Right now, they are not even symmetrical, so dumping some datastructure and > loading it again will corrupt the data in it.
No it doesn't, as seen in that the following test passes.

    use JSON::Syck;
    use Test::More 'no_plan';

    for my $iu (0, 1) {
        $JSON::Syck::ImplicitUnicode = $iu;
        my $hash = { "a" => chr(0xfc) };
        my $json = JSON::Syck::Dump($hash);
        is_deeply JSON::Syck::Load($json), $hash;
    }

If you disagree, please give us a failing test suite.
> Do you really think having > asymetric operations is the way to proceed?
No, I don't think so, and it's not the current behavior, as seen above.
> It would really be nice if perl had a good and working JSON module that > "just works".
I believe JSON::Syck just works, with enough flexibility. So, I chatted briefly with Audrey and your request to do "Perl (Unicode string) <=> JSON (UTF-8 encoded)" seems to be a fair request. But for now, encoding to UTF-8 when emitting is supposed to be done in your app land, and we'll improve the doc to make that more explicit.

I'm now going to fly off for a 3-day vacation this weekend and will be late replying from now on. I hope Audrey can assist you on my behalf during my absence. Thank you!

-- Tatsuhiko Miyagawa
RT-Send-CC: miyagawa [...] gmail.com, schmorp [...] schmorp.de
Marc, This ticket stalled in a JSON::Syck RT queue for 5 years when it should have been in YAML::Syck's queue. If you still feel this is an issue, please let me know by re-opening the ticket. Thanks, Todd
Subject: Re: [rt.cpan.org #20830] JSON::Syck cannot handle utf-8/unicode correctly.
Date: Sun, 12 Feb 2012 09:31:24 +0100
To: Todd Rinaldo via RT <bug-YAML-Syck [...] rt.cpan.org>
From: Marc Lehmann <schmorp [...] schmorp.de>
On Sat, Feb 11, 2012 at 11:07:44PM -0500, Todd Rinaldo via RT <bug-YAML-Syck@rt.cpan.org> wrote:
> Marc, This ticket stalled in a JSON::Syck RT queue for 5 years when it should have been in > YAML::Syck's queue. If you still feel this is an issue, please let me know by re-opening the ticket.
Hmm, the problem is clearly in JSON::Syck, not YAML::Syck, and when you want to close valid bug reports without investigating them, you are free to do that, and I shouldn't reopen them.

-- Marc Lehmann <schmorp@schmorp.de> (Deliantra, the free code+content MORPG, http://www.deliantra.net)
RT-Send-CC: miyagawa [...] gmail.com, schmorp [...] schmorp.de
> Hmm, the problem is clearly in JSON::Syck, not YAML::Syck, and when > you want to close valid bug reports without investigating them, you > are free to do that, and I shouldn't reopen them.
JSON::Syck development was merged with YAML::Syck ~5 years ago from what I can tell. When that happened, the RT queue for JSON::Syck disappeared from visibility on search.cpan/metacpan. I discovered and rescued these tickets, asking for them to be ported into the YAML::Syck queue, which is where JSON::Syck tickets are reported these days. So when I say that I'm not trying to kill the ticket you can trust me.

The last email I see is from Miyagawa, clarifying that this module does not do what you want and that it needs to be better documented. Since August 2006, you have not replied. Considering the amount of back and forth I see happened on this ticket, could I get some clarification on what you'd like to see next at this point? Do you have any patches as suggestions for this ticket?

Thanks, Todd
Subject: Re: [rt.cpan.org #20830] JSON::Syck cannot handle utf-8/unicode correctly.
Date: Thu, 16 Feb 2012 08:50:19 +0100
To: Todd Rinaldo via RT <bug-YAML-Syck [...] rt.cpan.org>
From: Marc Lehmann <schmorp [...] schmorp.de>
On Mon, Feb 13, 2012 at 02:02:15AM -0500, Todd Rinaldo via RT <bug-YAML-Syck@rt.cpan.org> wrote:
> Considering the amount of back and forth I see happend on this ticket, could I get some > clarification on what you'd like to see next at this point? Do you have any patches as > suggestions for this ticket?
I really don't care, both YAML::Syck and JSON::Syck are so broken w.r.t. unicode and so hopelessly far from both YAML and JSON that, I guess, everybody by now has been forced to use alternatives that actually work.

The report I made was to help JSON::Syck and/or YAML::Syck to become better, it's not a support request because I need a bugfix, so no change in either module will be of advantage to me.

As I said, if there is no interest in fixing JSON::Syck to work properly, that's fine with me. Likewise, there is little interest for me in spending any more time on these obsolete modules. If there is interest in fixing things, then there is ample documentation already available.

-- Marc Lehmann <schmorp@schmorp.de> (Deliantra, the free code+content MORPG, http://www.deliantra.net)
Much of the discussion in this ticket is admittedly over my head, but I think the module is at this point beyond being fixable in the way Marc suggests, if for nothing else than compatibility reasons. I'm closing it for now.
Subject: Re: [rt.cpan.org #20830] JSON::Syck cannot handle utf-8/unicode correctly.
Date: Sun, 3 Mar 2013 00:27:02 +0100
To: Todd Rinaldo via RT <bug-YAML-Syck [...] rt.cpan.org>
From: Marc Lehmann <schmorp [...] schmorp.de>
On Sat, Mar 02, 2013 at 03:41:41AM -0500, Todd Rinaldo via RT <bug-YAML-Syck@rt.cpan.org> wrote:
> Much of the discussion in this ticket is admittedly over my head, but I think the module at this > point is beyond the point of being able to be fixed like Marc Suggests if nothing else, for > compatibility reasons. I'm closing it for now.
I can't follow this reasoning - I can understand that the behaviour should stay the same (by default), but closing the ticket and doing nothing is not really sensible or reasonable.

There are lots of alternatives, in increasing order of effort required to implement them:

- being honest about the bug and at least leave it open, for other people to see (so they can understand how to work around it!).
- fix the documentation, so people have a chance to work around it or deal with it without having to search for this bug report.
- conditionally fix the behaviour, so old code relying on the bug will still work.

Just closing the bug without at least fixing the documentation is, sorry to say, just showing extremely bad maintainership (I don't know who maintains it these days). It means willfully tricking users into bugs and wasting their time, without an actual need from the maintainer side (just keeping the bug open will help those users, not optimally, but still help - closing will just hide the bug and force users to do the research again).

So while closing the bug by using an illogical explanation and doing nothing else (there clearly is no need to break existing programs to fix this bug) is a valid option, it's also a lousy option.

Note I am just pointing this out - whoever maintains this module has to make the choice of doing a good job, or a bad job.

-- Marc Lehmann <schmorp@schmorp.de> (Deliantra, the free code+content MORPG, http://www.deliantra.net)
RT-Send-CC: schmorp [...] schmorp.de
I agree with Marc's reasoning. Since the current non-conformant behavior is documented as such, and we explicitly recommend people switching to JSON::XS in the documentation, I'm marking this ticket as "Stalled" and suggest that we let it remain in this state.
RT-Send-CC: miyagawa [...] gmail.com, schmorp [...] schmorp.de
On Sun Mar 03 12:48:31 2013, AUDREYT wrote:
> I agree with Marc's reasoning. Since the current non-conformant behavior > is documented as such, and we explicitly recommend people switching to > JSON::XS in the documentation, I'm marking this ticket as "Stalled" and > suggest that we let it remain in this state.
I closed this ticket because:

1. Marc opened it.
2. He had a discussion with Miyagawa about the problem and how it needed to be fixed.
3. The last comment from Miyagawa was that he disagreed there was a bug, but that maybe some better documentation was in order.
4. Then this ticket was lost in an inaccessible RT queue, since YAML::Syck, not JSON::Syck, is the primary module in this package.
5. When I rescued it and asked Marc for help, he informed me that he wanted to complain about the problem but had no interest in helping to fix it.

Audrey, if you're interested in re-opening this, can you please provide me a documentation patch so we can close this out? This is the only outstanding item that you and Miyagawa agreed needed doing. Everything else was stated as a won't fix. The repo is here: https://github.com/toddr/YAML-Syck/blob/master/lib/JSON/Syck.pm If you prefer though, just paste me a patch in this ticket.
Ticket migrated to github as https://github.com/toddr/YAML-Syck/issues/33