On Thu Sep 26 21:45:02 2013, DAGOLDEN wrote:
> On Sat Mar 31 05:20:21 2012, antox@ml.lv wrote:
> > YAML-Tiny-1.51 attempts utf8::decode in read_string, but does not
> > utf8::encode in write_string. Is there a reason for this asymmetric
> > behaviour, and for not serializing to a byte stream?
>
> I believe the decoding is an attempt to "do the right thing" for
> people who pass in a string read raw from a file and forgot to decode it.
>
> I don't believe that write_string should encode -- at least not by
> default -- as tools that expect characters and then do their own UTF-8
> encoding would double-encode.
>
> The "right" thing would be for read_string not to decode by default,
> but I'm not sure changing it now is wise.
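To make the double-encoding concern concrete, here is a minimal sketch that uses only Perl's built-in utf8::encode and utf8::decode (no YAML module involved). The sample string and variable names are illustrative: $once stands for what an encoding write_string might return, and $twice for what a caller that encodes that output again would end up with.

```perl
use strict;
use warnings;

# A character string containing U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
my $chars = "caf\x{e9}";

# Encode once: the UTF-8 octets a byte-oriented write_string would return.
my $once = $chars;
utf8::encode($once);    # octets 63 61 66 C3 A9

# A caller that expects characters and encodes again gets double-encoded
# output: each octet is treated as a Latin-1 character and encoded anew.
my $twice = $once;
utf8::encode($twice);   # octets 63 61 66 C3 83 C2 A9

# Decoding the singly-encoded octets once recovers the original string.
my $back = $once;
utf8::decode($back) or die "not valid UTF-8";

printf "%vX\n", $once;    # prints 63.61.66.C3.A9
printf "%vX\n", $twice;   # prints 63.61.66.C3.83.C2.A9
print $back eq $chars ? "round trip ok\n" : "round trip failed\n";
```

Decoding $twice only once would yield "caf\x{c3}\x{a9}" (the familiar "cafÃ©" mojibake), which is the failure mode an unconditional encode in write_string could cause for existing callers.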
As far as I understand the spec [1], the stream is supposed to be
encoded as UTF-8 or UTF-16. So the information appears either as octets
or as a fully deserialized data structure, with no intermediate
representation. This seems to be what YAML::XS does. Apparently, when
dealing with streams presented as strings, YAML and YAML::Syck implicitly
leave encoding issues to the caller. If I remember correctly, some JSON
modules also deviated in this respect.
> Possibly adding a write_utf8_string function that gives encoded output
> makes sense so the choice is more obvious to people.
I suspect I was mostly concerned about the behavior of Load/Dump (the
interface common to all YAML* modules). Dump just returns whatever
write_string returns. But the write method (called by DumpFile) also
prints the output of write_string to the file as is, which produces a
"Wide character" warning (unless the user has set an I/O layer in an outer
scope or smuggled a :layer into the file path passed in). (For comparison,
YAML::LoadFile and YAML::DumpFile force a :utf8 layer, while
YAML::Syck::LoadFile and YAML::Syck::DumpFile accept a filehandle in place
of a path.) I think this is why I leaned towards changing write_string and
moving closer to YAML::XS. If my suggestion seems likely to cause more
problems than it solves, then IMHO adding a few explanatory lines to the
POD would be more beneficial than adding write_utf8_string.
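A minimal sketch of the "Wide character" issue described above, without YAML::Tiny itself: $yaml is hypothetical data standing in for write_string output, and in-memory filehandles stand in for the file that write opens. It compares a handle with no encoding layer against one with an explicit :encoding(UTF-8) layer, the kind of layer a user could set up themselves.

```perl
use strict;
use warnings;

# Hypothetical stand-in for what write_string might return: a character
# string with a code point above 0xFF (U+263A WHITE SMILING FACE).
my $yaml = "---\nface: \x{263A}\n";

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# 1. Handle without an encoding layer (like a plain '>' open of a file):
#    perl writes its internal UTF-8 bytes and emits a "Wide character" warning.
open my $raw, '>', \my $raw_buf or die "open: $!";
print {$raw} $yaml;
close $raw;

# 2. Handle with an explicit :encoding(UTF-8) layer: the characters are
#    encoded on output and no warning is emitted.
open my $enc, '>:encoding(UTF-8)', \my $enc_buf or die "open: $!";
print {$enc} $yaml;
close $enc;

print scalar(@warnings), " warning(s)\n";
print $raw_buf eq $enc_buf ? "same bytes\n" : "different bytes\n";
```

Here both buffers end up holding the same UTF-8 octets; the difference is only the warning. Note that if the string contains no code point above 0xFF (say, just "caf\x{e9}"), the unlayered handle writes Latin-1 octets without any warning at all, which is arguably the worse outcome.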
[1] http://www.yaml.org/spec/1.2/spec.html#id2771184