
This queue is for tickets about the YAML-Syck CPAN distribution.

Report information
The Basics
Id: 25436
Status: resolved
Priority: 0/
Queue: YAML-Syck

People
Owner: Nobody in particular
Requestors: SREZIC [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.82
Fixed in: (no value)



Subject: Warning when using wide characters
The following script:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use YAML::Syck qw();
    $YAML::Syck::ImplicitUnicode = 1;
    YAML::Syck::DumpFile("/tmp/yaml.yml", "\x{20ac}");

would cause the warning:

    Wide character in print at /usr/perl5.8.7/lib/site_perl/5.8.7/i686-linux/YAML/Syck.pm line 51.

I guess that something like binmode(":utf8") in DumpFile would fix the warning.

Regards,
Slaven
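For illustration, the warning can be reproduced without YAML::Syck at all. The sketch below uses only core Perl and in-memory filehandles (an assumption made to keep it self-contained; the report wrote to /tmp/yaml.yml). It prints the same character once to a handle without an encoding layer and once to a handle opened with :encoding(UTF-8), and records which print warns.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be inspected.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# No encoding layer: Perl warns "Wide character in print".
open(my $raw, '>', \my $raw_buf) or die "Cannot open in-memory handle: $!";
print {$raw} "\x{20ac}";
close $raw or die $!;

# Explicit encoding layer: the character is encoded on output, no warning.
open(my $enc, '>:encoding(UTF-8)', \my $enc_buf) or die "Cannot open in-memory handle: $!";
print {$enc} "\x{20ac}";
close $enc or die $!;

printf "warnings seen: %d\n", scalar @warnings;
print  "first warning: $warnings[0]" if @warnings;
```

Only the first print should warn. This says nothing about where the layer ought to be applied inside DumpFile; it only shows the mechanism behind the message.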
On Wed Mar 14 13:00:48 2007, SREZIC wrote:
> The following script:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use YAML::Syck qw();
> $YAML::Syck::ImplicitUnicode = 1;
> YAML::Syck::DumpFile("/tmp/yaml.yml", "\x{20ac}");
>
> would cause the warning:
>
> Wide character in print at
> /usr/perl5.8.7/lib/site_perl/5.8.7/i686-linux/YAML/Syck.pm line 51.
>
> I guess that something like binmode(":utf8") in DumpFile would fix the
> warning.
>
The issue is still present in 1.05.

Thinking about it again, it seems that the binmode call must not be unconditional, but should only be used if $ImplicitUnicode is set. It is also not clear what to do when DumpFile operates on an already open filehandle. Let the user set binmode on the filehandle? Push the utf8 layer before writing/reading and pop it after?

Regards,
Slaven
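The "push before writing, pop after" idea for an already open filehandle could look roughly like the sketch below. The helper name print_with_temp_utf8 is hypothetical and not part of YAML::Syck. It uses :encoding(UTF-8) rather than :utf8 because :encoding(...) pushes a real layer that the :pop pseudo-layer can remove again, whereas :utf8 only sets a flag on the existing top layer.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper (not YAML::Syck API): push an encoding layer onto an
# already open handle, print the given chunks, then remove the layer again.
sub print_with_temp_utf8 {
    my ($fh, @chunks) = @_;
    binmode($fh, ':encoding(UTF-8)') or die "Cannot push encoding layer: $!";
    print {$fh} @chunks;
    $fh->flush;                       # write pending output before popping
    binmode($fh, ':pop') or die "Cannot pop encoding layer: $!";
    return;
}

# Demonstration on an in-memory handle (assumption: stands in for the
# caller's open file handle).
open(my $fh, '>', \my $buffer) or die "Cannot open in-memory handle: $!";
print_with_temp_utf8($fh, "--- \x{20ac}\n");
close $fh or die $!;

printf "bytes written: %d\n", length $buffer;
```

One caveat of this approach: if the caller has already put an encoding layer on the handle, pushing a second one would encode the output twice, so a real implementation would need to inspect the existing layers first.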
I don't know enough about the YAML spec to know if non-utf8 wide chars are supported. An easy solution, though one that would not round trip well, would be this patch:
Subject: patch1.txt
diff --git a/lib/YAML/Syck.pm b/lib/YAML/Syck.pm
index 1353866..8badaac 100644
--- a/lib/YAML/Syck.pm
+++ b/lib/YAML/Syck.pm
@@ -96,21 +96,22 @@ sub _is_openhandle {
 
 sub DumpFile {
     my $file = shift;
+    require Encode;
     if ( _is_openhandle($file) ) {
         if ($#_) {
-            print {$file} YAML::Syck::DumpYAML($_) for @_;
+            print {$file} Encode::encode_utf8(YAML::Syck::DumpYAML($_)) for @_;
         }
         else {
-            print {$file} YAML::Syck::DumpYAML($_[0]);
+            print {$file} Encode::encode_utf8(YAML::Syck::DumpYAML($_[0]));
         }
     }
     else {
         open(my $fh, '>', $file) or die "Cannot write to $file: $!";
         if ($#_) {
-            print $fh YAML::Syck::DumpYAML($_) for @_;
+            print $fh Encode::encode_utf8(YAML::Syck::DumpYAML($_)) for @_;
         }
         else {
-            print $fh YAML::Syck::DumpYAML($_[0]);
+            print $fh Encode::encode_utf8(YAML::Syck::DumpYAML($_[0]));
         }
         close $fh;
     }
I spoke with Avar about this. The plan is to update the documentation to clarify that you are expected to open the file handle as UTF-8 if you expect wide chars to be in the structure:

    open(my $fh, ">:encoding(UTF-8)", "out.yml") or die $!;
    DumpFile($fh, $hashref);
On 2010-07-20 00:11:55, TODDR wrote:
> I spoke with Avar about this. The plan is to update the documentation
> to clarify that you are expected to open the file handle as UTF-8 if
> you expect wide chars to be in the structure:
>
> open(my $fh, ">:encoding(UTF-8)", "out.yml") or die $!;
> DumpFile($fh, $hashref);
>
Sorry, I have to re-open this ticket. Using this is not enough to get a dump/load roundtrip working (see below).

Also, I don't like it that the user has to do something special to have wide character serialization correct. I think there should be a way to detect the presence of wide characters automatically and do the right thing?

Regards,
Slaven

    #!/usr/bin/perl -w

    use strict;
    use Test::More 'no_plan';
    use YAML::Syck qw(DumpFile LoadFile);

    my $test = ["\x{20ac}"];
    open(my $fh, ">:encoding(UTF-8)", "/tmp/test.yml");
    DumpFile $fh, $test;
    close $fh or die $!;
    my $test2 = LoadFile "/tmp/test.yml";
    is_deeply($test2,$test);
    __END__

    $ perl5.12.0 /tmp/yamlsyck.pl
    not ok 1
    #   Failed test at /tmp/yamlsyck.pl line 12.
    Wide character in print at /usr/perl5.12.0/lib/5.12.0/Test/Builder.pm line 1753.
    #     Structures begin differing at:
    #          $got->[0] = 'âÃÂì'
    #     $expected->[0] = 'â¬'
    1..1
    # Looks like you failed 1 test of 1.
    Exitcode 1
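The garbled $got value looks like double encoding. A possible explanation, offered here as an assumption and not verified against the YAML::Syck source, is that the dumped string reaches print as already UTF-8 encoded bytes, which the :encoding(UTF-8) layer then encodes a second time. The core-only sketch below shows that effect in isolation:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

my $chars = "\x{20ac}";             # one character: EURO SIGN
my $bytes = encode_utf8($chars);    # three bytes: E2 82 AC

# Printing already encoded bytes through an :encoding(UTF-8) layer treats
# each byte as a Latin-1 character and encodes it again.
open(my $out, '>:encoding(UTF-8)', \my $file) or die "Cannot open in-memory handle: $!";
print {$out} $bytes;
close $out or die $!;

# Hex dump of what ended up in the "file".
my $hex = join ' ', map { sprintf('%02X', ord($_)) } split(//, $file);
print "$hex\n";    # prints "C3 A2 C2 82 C2 AC"
```

Six bytes are written instead of three, so a reader that decodes once (or not at all) cannot get the original character back.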
> Also, I don't like it that the user has to do something special to have > wide character serialization correct. I think there should be a way to > detect the presence of wide characters automatically and do the right thing?
As an English speaker, my wide character ignorance is vast. I'm open to suggestions, but the little I know is that auto-detection algorithms for UTF-8 are buggy at best.

What do you suggest?
On 2010-08-30 13:13:12, TODDR wrote:
> > Also, I don't like it that the user has to do something special to have
> > wide character serialization correct. I think there should be a way to
> > detect the presence of wide characters automatically and do the right thing?
>
> As an English speaker, my wide character ignorance is vast. I'm open
> to suggestions, but the little I know is that auto-detection algorithms
> for UTF-8 are buggy at best.
>
> What do you suggest?
I had a very brief look into the source code of YAML::Syck. Probably the root problem is the usage of SvPV and newSVpvn in perl_syck.h. It should rather use SvPV_utf8 and newSVpvn_utf8. I think in this case all the hacks with ImplicitUnicode and suggesting an encoding layer when doing IO may be removed.

Regards,
Slaven
Ticket migrated to github as https://github.com/toddr/YAML-Syck/issues/28