Skip Menu |

This queue is for tickets about the YAML-Syck CPAN distribution.

Report information
The Basics
Id: 50227
Status: resolved
Priority: 0/
Queue: YAML-Syck

People
Owner: TODDR [...] cpan.org
Requestors: dmuey [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 1.07
Fixed in: 1.10_02



Subject: not quoting strings when data has been sorted numerically before being passed to JSON::Syck::Dump
Note how strings are treated, even with a string based sort: hal9000:~ dmuey$ perl -MJSON::Syck -e 'print JSON::Syck::Dump(\@ARGV)."\n";print JSON::Syck::Dump([sort { $a cmp $b } @ARGV])."\n";print JSON::Syck::Dump(\@ARGV)."\n";' 2 1 Howdy ["2","1","Howdy"] ["1","2","Howdy"] ["2","1","Howdy"] hal9000:~ dmuey$ Now compare that to when the data is passed through a numeric sort prior to JSON-ification: hal9000:~ dmuey$ perl -MJSON::Syck -e 'print JSON::Syck::Dump(\@ARGV)."\n";print JSON::Syck::Dump([sort { $a <=> $b } @ARGV])."\n";print JSON::Syck::Dump(\@ARGV)."\n";' 2 1 Howdy ["2","1","Howdy"] [Howdy,1,2] [2,1,Howdy] hal9000:~ dmuey$ JSON::Syck forever treats it as an integer and does not quote it! That results in broken JSON ... rebuilding YAML::Syck 1.07 with the attached patch applied creates a global that makes it quote everything. A better solution is to probably get it to quote the value as appropriate to what it is instead ... Here is the output when the variable added via the patch is set to true: hal9000:~ dmuey$ perl -MJSON::Syck -e '$JSON::Syck::PreferString=1;print JSON::Syck::Dump(\@ARGV)."\n";print JSON::Syck::Dump([sort { $a cmp $b } @ARGV])."\n";print JSON::Syck::Dump(\@ARGV)."\n";' 2 1 Howdy ["2","1","Howdy"] ["1","2","Howdy"] ["2","1","Howdy"] hal9000:~ dmuey$ perl -MJSON::Syck -e '$JSON::Syck::PreferString=1;print JSON::Syck::Dump(\@ARGV)."\n";print JSON::Syck::Dump([sort { $a <=> $b } @ARGV])."\n";print JSON::Syck::Dump(\@ARGV)."\n";' 2 1 Howdy ["2","1","Howdy"] ["Howdy","1","2"] ["2","1","Howdy"] hal9000:~ dmuey$
Subject: YAML-Syck.patch
diff -ruN YAML-Syck-1.07/perl_common.h YAML-Syck-1.07.fixed/perl_common.h --- YAML-Syck-1.07/perl_common.h 2008-04-19 14:08:31.000000000 -0500 +++ YAML-Syck-1.07.fixed/perl_common.h 2009-10-05 14:33:00.000000000 -0500 @@ -36,6 +36,7 @@ char* tag; char dump_code; bool implicit_binary; + bool prefer_string; }; struct parser_xtra { diff -ruN YAML-Syck-1.07/perl_syck.h YAML-Syck-1.07.fixed/perl_syck.h --- YAML-Syck-1.07/perl_syck.h 2008-06-08 12:53:09.000000000 -0500 +++ YAML-Syck-1.07.fixed/perl_syck.h 2009-10-05 14:35:27.000000000 -0500 @@ -740,6 +740,7 @@ SV* sv = (SV*)data; struct emitter_xtra *bonus = (struct emitter_xtra *)e->bonus; char* tag = bonus->tag; + char prefer_string = bonus->prefer_string; svtype ty = SvTYPE(sv); #ifndef YAML_IS_JSON char dump_code = bonus->dump_code; @@ -860,7 +861,8 @@ /* emit an undef (typically pointed from a blesed SvRV) */ syck_emit_scalar(e, OBJOF("str"), scalar_plain, 0, 0, 0, NULL_LITERAL, NULL_LITERAL_LENGTH); } - else if (SvNIOKp(sv) && (sv_len(sv) != 0)) { + else if ( (!prefer_string || (sv_len(sv) == 0) ) && SvNIOKp(sv) && (sv_len(sv) != 0) ) { + /* if prefer_string is on only emit */ /* emit a number with a stringified version */ syck_emit_scalar(e, OBJOF("str"), SCALAR_NUMBER, 0, 0, 0, SvPV_nolen(sv), sv_len(sv)); } @@ -1074,6 +1076,7 @@ SV *headless = GvSV(gv_fetchpv(form("%s::Headless", PACKAGE_NAME), TRUE, SVt_PV)); SV *implicit_unicode = GvSV(gv_fetchpv(form("%s::ImplicitUnicode", PACKAGE_NAME), TRUE, SVt_PV)); SV *implicit_binary = GvSV(gv_fetchpv(form("%s::ImplicitBinary", PACKAGE_NAME), TRUE, SVt_PV)); + SV *prefer_string = GvSV(gv_fetchpv(form("%s::PreferString", PACKAGE_NAME), TRUE, SVt_PV)); SV *use_code = GvSV(gv_fetchpv(form("%s::UseCode", PACKAGE_NAME), TRUE, SVt_PV)); SV *dump_code = GvSV(gv_fetchpv(form("%s::DumpCode", PACKAGE_NAME), TRUE, SVt_PV)); SV *sortkeys = GvSV(gv_fetchpv(form("%s::SortKeys", PACKAGE_NAME), TRUE, SVt_PV)); @@ -1111,6 +1114,7 @@ *(bonus.tag) = '\0'; bonus.dump_code = SvTRUE(use_code) || SvTRUE(dump_code); bonus.implicit_binary = SvTRUE(implicit_binary); + bonus.prefer_string = SvTRUE(prefer_string); emitter->bonus = &bonus; syck_emitter_handler( emitter, PERL_SYCK_EMITTER_HANDLER );
(This is a form-reply that isn't specific to your particular report) YAML::Syck has just acquired one new maintainer (me), it still doesn't have anyone that *cares* about it. But I'm willing to help solve your report & release a new version with the fix if it's easy for me. It now has a Git repository at: http://github.com/avar/YAML-Syck If your report is a patch that fixes a problem, great. Please remake the patch against Git by forking that repo and sending me a pull request on GitHub (or an update to this bug if you prefer git-format-patch(1) or some other repo provider..). Make sure to include a test for what you fixed. If your report is some code that fails (and you have a testcase for it) a patch against the test suite to demonstrate that failure would be very useful. It's OK if the test crashes and burns, see Test::More's docs for how to make TODO tests that fail now, but shouldn't. Even if it segfaults perl C<system $^X => qw/ -Mblib -MYAML::Syck .../> or something like that and checking the return value will do.
This is also on github. The specific commit to fix this is also here.
Subject: patch.txt
Adding some tests revealed some problems. The patch2 now includes working tests. The updated check was with p5p's advice on IRC. github is updated
Subject: patch2.txt
commit 7b3e88cc9e42e630150ba0f9b9618a30fb7adf28 Author: Todd Rinaldo <toddr@cpan.org> Date: Thu Jul 15 14:12:00 2010 -0500 RT 50227 - Not quoting strings when data has been sorted numerically before being passed to JSON::Syck::Dump Use the perl function looks_like_number to detect if it's a number, but only if the number isn't octal or hex since it'll convert it if you do that. diff --git a/perl_syck.h b/perl_syck.h index a798705..836157c 100644 --- a/perl_syck.h +++ b/perl_syck.h @@ -860,8 +860,8 @@ yaml_syck_emitter_handler /* emit an undef (typically pointed from a blesed SvRV) */ syck_emit_scalar(e, OBJOF("str"), scalar_plain, 0, 0, 0, NULL_LITERAL, NULL_LITERAL_LENGTH); } - else if (SvNIOKp(sv) && (sv_len(sv) != 0)) { - /* emit a number with a stringified version */ + else if (looks_like_number(sv) && (sv_len(sv) < 2 || SvPVX(sv)[0] != '0' || SvPVX(sv)[1] == '.')) { + /* emit a number only if it looks like one and isn't octal or hex. We're treating octal/hex as strings */ syck_emit_scalar(e, OBJOF("str"), SCALAR_NUMBER, 0, 0, 0, SvPV_nolen(sv), sv_len(sv)); } else if (SvPOKp(sv)) { diff --git a/t/json-numbers.t b/t/json-numbers.t new file mode 100644 index 0000000..582cd5c --- /dev/null +++ b/t/json-numbers.t @@ -0,0 +1,19 @@ +use Test::More tests => 6; + +use JSON::Syck qw(Dump); + +my @arr1 = sort {$a cmp $b} qw/1 2 54 howdy/; +is(Dump(\@arr1), '[1,2,54,"howdy"]', "cmp sort doesn't coerce numbers into strings"); + +{ + no warnings "numeric"; + my @arr2 = sort {$a <=> $b} qw/1 2 54 howdy/; + is(Dump(\@arr2), '["howdy",1,2,54]', "Numeric sort doesn't coerce non-numeric strings into numbers"); +} + +my @arr54 = ("howdy",1,2,54); +is(Dump(\@arr54), '["howdy",1,2,54]', "Strings are quoted. Numbers are not"); + +is(Dump('042'), '"042"', "Ocatls don't get treated as numbers"); +is(Dump('0x42'), '"0x42"', "Hex doesn't get treated as a number"); +is(Dump('0.42'), '0.42', "Floats with leading 0 don't get excluded by octal check");
More tests show perl will consider and convert scientific notation where it shouldn't. I had to take a more combined approach to fix this. Patch 3 should do it. github is also updated.
Subject: patch3.txt
commit b33458fa0b17f690ab5c2c5f74ca3729a5614205 Author: Todd Rinaldo <toddr@cpan.org> Date: Thu Jul 15 14:12:00 2010 -0500 RT 50227 - Not quoting strings when data has been sorted numerically before being passed to JSON::Syck::Dump Use the perl function looks_like_number to detect if it's a number. Then come behind and exclude hex/oct/scientific numbers with custom routine. The special numbers don't round trip properly. diff --git a/perl_syck.h b/perl_syck.h index a798705..48d64f7 100644 --- a/perl_syck.h +++ b/perl_syck.h @@ -860,8 +860,8 @@ yaml_syck_emitter_handler /* emit an undef (typically pointed from a blesed SvRV) */ syck_emit_scalar(e, OBJOF("str"), scalar_plain, 0, 0, 0, NULL_LITERAL, NULL_LITERAL_LENGTH); } - else if (SvNIOKp(sv) && (sv_len(sv) != 0)) { - /* emit a number with a stringified version */ + else if (looks_like_number(sv) && syck_is_a_number(SvPV_nolen(sv), sv_len(sv))) { + /* emit a number only if it's [.0-9]+ and isn't octal or hex. We're treating octal/hex as strings */ syck_emit_scalar(e, OBJOF("str"), SCALAR_NUMBER, 0, 0, 0, SvPV_nolen(sv), sv_len(sv)); } else if (SvPOKp(sv)) { diff --git a/syck_.c b/syck_.c index b5ce707..2793fb0 100644 --- a/syck_.c +++ b/syck_.c @@ -2,7 +2,7 @@ * syck.c * * $Author: why $ - * $Date: 2005-01-01 10:06:25 +0800 (六, 01 1 2005) $ + * $Date: 2005-01-01 10:06:25 +0800 (.., 01 1 2005) $ * * Copyright (C) 2003 why the lucky stiff */ @@ -496,9 +496,34 @@ syck_parse( SyckParser *p ) void syck_default_error_handler( SyckParser *p, char *msg ) { - printf( "Error at [Line %d, Col %d]: %s\n", + printf( "Error at [Line %d, Col %d]: %s\n", p->linect, p->cursor - p->lineptr, msg ); } +int syck_is_a_number(char* str, long len) { + int idx; + char seen_decimal = 0; + char c; + + if(!str) return 0; /* Don't parse null strings */ + if(len < 1) return 0; /* empty strings can't be numbers */ + + /* Look for illegal characters */ + for ( idx = 0; idx < len; idx++ ) { + c = str[idx]; + + if(c > '9' || c < '.' || c == '/' ) + return 0; /* Number can only have 0..9 and . */ + if(c == '.') { + if(seen_decimal) return 0; /* Numbers can't have 2 .'s in them */ + if(idx == 0) return 0; /* has to have a leading 0 at least for floats */ + seen_decimal = 1; + } + } + + if(len > 1 && str[0] == '0' && str[1] != '.') return 0; /* Trap hex/octal */ + + return 1; +} diff --git a/t/2-scalars.t b/t/2-scalars.t index efb0741..69d797a 100644 --- a/t/2-scalars.t +++ b/t/2-scalars.t @@ -1,4 +1,4 @@ -use t::TestYAML tests => 81; +use t::TestYAML tests => 88; local $SIG{__WARN__} = sub { 1 } if $Test::VERSION < 1.20; @@ -168,6 +168,14 @@ roundtrip("\n"); roundtrip("S p a c e"); roundtrip("Space \n Around"); +roundtrip("042"); +roundtrip("0x42"); +roundtrip("0.42"); +roundtrip(".42"); +roundtrip("1,000,000"); +roundtrip("1e+6"); +roundtrip("10e+6"); + # If implicit typing is on, quote strings corresponding to implicit boolean and null values $YAML::Syck::SingleQuote = 0; diff --git a/t/json-numbers.t b/t/json-numbers.t new file mode 100644 index 0000000..6d790e5 --- /dev/null +++ b/t/json-numbers.t @@ -0,0 +1,25 @@ +use Test::More tests => 10; + +use JSON::Syck qw(Dump); + +my @arr1 = sort {$a cmp $b} qw/1 2 54 howdy/; +is(Dump(\@arr1), '[1,2,54,"howdy"]', "cmp sort doesn't coerce numbers into strings"); + +{ + no warnings "numeric"; + my @arr2 = sort {$a <=> $b} qw/1 2 54 howdy/; + is(Dump(\@arr2), '["howdy",1,2,54]', "Numeric sort doesn't coerce non-numeric strings into numbers"); +} + +my @arr54 = ("howdy",1,2,54); +is(Dump(\@arr54), '["howdy",1,2,54]', "Strings are quoted. Numbers are not"); + +is(Dump('042'), '"042"', "Ocatls don't get treated as numbers"); +is(Dump('0x42'), '"0x42"', "Hex doesn't get treated as a number"); +is(Dump('0.42'), '0.42', "Floats with leading 0 don't get excluded by octal check"); +is(Dump('1_000_000'), '"1_000_000"', "numbers with underscores get quoted."); +is(Dump('1,000,000'), '"1,000,000"', "numbers with commas get quoted."); +is(Dump('1e+6'), '"1e+6"', "Scientific notation gets quoted."); +is(Dump('10e+6'), '"10e+6"', "Scientific notation gets quoted."); + +
Further refinements were required. 1.10_02 pushed to pause.