Subject: Bug Report: YAML-0.71
Date: Fri, 20 May 2011 15:18:39 -0400
To: bug-YAML [...] rt.cpan.org
From: Devon <decasm [...] gmail.com>
Hi.
Thanks for writing and distributing the YAML module. I think I've
found a bug. I'm still using 0.71, but the code I'm looking at is the
same in 0.72 and 0.73. When I have a file with an anchor like
"an_anchor" (with an underscore), the Loader dies with
YAML_PARSE_ERR_BAD_ANCHOR.
Looking into it, it looks like an underscore is ok by the spec, but
the perl code is more constrained. Relevant parts of the spec and
Loader.pm are included below. I think it would be an improvement, if
not technically correct, to modify the regular expressions used for
checking anchors/aliases as shown below (the original checks are left
in as comments). I've attached a ysh log file that demonstrates the
error.
Thanks again,
Devon Smith
perl -v = This is perl 5, version 12, subversion 3 (v5.12.3) built for
x86_64-linux-thread-multi
uname -a = Linux smithde 2.6.38-ARCH #1 SMP PREEMPT Fri May 13
09:24:47 CEST 2011 x86_64 Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
GenuineIntel GNU/Linux
YAML/Loader.pm:
    elsif ($preface =~ s/^\&([^ ,:]+)\s*//) {
        $token = $1;
        # skip over x2c, x5b, x5d, x7b, x7d
        $self->die('YAML_PARSE_ERR_BAD_ANCHOR')
            unless $token =~ /^[\x21-\x2b\x2d-\x5a\x5c\x5e-\x7a\x7c\x7e]+$/;
        # unless $token =~ /^[a-zA-Z0-9]+$/;
        $self->die('YAML_PARSE_ERR_MANY_ANCHOR') if $anchor;
        $self->die('YAML_PARSE_ERR_ANCHOR_ALIAS') if $alias;
        $anchor = $token;
    }
    elsif ($preface =~ s/^\*([^ ,:]+)\s*//) {
        $token = $1;
        $self->die('YAML_PARSE_ERR_BAD_ALIAS')
            unless $token =~ /^[\x21-\x2b\x2d-\x5a\x5c\x5e-\x7a\x7c\x7e]+$/;
        # unless $token =~ /^[a-zA-Z0-9]+$/;
        $self->die('YAML_PARSE_ERR_MANY_ALIAS') if $alias;
        $self->die('YAML_PARSE_ERR_ANCHOR_ALIAS') if $anchor;
        $alias = $token;
    }
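The difference between the two checks in the listing can be seen without loading YAML at all. The following is a minimal sketch that copies both patterns out of the listing and runs them against a few sample names. The sample names and the variable names ($old_re, $new_re, %result) are made up for illustration and do not come from Loader.pm.

```perl
use strict;
use warnings;

# Pattern from the commented-out lines in the listing (alphanumerics only).
my $old_re = qr/^[a-zA-Z0-9]+$/;

# Pattern from the active lines in the listing (printable ASCII minus
# space and the flow indicators , [ ] { }).
my $new_re = qr/^[\x21-\x2b\x2d-\x5a\x5c\x5e-\x7a\x7c\x7e]+$/;

# Hypothetical anchor names, chosen only to exercise both patterns.
my @names = ('anchor1', 'an_anchor', 'an-anchor', 'a[b', 'a{b}');

my %result;
for my $name (@names) {
    $result{$name} = {
        old => ($name =~ $old_re ? 1 : 0),
        new => ($name =~ $new_re ? 1 : 0),
    };
    printf "%-10s old=%-6s new=%s\n", $name,
        ($result{$name}{old} ? 'ok' : 'reject'),
        ($result{$name}{new} ? 'ok' : 'reject');
}
```

With these inputs, "an_anchor" is rejected by the alphanumeric pattern and accepted by the wider one, while names containing a flow indicator such as "[" or "{" are rejected by both.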
http://yaml.org/spec/1.2/spec.html#ns-anchor-name
[103] ns-anchor-name ::= ns-anchor-char+
[102] ns-anchor-char ::= ns-char - c-flow-indicator
[34] ns-char ::= nb-char - s-white
[33] s-white ::= s-space | s-tab
[32] s-tab ::= #x9 /* TAB */
[31] s-space ::= #x20 /* SP */
[27] nb-char ::= c-printable - b-char - c-byte-order-mark
[23] c-flow-indicator ::= “,” | “[” | “]” | “{” | “}”
[26] b-char ::= b-line-feed | b-carriage-return
[25] b-carriage-return ::= #xD /* CR */
[24] b-line-feed ::= #xA /* LF */
[3] c-byte-order-mark ::= #xFEFF
[1] c-printable ::= #x9 | #xA | #xD | [#x20-#x7E] /* 8 bit */
                   | #x85 | [#xA0-#xD7FF] | [#xE000-#xFFFD] /* 16 bit */
                   | [#x10000-#x10FFFF] /* 32 bit */
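As a cross-check, the productions above can be applied mechanically to the 7-bit range and compared with the character class in the Loader.pm listing. This is only a sketch restricted to ASCII: the spec also admits #x85 and the larger Unicode ranges of c-printable, which the character class does not cover. The names %allowed, $class_re and @mismatch are illustrative.

```perl
use strict;
use warnings;

# ASCII portion of c-printable [1]: #x9 | #xA | #xD | [#x20-#x7E].
my %allowed = map { $_ => 1 } (0x09, 0x0A, 0x0D, 0x20 .. 0x7E);

# nb-char [27] removes b-char (LF, CR); the byte order mark is not ASCII.
delete @allowed{ 0x0A, 0x0D };

# ns-char [34] removes s-white (SP, TAB).
delete @allowed{ 0x20, 0x09 };

# ns-anchor-char [102] removes c-flow-indicator [23]: , [ ] { }
delete @allowed{ map { ord($_) } (',', '[', ']', '{', '}') };

# Single-character form of the class used in the Loader.pm listing.
my $class_re = qr/\A[\x21-\x2b\x2d-\x5a\x5c\x5e-\x7a\x7c\x7e]\z/;

# Collect every 7-bit code point on which the spec and the class disagree.
my @mismatch = grep {
    my $in_spec  = exists $allowed{$_} ? 1 : 0;
    my $in_class = (chr($_) =~ $class_re) ? 1 : 0;
    $in_spec != $in_class;
} 0x00 .. 0x7F;

if (@mismatch) {
    print "mismatch at: ", join(' ', map { sprintf '0x%02x', $_ } @mismatch), "\n";
}
else {
    print "class matches the ASCII subset of ns-anchor-char\n";
}
```

Within the 7-bit range the two agree, so the only gap between the class and the spec is the non-ASCII part of c-printable.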