Skip Menu |

This queue is for tickets about the Parse-Yapp CPAN distribution.

Report information
The Basics
Id: 122977
Status: open
Priority: 0/
Queue: Parse-Yapp

People
Owner: Nobody in particular
Requestors: tmp11590 [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Sort hash keys for deterministic output - Parse-Yapp-1.21
Date: Wed, 6 Sep 2017 17:11:25 -0700
To: bug-Parse-Yapp [...] rt.cpan.org
From: Tom Pollard <tmp11590 [...] gmail.com>
Hi, I noticed the output of yapp is not deterministic because hash keys are not sorted. This makes it difficult (and in most cases impossible) to compare the output of yapp after making minor changes to the input file. This issue is easily fixed by adding "sort()" around all calls to "keys()" that affect yapp's output. I've made this change to my local install of Parse::Yapp and haven't noticed any significant slowdown due to the sorting. Thanks, Tom
Which output ? Why not sorting hash keys would make generated parser not deterministic ?
Subject: Re: [rt.cpan.org #122977] Sort hash keys for deterministic output - Parse-Yapp-1.21
Date: Thu, 7 Sep 2017 09:42:45 -0700
To: bug-Parse-Yapp [...] rt.cpan.org
From: Tom Pollard <tmp11590 [...] gmail.com>
Show quoted text
>>> Which output ?
The output .pm file Show quoted text
>>> Why not sorting hash keys would make generated parser not deterministic
? Starting with Perl 5.18 (I think), hash key ordering is completely random. You can try this to reproduce the issue that I see. I'm using the file XSP.yp from this release of ExtUtils-XSpp as a test. http://search.cpan.org/CPAN/authors/id/S/SM/SMUELLER/ExtUtils-XSpp-0.18.tar.gz Again, Perl >= 5.18 is required to reproduce the error: Show quoted text
> perl -v
This is perl 5, version 24, subversion 2 (v5.24.2) built for x86_64-linux ... Show quoted text
> yapp -V
This is Parse::Yapp version 1.2. Show quoted text
> yapp -o first.pm XSP.yp > yapp -o second.pm XSP.yp > diff first.pm second.pm | wc -l
3245 Now, if I add sort() around these instances of keys() in these 3 files, and run the experiment again: Parse/Yapp/Driver.pm:162: return sort(keys(%{$self->{STATES}[$self->{STACK}[-1][0]]{ACTIONS}})); Parse/Yapp/Grammar.pm:343: for my $sym (sort(keys(%$term))) { Parse/Yapp/Grammar.pm:355: for my $sym (sort(keys(%$nterm))) { Parse/Yapp/Lalr.pm:135: for (sort(keys(%{$$states[$stateno]{ACTIONS}}))) { Parse/Yapp/Lalr.pm:223: for (sort(keys(%{$$states[$stateno]{GOTOS}}))) { Parse/Yapp/Lalr.pm:339: } grep { $_ } sort(keys(%{$$state{ACTIONS}}))); Parse/Yapp/Lalr.pm:362: } sort(keys(%{$$state{GOTOS}}))); Parse/Yapp/Lalr.pm:415: for $y (sort(keys(%{$$rel{$x}}))) { Parse/Yapp/Lalr.pm:438: for (sort(keys(%$rel))) { Parse/Yapp/Lalr.pm:462: for my $symbol (sort(keys(%{$$grammar{NTERM}}))) { Parse/Yapp/Lalr.pm:514: for (sort(keys(%transitions))) { Parse/Yapp/Lalr.pm:576: for my $symbol (sort(keys(%{$$grammar{NTERM}}))) { Parse/Yapp/Lalr.pm:624: [ sort( keys( %{ +{ map { ($_,1) } @$preds } } ) ) ]; Parse/Yapp/Lalr.pm:697: for my $symbol (sort(keys(%{$$state{GOTOS}}))) { Parse/Yapp/Lalr.pm:734: my($termlst)= [ '',sort(keys(%{$$grammar{TERM}})) ]; Parse/Yapp/Lalr.pm:738: for my $stateno ( sort(keys(%$inconsistent)) ) { Parse/Yapp/Lalr.pm:797: for my $stateno (sort(keys(%$inconsistent))) { Parse/Yapp/Lalr.pm:802: for my $term ( sort(keys(%$actions)) ) { Parse/Yapp/Lalr.pm:902: for my $term (sort(keys(%$actions))) { Parse/Yapp/Lalr.pm:920: sort(keys(%reduces)))[0]; Show quoted text
> yapp -o first.pm XSP.yp > yapp -o second.pm XSP.yp > diff first.pm second.pm | wc -l
0 I don't know if all the sort()'s are necessary, but adding all of them was the quickest/easiest way for me to get deterministic output. On Thu, Sep 7, 2017 at 8:00 AM, Francois Desarmenien via RT < bug-Parse-Yapp@rt.cpan.org> wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=122977 > > > Which output ? > Why not sorting hash keys would make generated parser not deterministic ? >
Sorry, but your message is unreadable, so I cannot get your example (files content mixup with text with no '\n'). But the order in which the keys of a hash array are stored internally is irrelevant, so is the order in which appear key/value pairs while feeding an hash array...
From: tmp11590 [...] gmail.com
Let me try reformatting my original response. --- Which output ? The output .pm file --- Why not sorting hash keys would make generated parser not deterministic ? Starting with Perl 5.18 (I think), hash key ordering is completely random. You can try this to reproduce the issue that I see. I'm using the file XSP.yp from this release of ExtUtils-XSpp as a test. You can download the whole package from here and extract XSP.yp, or I also attached the file to this reply. http://search.cpan.org/CPAN/authors/id/S/SM/SMUELLER/ExtUtils-XSpp-0.18.tar.gz Again, Perl >= 5.18 is required to reproduce the error: :> perl -v This is perl 5, version 24, subversion 2 (v5.24.2) built for x86_64-linux ... :> yapp -V This is Parse::Yapp version 1.2. :> yapp -o first.pm XSP.yp :> yapp -o second.pm XSP.yp :> diff first.pm second.pm | wc -l 3245 Now, if I add sort() around these instances of keys() in these 3 files, and run the experiment again: Parse/Yapp/Driver.pm:162: return sort(keys(%{$self->{STATES}[$self->{STACK}[-1][0]]{ACTIONS}})); Parse/Yapp/Grammar.pm:343: for my $sym (sort(keys(%$term))) { Parse/Yapp/Grammar.pm:355: for my $sym (sort(keys(%$nterm))) { Parse/Yapp/Lalr.pm:135: for (sort(keys(%{$$states[$stateno]{ACTIONS}}))) { Parse/Yapp/Lalr.pm:223: for (sort(keys(%{$$states[$stateno]{GOTOS}}))) { Parse/Yapp/Lalr.pm:339: } grep { $_ } sort(keys(%{$$state{ACTIONS}}))); Parse/Yapp/Lalr.pm:362: } sort(keys(%{$$state{GOTOS}}))); Parse/Yapp/Lalr.pm:415: for $y (sort(keys(%{$$rel{$x}}))) { Parse/Yapp/Lalr.pm:438: for (sort(keys(%$rel))) { Parse/Yapp/Lalr.pm:462: for my $symbol (sort(keys(%{$$grammar{NTERM}}))) { Parse/Yapp/Lalr.pm:514: for (sort(keys(%transitions))) { Parse/Yapp/Lalr.pm:576: for my $symbol (sort(keys(%{$$grammar{NTERM}}))) { Parse/Yapp/Lalr.pm:624: [ sort( keys( %{ +{ map { ($_,1) } @$preds } } ) ) ]; Parse/Yapp/Lalr.pm:697: for my $symbol (sort(keys(%{$$state{GOTOS}}))) { Parse/Yapp/Lalr.pm:734: my($termlst)= [ '',sort(keys(%{$$grammar{TERM}})) ]; Parse/Yapp/Lalr.pm:738: for my $stateno ( sort(keys(%$inconsistent)) ) { Parse/Yapp/Lalr.pm:797: for my $stateno (sort(keys(%$inconsistent))) { Parse/Yapp/Lalr.pm:802: for my $term ( sort(keys(%$actions)) ) { Parse/Yapp/Lalr.pm:902: for my $term (sort(keys(%$actions))) { Parse/Yapp/Lalr.pm:920: sort(keys(%reduces)))[0]; :> yapp -o first.pm XSP.yp :> yapp -o second.pm XSP.yp :> diff first.pm second.pm | wc -l 0 I don't know if all the sort()'s are necessary, but adding all of them was the quickest/easiest way for me to get deterministic output.
Subject: XSP.yp
Download XSP.yp
application/octet-stream 16.9k

Message body not shown because it is not plain text.

The sorting you added is useless : the generated source files are maybe slightly different but their compiled form will run exactly the same way.
Am Do 07. Sep 2017, 18:28:09, FDESAR schrieb: Show quoted text
> The sorting you added is useless : the generated source files are > maybe slightly different but their compiled form will run exactly the > same way.
And that is exactly what Tom initially wrote. Yapp's output slightly differs, whenever you recompile the grammar, even when the grammar has not changed. The generated code always behaves the same but that is not the problem. The problem is: Humans want to make diffs. And the diffs should show relevant changes and not variations that have no impact on the behavior of the software. Likewise, authors may decide to put the output of yapp under version control although it is generated. But that causes headaches because your local repository gets dirty whenever you re-compile your .y file(s). Yacc and Bison always create exactly the same output if the grammar is unchanged, and that avoids this sort of problems.
I forgot the most important argument: The .output file is also non-deterministic. Rule and state numbers change in arbitrary ways, whenever you recompile the grammar. Even after tiny changes to the grammar, you always have to start debugging from scratch because you know that state 275 that you have just fixed is almost certainly no longer state 275 after the next try. When you developed yapp, Perl's behavior was different and you probably never experienced that problem. But since Perl 5.18, it has become a little bit of a nuisance.
On Thu Sep 07 18:28:09 2017, FDESAR wrote: Show quoted text
> The sorting you added is useless : the generated source files are > maybe slightly different but their compiled form will run exactly the > same way.
Parse::Yapp is used in parser generation in a bunch of packages for Debian OS. Debian requires each package be able to build reproducibly bit-by-bit, therefore, deterministic output of Parse::Yapp is essential. For this reason I have made a patch resolving the issue, please find it attached. In addition, please find a test case (in a form of a patch) ensuring bit-by-bit reproducibility of Parse::Yapp-generated parser (depends on Text::Diff).
Subject: reproducibility.t.patch
--- /dev/null +++ b/t/reproducibility.t @@ -0,0 +1,412 @@ +#!/usr/bin/perl + +use strict; +use warnings; +use Parse::Yapp; +use Text::Diff; + +BEGIN { $| = 1; print "1..1\n"; } + +open( my $inp, 'Calc.yp' ); +my $grammar = join '', <$inp>; +close $inp; + +my $parser = new Parse::Yapp( input => $grammar ); + +my $output_new = $parser->Output( classname => 'Calc' ); +$output_new =~ s/(Parse::Yapp version )\d+\.\d+/$1<version>/; +$output_new =~ s/(yyversion => ')[^']+'/$1<version>'/; + +my $output_old = join '', <DATA>; +close DATA; + +my $diff = diff \$output_old, \$output_new; +if( $diff ) { + print "not ok 1\n"; + print "Non-zero diff:\n" . $diff; + exit(1); +} else { + print "ok 1\n"; +} + +__DATA__ +#################################################################### +# +# This file was generated using Parse::Yapp version <version>. +# +# Don't edit this file, use source file instead. +# +# ANY CHANGE MADE HERE WILL BE LOST ! +# +#################################################################### +package Calc; +use vars qw ( @ISA ); +use strict; + +@ISA= qw ( Parse::Yapp::Driver ); +use Parse::Yapp::Driver; + + + +sub new { + my($class)=shift; + ref($class) + and $class=ref($class); + + my($self)=$class->SUPER::new( yyversion => '<version>', + yystates => +[ + {#State 0 + DEFAULT => -1, + GOTOS => { + 'input' => 1 + } + }, + {#State 1 + ACTIONS => { + '' => 2, + "(" => 3, + "-" => 4, + "\n" => 5, + 'NUM' => 6, + 'VAR' => 7, + 'error' => 8 + }, + GOTOS => { + 'exp' => 9, + 'line' => 10 + } + }, + {#State 2 + DEFAULT => 0 + }, + {#State 3 + ACTIONS => { + "(" => 3, + "-" => 4, + 'NUM' => 6, + 'VAR' => 7 + }, + GOTOS => { + 'exp' => 11 + } + }, + {#State 4 + ACTIONS => { + "(" => 3, + "-" => 4, + 'NUM' => 6, + 'VAR' => 7 + }, + GOTOS => { + 'exp' => 12 + } + }, + {#State 5 + DEFAULT => -3 + }, + {#State 6 + DEFAULT => -6 + }, + {#State 7 + ACTIONS => { + "=" => 13 + }, + DEFAULT => -7 + }, + {#State 8 + ACTIONS => { + "\n" => 14 + } + }, + {#State 9 + ACTIONS => { + "*" => 15, + "+" => 16, + "-" => 17, + "/" => 18, + "\n" => 19, + "^" => 20 + } + }, + {#State 10 + DEFAULT => -2 + }, + {#State 11 + ACTIONS => { + ")" => 21, + "*" => 15, + "+" => 16, + "-" => 17, + "/" => 18, + "^" => 20 + } + }, + {#State 12 + ACTIONS => { + "^" => 20 + }, + DEFAULT => -13 + }, + {#State 13 + ACTIONS => { + "(" => 3, + "-" => 4, + 'NUM' => 6, + 'VAR' => 7 + }, + GOTOS => { + 'exp' => 22 + } + }, + {#State 14 + DEFAULT => -5 + }, + {#State 15 + ACTIONS => { + "(" => 3, + "-" => 4, + 'NUM' => 6, + 'VAR' => 7 + }, + GOTOS => { + 'exp' => 23 + } + }, + {#State 16 + ACTIONS => { + "(" => 3, + "-" => 4, + 'NUM' => 6, + 'VAR' => 7 + }, + GOTOS => { + 'exp' => 24 + } + }, + {#State 17 + ACTIONS => { + "(" => 3, + "-" => 4, + 'NUM' => 6, + 'VAR' => 7 + }, + GOTOS => { + 'exp' => 25 + } + }, + {#State 18 + ACTIONS => { + "(" => 3, + "-" => 4, + 'NUM' => 6, + 'VAR' => 7 + }, + GOTOS => { + 'exp' => 26 + } + }, + {#State 19 + DEFAULT => -4 + }, + {#State 20 + ACTIONS => { + "(" => 3, + "-" => 4, + 'NUM' => 6, + 'VAR' => 7 + }, + GOTOS => { + 'exp' => 27 + } + }, + {#State 21 + DEFAULT => -15 + }, + {#State 22 + ACTIONS => { + "*" => 15, + "+" => 16, + "-" => 17, + "/" => 18, + "^" => 20 + }, + DEFAULT => -8 + }, + {#State 23 + ACTIONS => { + "^" => 20 + }, + DEFAULT => -11 + }, + {#State 24 + ACTIONS => { + "*" => 15, + "/" => 18, + "^" => 20 + }, + DEFAULT => -9 + }, + {#State 25 + ACTIONS => { + "*" => 15, + "/" => 18, + "^" => 20 + }, + DEFAULT => -10 + }, + {#State 26 + ACTIONS => { + "^" => 20 + }, + DEFAULT => -12 + }, + {#State 27 + ACTIONS => { + "^" => 20 + }, + DEFAULT => -14 + } +], + yyrules => +[ + [#Rule 0 + '$start', 2, undef + ], + [#Rule 1 + 'input', 0, undef + ], + [#Rule 2 + 'input', 2, +sub +#line 17 "unkown" +{ push(@{$_[1]},$_[2]); $_[1] } + ], + [#Rule 3 + 'line', 1, +sub +#line 20 "unkown" +{ $_[1] } + ], + [#Rule 4 + 'line', 2, +sub +#line 21 "unkown" +{ print "$_[1]\n" } + ], + [#Rule 5 + 'line', 2, +sub +#line 22 "unkown" +{ $_[0]->YYErrok } + ], + [#Rule 6 + 'exp', 1, undef + ], + [#Rule 7 + 'exp', 1, +sub +#line 26 "unkown" +{ $_[0]->YYData->{VARS}{$_[1]} } + ], + [#Rule 8 + 'exp', 3, +sub +#line 27 "unkown" +{ $_[0]->YYData->{VARS}{$_[1]}=$_[3] } + ], + [#Rule 9 + 'exp', 3, +sub +#line 28 "unkown" +{ $_[1] + $_[3] } + ], + [#Rule 10 + 'exp', 3, +sub +#line 29 "unkown" +{ $_[1] - $_[3] } + ], + [#Rule 11 + 'exp', 3, +sub +#line 30 "unkown" +{ $_[1] * $_[3] } + ], + [#Rule 12 + 'exp', 3, +sub +#line 31 "unkown" +{ + $_[3] + and return($_[1] / $_[3]); + $_[0]->YYData->{ERRMSG} + = "Illegal division by zero.\n"; + $_[0]->YYError; + undef + } + ], + [#Rule 13 + 'exp', 2, +sub +#line 39 "unkown" +{ -$_[2] } + ], + [#Rule 14 + 'exp', 3, +sub +#line 40 "unkown" +{ $_[1] ** $_[3] } + ], + [#Rule 15 + 'exp', 3, +sub +#line 41 "unkown" +{ $_[2] } + ] +], + @_); + bless($self,$class); +} + +#line 44 "unkown" + + +sub _Error { + exists $_[0]->YYData->{ERRMSG} + and do { + print $_[0]->YYData->{ERRMSG}; + delete $_[0]->YYData->{ERRMSG}; + return; + }; + print "Syntax error.\n"; +} + +sub _Lexer { + my($parser)=shift; + + $parser->YYData->{INPUT} + or $parser->YYData->{INPUT} = <STDIN> + or return('',undef); + + $parser->YYData->{INPUT}=~s/^[ \t]//; + + for ($parser->YYData->{INPUT}) { + s/^([0-9]+(?:\.[0-9]+)?)// + and return('NUM',$1); + s/^([A-Za-z][A-Za-z0-9_]*)// + and return('VAR',$1); + s/^(.)//s + and return($1,$1); + } +} + +sub Run { + my($self)=shift; + $self->YYParse( yylex => \&_Lexer, yyerror => \&_Error ); +} + +my($calc)=new Calc; +$calc->Run; + +1;
Subject: sort-hash-keys.patch
Description: sorting hash keys to remove non-determinism from the generated parsers. Bug: https://rt.cpan.org/Public/Bug/Display.html?id=122977 Bug-Debian: http://bugs.debian.org/903979 Forwarded: no Author: Andrius Merkys <andrius.merkys@gmail.com> Last-Update: 2018-07-19 --- a/lib/Parse/Yapp/Driver.pm +++ b/lib/Parse/Yapp/Driver.pm @@ -159,7 +159,7 @@ sub YYExpect { my($self)=shift; - keys %{$self->{STATES}[$self->{STACK}[-1][0]]{ACTIONS}} + sort keys %{$self->{STATES}[$self->{STACK}[-1][0]]{ACTIONS}} } sub YYLexer { --- a/lib/Parse/Yapp/Grammar.pm +++ b/lib/Parse/Yapp/Grammar.pm @@ -340,7 +340,7 @@ $reachable = _Reachable($rules,$nterm,$term,$ufrules,$ufnterm); $$grammar{TERM}{chr(0)}=undef; - for my $sym (keys %$term) { + for my $sym (sort keys %$term) { ( exists($$reachable{$sym}) or exists($values->{PREC}{$sym}) ) and do { @@ -352,7 +352,7 @@ } $$grammar{NTERM}{'$start'}=[]; - for my $sym (keys %$nterm) { + for my $sym (sort keys %$nterm) { exists($$reachable{$sym}) and do { exists($values->{NULL}{$sym}) --- a/lib/Parse/Yapp/Lalr.pm +++ b/lib/Parse/Yapp/Lalr.pm @@ -132,7 +132,7 @@ } #Prepare Actions - for (keys(%{$$states[$stateno]{ACTIONS}})) { + for (sort keys(%{$$states[$stateno]{ACTIONS}})) { my($term,$action)=($_,$$states[$stateno]{ACTIONS}{$_}); $term eq chr(0) @@ -220,7 +220,7 @@ exists($$states[$stateno]{GOTOS}) and do { $text.= "\n"; - for (keys(%{$$states[$stateno]{GOTOS}})) { + for (sort keys(%{$$states[$stateno]{GOTOS}})) { $text.= "\t$_\tgo to state $$states[$stateno]{GOTOS}{$_}\n"; } }; @@ -336,7 +336,7 @@ "$term => $action"; - } grep { $_ } keys(%{$$state{ACTIONS}})); + } grep { $_ } sort keys(%{$$state{ACTIONS}})); $text.="\n\t\t}"; }; @@ -359,7 +359,7 @@ "'$nterm' => $stateno"; - } keys(%{$$state{GOTOS}})); + } sort keys(%{$$state{GOTOS}})); $text.="\n\t\t}"; }; @@ -412,7 +412,7 @@ exists($$rel{$x}) and do { - for $y (keys(%{$$rel{$x}})) { + for $y (sort keys(%{$$rel{$x}})) { exists($N{$y}) or &$Traverse($y,$d+1); @@ -435,7 +435,7 @@ }; }; - for (keys(%$rel)) { + for (sort keys(%$rel)) { exists($N{$_}) or &$Traverse($_,1); } @@ -459,7 +459,7 @@ my($grammar)=@_; my($rel,$closures); - for my $symbol (keys(%{$$grammar{NTERM}})) { + for my $symbol (sort keys(%{$$grammar{NTERM}})) { $closures->{$symbol}=pack('b'.@{$$grammar{RULES}}); for my $ruleno (@{$$grammar{NTERM}{$symbol}}) { @@ -511,7 +511,7 @@ push(@{$transitions{$$rhs[$pos]}},[ $ruleno, $pos+1 ]); } - for (keys(%transitions)) { + for (sort keys(%transitions)) { my($symbol,$core)=($_,$transitions{$_}); my($corekey)=join(',',map { join('.',@$_) } sort { $$a[0] <=> $$b[0] @@ -573,7 +573,7 @@ my($grammar,$termlst,$terminx)=@_; my($rel,$first)=( {}, {} ); - for my $symbol (keys(%{$$grammar{NTERM}})) { + for my $symbol (sort keys(%{$$grammar{NTERM}})) { $first->{$symbol}=pack('b'.@$termlst); RULE: @@ -621,7 +621,7 @@ } # Pass @$preds through a hash to ensure unicity - [ keys( %{ +{ map { ($_,1) } @$preds } } ) ]; + [ sort keys( %{ +{ map { ($_,1) } @$preds } } ) ]; } sub _FirstSfx { @@ -694,7 +694,7 @@ exists($$state{GOTOS}) or next; - for my $symbol (keys(%{$$state{GOTOS}})) { + for my $symbol (sort keys(%{$$state{GOTOS}})) { my($tostate)=$$states[$$state{GOTOS}{$symbol}]; my($goto)="$stateno.$symbol"; @@ -731,11 +731,11 @@ sub _ComputeLA { my($grammar,$states)=@_; - my($termlst)= [ '',keys(%{$$grammar{TERM}}) ]; + my($termlst)= [ '',sort keys(%{$$grammar{TERM}}) ]; my($follows,$inconsistent) = _ComputeFollows($grammar,$states,$termlst); - for my $stateno ( keys(%$inconsistent ) ) { + for my $stateno ( sort keys(%$inconsistent ) ) { my($state)=$$states[$stateno]; my($conflict); @@ -794,12 +794,12 @@ undef; }; - for my $stateno (keys(%$inconsistent)) { + for my $stateno (sort keys(%$inconsistent)) { my($state)=$$states[$stateno]; my($actions)=$$state{ACTIONS}; my($nbsr,$nbrr); - for my $term ( keys(%$actions) ) { + for my $term ( sort keys(%$actions) ) { my($act)=$$actions{$term}; @$act > 1 @@ -899,7 +899,7 @@ and $$actions{error}[0] > 0 and ++$nodefault; - for my $term (keys(%$actions)) { + for my $term (sort keys(%$actions)) { $$actions{$term}=$$actions{$term}[0]; @@ -917,7 +917,7 @@ $default=( map { $$_[0] } sort { $$b[1] <=> $$a[1] or $$b[0] <=> $$a[0] } map { [ $_, scalar(@{$reduces{$_}}) ] } - keys(%reduces))[0]; + sort(keys(%reduces)))[0]; delete(@$actions{ @{$reduces{$default}} }); $$state{ACTIONS}{''}=$default;