Skip Menu |

This queue is for tickets about the Parser-MGC CPAN distribution.

Report information
The Basics
Id: 66786
Status: resolved
Priority: 0/
Queue: Parser-MGC

People
Owner: Nobody in particular
Requestors: PSCUST [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.07
Fixed in: 0.08



Subject: Performance: where() called by fail() when backtracking
The fail() method calls where() to extract the current line number and the whole line where the error occurred. This information is needed for showing an error message to the user, but most times the error raised by fail() is captured and used for backtracking. The fact that where() is called unnecessarily causes a performance hit. There is a huge performance increase by postponing where() to when an error message needs to be generated. On a test program parsing a real program file of about 128K bytes, I got the following elapsed times: 13.97 s in the original 0.07 version of the module 1.77 s with the attached modification The patch also adds the file name to the error message, in case from_file() was called. This seams to be the intent of the 'filename' attribute, but it is not used. The xxerror.t tests this feature. Please feel free to include the patch in a next version of the module, or to do something completely different. Best regards, Paulo Custodio
Subject: Parser-MGC-0.07_02.patch
*** Parser-MGC-0.07_01\lib\Parser\MGC.pm 2011-03-19 14:20:16.000000000 +0000 --- Parser-MGC-0.07_02\lib\Parser\MGC.pm 2011-03-22 19:36:16.165379000 +0000 *************** *** 257,275 **** my $pos = pos $self->{str}; my $str = $self->{str}; ! my $sol = $pos; ! $sol-- if $sol > 0 and substr( $str, $sol, 1 ) =~ m/^[\r\n]$/; ! $sol-- while $sol > 0 and substr( $str, $sol-1, 1 ) !~ m/^[\r\n]$/; ! ! my $eol = $pos; ! $eol++ while $eol < length($str) and substr( $str, $eol, 1 ) !~ m/^[\r\n]$/; ! ! my $line = substr( $str, $sol, $eol - $sol ); ! ! my $col = $pos - $sol; ! my $lineno = ( () = substr( $str, 0, $pos ) =~ m/\n/g ) + 1; ! ! return ( $lineno, $col, $line ); } =head2 $parser->fail( $message ) --- 257,263 ---- my $pos = pos $self->{str}; my $str = $self->{str}; ! return _where($str, $pos); } =head2 $parser->fail( $message ) *************** *** 285,291 **** my $self = shift; my ( $message ) = @_; ! die Parser::MGC::Failure->new( $message, $self->where ); } =head2 $eos = $parser->at_eos --- 273,280 ---- my $self = shift; my ( $message ) = @_; ! die Parser::MGC::Failure->new( $message, $self->{str}, pos($self->{str}), ! $self->{filename} ); } =head2 $eos = $parser->at_eos *************** *** 880,885 **** --- 869,893 ---- return $kw; } + sub _where { + my($str, $pos) = @_; + + my $sol = $pos; + $sol-- if $sol > 0 and substr( $str, $sol, 1 ) =~ m/^[\r\n]$/; + $sol-- while $sol > 0 and substr( $str, $sol-1, 1 ) !~ m/^[\r\n]$/; + + my $eol = $pos; + $eol++ while $eol < length($str) and substr( $str, $eol, 1 ) !~ m/^[\r\n]$/; + + my $line = substr( $str, $sol, $eol - $sol ); + + my $col = $pos - $sol; + my $lineno = ( () = substr( $str, 0, $pos ) =~ m/\n/g ) + 1; + + return ( $lineno, $col, $line ); + } + + package # hide from indexer Parser::MGC::Failure; *************** *** 887,893 **** { my $class = shift; my $self = bless {}, $class; ! @{$self}{qw( message linenum col text )} = @_; return $self; } --- 895,901 ---- { my $class = shift; my $self = bless {}, $class; ! @{$self}{qw( message str pos filename )} = @_; return $self; } *************** *** 895,910 **** sub STRING { my $self = shift; # Column number only counts characters. There may be tabs in there. # Rather than trying to calculate the visual column number, just print the # indentation as it stands. ! my $indent = substr( $self->{text}, 0, $self->{col} ); $indent =~ s/[^ \t]/ /g; # blank out all the non-whitespace ! return "$self->{message} on line $self->{linenum} at:\n" . ! "$self->{text}\n" . "$indent^\n"; } --- 903,927 ---- sub STRING { my $self = shift; + + # compute where only when generating the error message, not when fail() + # is called, as it is an expensive operation and not needed if we are just + # backtracking + + my($linenum, $col, $text) = Parser::MGC::_where(@{$self}{qw( str pos )}); # Column number only counts characters. There may be tabs in there. # Rather than trying to calculate the visual column number, just print the # indentation as it stands. ! my $indent = substr( $text, 0, $col ); $indent =~ s/[^ \t]/ /g; # blank out all the non-whitespace ! return "$self->{message} on ". ! (defined($self->{filename}) && ! ref($self->{filename}) ! ? $self->{filename}." " : ""). ! "line $linenum at:\n" . ! "$text\n" . "$indent^\n"; }
Subject: xxerror.t
#!/usr/bin/perl -w use strict; use Test::More tests => 7; package TestParser; use base qw( Parser::MGC ); sub parse { my $self = shift; return $self->token_int; } package main; my $parser = TestParser->new; isa_ok( $parser, "TestParser", '$parser' ); isa_ok( $parser, "Parser::MGC", '$parser' ); my $value = $parser->from_string( "\t123" ); is( $value, 123, '->from_string' ); eval { $parser->from_string( "\t123." ) }; is( $@, "Expected end of input on line 1 at:\n". "\t123.\n". "\t ^\n", 'error with no file' ); eval { $parser->from_file( \*DATA ) }; is( $@, "Expected end of input on line 1 at:\n". " 123.\n". " ^\n", 'error with glob file' ); my $tmpfile = "test_error.tmp"; ok( open(my $fh, ">", $tmpfile), "create $tmpfile" ); print $fh " 123.\n"; close $fh; eval { $parser->from_file( $tmpfile ) }; is( $@, "Expected end of input on $tmpfile line 1 at:\n". " 123.\n". " ^\n", 'error with glob file' ); __DATA__ 123.
xxerror.t had a bug... temp file not removed. Fixed. On Tue Mar 22 16:04:09 2011, PSCUST wrote: Show quoted text
> The xxerror.t tests this feature.
Subject: xxerror.t
#!/usr/bin/perl -w use strict; use Test::More tests => 7; package TestParser; use base qw( Parser::MGC ); sub parse { my $self = shift; return $self->token_int; } package main; my $parser = TestParser->new; isa_ok( $parser, "TestParser", '$parser' ); isa_ok( $parser, "Parser::MGC", '$parser' ); my $value = $parser->from_string( "\t123" ); is( $value, 123, '->from_string' ); eval { $parser->from_string( "\t123." ) }; is( $@, "Expected end of input on line 1 at:\n". "\t123.\n". "\t ^\n", 'error with no file' ); eval { $parser->from_file( \*DATA ) }; is( $@, "Expected end of input on line 1 at:\n". " 123.\n". " ^\n", 'error with glob file' ); my $tmpfile = "test_error.tmp"; ok( open(my $fh, ">", $tmpfile), "create $tmpfile" ); print $fh " 123.\n"; close $fh; eval { $parser->from_file( $tmpfile ) }; is( $@, "Expected end of input on $tmpfile line 1 at:\n". " 123.\n". " ^\n", 'error with glob file' ); unlink $tmpfile; __DATA__ 123.
On Tue Mar 22 16:04:09 2011, PSCUST wrote: Show quoted text
> The fail() method calls where() to extract the current line number and > the whole line where the error occurred. This information is needed
for Show quoted text
> showing an error message to the user, but most times the error raised
by Show quoted text
> fail() is captured and used for backtracking. > > The fact that where() is called unnecessarily causes a performance
hit. Show quoted text
> There is a huge performance increase by postponing where() to when an > error message needs to be generated. > > On a test program parsing a real program file of about 128K bytes, I
got Show quoted text
> the following elapsed times: > > 13.97 s in the original 0.07 version of the module > 1.77 s with the attached modification
Ah yes, an excellent idea. Show quoted text
> The patch also adds the file name to the error message, in case > from_file() was called. This seams to be the intent of the 'filename' > attribute, but it is not used. The xxerror.t tests this feature.
Also noted. Show quoted text
> Please feel free to include the patch in a next version of the module, > or to do something completely different.
Thanks for those. I've found a slightly neater way to do the first bit of logic, but should be mostly the same in practice. Have also included your second change. Find attached the actual patch now committed, which will be in 0.08. -- Paul Evans
Subject: rt66786.patch
=== modified file 'Build.PL' --- Build.PL 2010-12-12 15:34:23 +0000 +++ Build.PL 2011-03-23 10:58:40 +0000 @@ -9,6 +9,7 @@ 'File::Slurp' => 0, }, build_requires => { + 'File::Temp' => 0, 'Test::More' => 0, }, license => 'perl', === modified file 'lib/Parser/MGC.pm' --- lib/Parser/MGC.pm 2011-03-23 10:45:43 +0000 +++ lib/Parser/MGC.pm 2011-03-23 10:50:45 +0000 @@ -287,7 +287,7 @@ my $self = shift; my ( $message ) = @_; - die Parser::MGC::Failure->new( $message, $self, pos($self->{str}) ); + die Parser::MGC::Failure->new( $message, $self, pos($self->{str}), $self->{filename} ); } =head2 $eos = $parser->at_eos @@ -897,7 +897,7 @@ { my $class = shift; my $self = bless {}, $class; - @{$self}{qw( message parser pos )} = @_; + @{$self}{qw( message parser pos filename )} = @_; return $self; } @@ -915,7 +915,11 @@ my $indent = substr( $text, 0, $col ); $indent =~ s/[^ \t]/ /g; # blank out all the non-whitespace - return "$self->{message} on line $linenum at:\n" . + my $in_file = ( defined $self->{filename} and !ref $self->{filename} ) + ? "in $self->{filename} " + : ""; + + return "$self->{message} ${in_file}on line $linenum at:\n" . "$text\n" . "$indent^\n"; } === added file 't/32exception.t' --- t/32exception.t 1970-01-01 00:00:00 +0000 +++ t/32exception.t 2011-03-23 10:58:19 +0000 @@ -0,0 +1,57 @@ +#!/usr/bin/perl -w + +use strict; + +use Test::More tests => 9; +use File::Temp qw( tempfile ); + +package TestParser; +use base qw( Parser::MGC ); + +sub parse +{ + my $self = shift; + + return $self->token_int; +} + +package main; + +my $parser = TestParser->new; + +isa_ok( $parser, "TestParser", '$parser' ); +isa_ok( $parser, "Parser::MGC", '$parser' ); + +my $value = $parser->from_string( "\t123" ); + +is( $value, 123, '->from_string' ); + +ok( !eval { $parser->from_string( "\t123." ) }, 'Trailing input on string fails' ); +is( $@, + qq[Expected end of input on line 1 at:\n]. + qq[\t123.\n]. + qq[\t ^\n], + 'Exception from trailing input on string' ); + +ok( !eval { $parser->from_file( \*DATA ) }, 'Trailing input on glob filehandle fails' ); +is( $@, + qq[Expected end of input on line 1 at:\n]. + qq[ 123.\n]. + qq[ ^\n], + 'Exception from trailing input on glob filehandle' ); + +my ( $fh, $filename ) = tempfile( "tmpfile.XXXXXX", UNLINK => 1 ); +END { defined $filename and unlink $filename } + +print $fh " 123.\n"; +close $fh; + +ok( !eval { $parser->from_file( $filename ) }, 'Trailing input on named file fails' ); +is( $@, + qq[Expected end of input in $filename on line 1 at:\n]. + qq[ 123.\n]. + qq[ ^\n], + 'Exception from trailing input on named file' ); + +__DATA__ + 123.
On Wed Mar 23 07:02:42 2011, PEVANS wrote: Show quoted text
> Find attached the actual patch now committed, which will be in 0.08.
Er.. lets try that again with all the changes this time ;) -- Paul Evans
Subject: rt66786.patch
=== modified file 'Build.PL' --- Build.PL 2010-12-12 15:34:23 +0000 +++ Build.PL 2011-03-23 10:58:40 +0000 @@ -9,6 +9,7 @@ 'File::Slurp' => 0, }, build_requires => { + 'File::Temp' => 0, 'Test::More' => 0, }, license => 'perl', === modified file 'lib/Parser/MGC.pm' --- lib/Parser/MGC.pm 2011-03-20 20:17:25 +0000 +++ lib/Parser/MGC.pm 2011-03-23 10:50:45 +0000 @@ -253,8 +253,10 @@ sub where { my $self = shift; - - my $pos = pos $self->{str}; + my ( $pos ) = @_; + + defined $pos or $pos = pos $self->{str}; + my $str = $self->{str}; my $sol = $pos; @@ -285,7 +287,7 @@ my $self = shift; my ( $message ) = @_; - die Parser::MGC::Failure->new( $message, $self->where ); + die Parser::MGC::Failure->new( $message, $self, pos($self->{str}), $self->{filename} ); } =head2 $eos = $parser->at_eos @@ -895,7 +897,7 @@ { my $class = shift; my $self = bless {}, $class; - @{$self}{qw( message linenum col text )} = @_; + @{$self}{qw( message parser pos filename )} = @_; return $self; } @@ -904,15 +906,21 @@ { my $self = shift; + my ( $linenum, $col, $text ) = $self->{parser}->where( $self->{pos} ); + # Column number only counts characters. There may be tabs in there. # Rather than trying to calculate the visual column number, just print the # indentation as it stands. - my $indent = substr( $self->{text}, 0, $self->{col} ); + my $indent = substr( $text, 0, $col ); $indent =~ s/[^ \t]/ /g; # blank out all the non-whitespace - return "$self->{message} on line $self->{linenum} at:\n" . - "$self->{text}\n" . + my $in_file = ( defined $self->{filename} and !ref $self->{filename} ) + ? "in $self->{filename} " + : ""; + + return "$self->{message} ${in_file}on line $linenum at:\n" . + "$text\n" . "$indent^\n"; } === added file 't/32exception.t' --- t/32exception.t 1970-01-01 00:00:00 +0000 +++ t/32exception.t 2011-03-23 10:58:19 +0000 @@ -0,0 +1,57 @@ +#!/usr/bin/perl -w + +use strict; + +use Test::More tests => 9; +use File::Temp qw( tempfile ); + +package TestParser; +use base qw( Parser::MGC ); + +sub parse +{ + my $self = shift; + + return $self->token_int; +} + +package main; + +my $parser = TestParser->new; + +isa_ok( $parser, "TestParser", '$parser' ); +isa_ok( $parser, "Parser::MGC", '$parser' ); + +my $value = $parser->from_string( "\t123" ); + +is( $value, 123, '->from_string' ); + +ok( !eval { $parser->from_string( "\t123." ) }, 'Trailing input on string fails' ); +is( $@, + qq[Expected end of input on line 1 at:\n]. + qq[\t123.\n]. + qq[\t ^\n], + 'Exception from trailing input on string' ); + +ok( !eval { $parser->from_file( \*DATA ) }, 'Trailing input on glob filehandle fails' ); +is( $@, + qq[Expected end of input on line 1 at:\n]. + qq[ 123.\n]. + qq[ ^\n], + 'Exception from trailing input on glob filehandle' ); + +my ( $fh, $filename ) = tempfile( "tmpfile.XXXXXX", UNLINK => 1 ); +END { defined $filename and unlink $filename } + +print $fh " 123.\n"; +close $fh; + +ok( !eval { $parser->from_file( $filename ) }, 'Trailing input on named file fails' ); +is( $@, + qq[Expected end of input in $filename on line 1 at:\n]. + qq[ 123.\n]. + qq[ ^\n], + 'Exception from trailing input on named file' ); + +__DATA__ + 123.
Subject: Re: [rt.cpan.org #66786] Performance: where() called by fail() when backtracking
Date: Wed, 23 Mar 2011 20:40:11 +0000
To: bug-Parser-MGC [...] rt.cpan.org
From: Paulo Custodio <pauloscustodio [...] gmail.com>
Thanks, it worked. It is even quicker than my patch, the same test now takes 1.32 seconds. I think the reason is that I was passing the big 'str' as argument to Parser::MGC::Failure->new(). Passing just the reference to the Parser object is quicker. BTW, why did you pass also 'filename' to the Failure constructor? It is also available at the parser object. Best regards, Paulo Custodio
Subject: Re: [rt.cpan.org #66786] Performance: where() called by fail() when backtracking
Date: Wed, 23 Mar 2011 22:14:22 +0000
To: bug-Parser-MGC [...] rt.cpan.org
From: Paulo Custodio <pauloscustodio [...] gmail.com>
Another note: the time spent in wkip_ws can be reduced to about half by computing a 'skip' pattern at the constructor that combines the 'ws' and 'comment' patterns, and replace the '1 while m//' by just a match on 'skip'. In a test script it reduced the time spent in the skip_ws method (exclusive time) from 4.57s to 2.59s, as reported by Devel::NYTProf. Best regards, Paulo Custodio

Message body is not shown because sender requested not to inline it.

On Wed Mar 23 18:14:51 2011, pauloscustodio@gmail.com wrote: Show quoted text
> Another note: the time spent in wkip_ws can be reduced to about half > by computing a 'skip' pattern at the constructor that combines the > 'ws' and 'comment' patterns, and replace the '1 while m//' by just a > match on 'skip'.
I've moved this comment into a new bug https://rt.cpan.org/Ticket/Display.html?id=74779 I can now close this one. -- Paul Evans