Skip Menu |

This queue is for tickets about the Pod-Parser CPAN distribution.

Report information
The Basics
Id: 71139
Status: resolved
Priority: 0/
Queue: Pod-Parser

People
Owner: Marek.Rouchal [...] gmx.net
Requestors: nick [...] ccl4.org
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: (no value)
Fixed in: (no value)



Subject: 3 possible optimisations
In core, t/porting/podcheck.t is slow. Slow as in roughly a minute runtime, which is frustrating. But to be fair to it, it's doing a *lot* of work. I ran it under Devel::NYTProf. [Runtime there about 150s] There's nothing obvious. However, these 3 changes seem to help. [diff is from the path in the core - I hope that patch -p3 will apply it cleanly. All tests pass in the core] diff --git a/cpan/Pod-Parser/lib/Pod/Checker.pm b/cpan/Pod-Parser/lib/Pod/Checker.pm index a230542..f901802 100644 --- a/cpan/Pod-Parser/lib/Pod/Checker.pm +++ b/cpan/Pod-Parser/lib/Pod/Checker.pm @@ -1099,7 +1099,7 @@ sub _check_ptree { $text .= $self->_check_ptree($contents, $line, $file, "$nestlist$cmd"); next; } - if($nestlist =~ /$cmd/) { + if(index($nestlist, $cmd) != -1) { $self->poderror({ -line => $line, -file => $file, -severity => 'WARNING', -msg => "nested commands $cmd<...$cmd<...>...>"}); Superficially I find that more ugly, but it actually is clearer what the code is doing, as [if I've understood it correctly] $cmd is always 1 letter. NYProf reports this line goes from 0.725s to 0.056s diff --git a/cpan/Pod-Parser/lib/Pod/Parser.pm b/cpan/Pod-Parser/lib/Pod/Parser.pm index c807f3f..7d54864 100644 --- a/cpan/Pod-Parser/lib/Pod/Parser.pm +++ b/cpan/Pod-Parser/lib/Pod/Parser.pm @@ -840,7 +840,7 @@ sub parse_text { $seq->append($expand_text ? &$xtext_sub($self,$_,$seq) : $_); } ## Keep track of line count - $line += s/\r*\n//; + $line += /\n/; ## Remember the "current" sequence $seq = $seq_stack[-1]; } ˘ $_ isn't used subsequently, so the substitution isn't necessary. That makes the need to match \r* redundant. 1.99s to 1.61s Does this code date from a time when the parser was different, chewing through the text, and the substitution was needed? @@ -1096,7 +1096,7 @@ sub parse_from_filehandle { ## See if this line is blank and ends the current paragraph. ## If it isnt, then keep iterating until it is. - next unless (($textline =~ /^([^\S\r\n]*)[\r\n]*$/) + next unless (($textline =~ /^[^\S\r\n]*[\r\n]*$/) && (length $paragraph)); ## Now process the paragraph The capture isn't needed. 4.73s to 4.43s I'm wondering if there are other unneeded captures, but didn't look, as there weren't any others in obviously hot places. The structure of parse_text is bugging me, as this code repeats work: time calls time/call code 0.153206 87640 0.000002 $_ = $text; 0.765981 87640 0.000009 my @tokens = split /([A-Z]<(?:<+\s)?)/; 0.111030 87640 0.000001 while ( @tokens ) { 0.407500 255096 0.000002 $_ = shift @tokens; 0.000000 0 0.000000 ## Look for the beginning of a sequence 1.999841 255096 0.000008 if ( /^([A-Z])(<(?:<+\s)?)$/ ) { and it's not cheap work. That last line alone is 2 seconds of repeated work, and over 1% of the *total* runtime. :-( I can't see a way to avoid that, without a substantial refactoring. Nicholas Clark
Subject: Pod-Parser.patch
diff --git a/cpan/Pod-Parser/lib/Pod/Checker.pm b/cpan/Pod-Parser/lib/Pod/Checker.pm index a230542..f901802 100644 --- a/cpan/Pod-Parser/lib/Pod/Checker.pm +++ b/cpan/Pod-Parser/lib/Pod/Checker.pm @@ -1099,7 +1099,7 @@ sub _check_ptree { $text .= $self->_check_ptree($contents, $line, $file, "$nestlist$cmd"); next; } - if($nestlist =~ /$cmd/) { + if(index($nestlist, $cmd) != -1) { $self->poderror({ -line => $line, -file => $file, -severity => 'WARNING', -msg => "nested commands $cmd<...$cmd<...>...>"}); diff --git a/cpan/Pod-Parser/lib/Pod/Parser.pm b/cpan/Pod-Parser/lib/Pod/Parser.pm index c807f3f..7d54864 100644 --- a/cpan/Pod-Parser/lib/Pod/Parser.pm +++ b/cpan/Pod-Parser/lib/Pod/Parser.pm @@ -840,7 +840,7 @@ sub parse_text { $seq->append($expand_text ? &$xtext_sub($self,$_,$seq) : $_); } ## Keep track of line count - $line += s/\r*\n//; + $line += /\n/; ## Remember the "current" sequence $seq = $seq_stack[-1]; } @@ -1096,7 +1096,7 @@ sub parse_from_filehandle { ## See if this line is blank and ends the current paragraph. ## If it isnt, then keep iterating until it is. - next unless (($textline =~ /^([^\S\r\n]*)[\r\n]*$/) + next unless (($textline =~ /^[^\S\r\n]*[\r\n]*$/) && (length $paragraph)); ## Now process the paragraph
Applied...thanks for the sleuthing. Will be released with Pod-Parser-1.50 soon. Regarding the last comment: I had to change the regexp like this: ## Look for the beginning of a sequence if ( /^([A-Z])(<(?:<+(?:\r?\n|[ \t]))?)$/ ) { Would it be faster if written like this: ## Look for the beginning of a sequence if ( /^([A-Z])(<(?:<+(?:\r?\n|[ \t])|))$/ ) { And no, I do not have more optimization ideas... and the future is Pod::Simple, anyway :-) -Marek
Subject: Re: [rt.cpan.org #71139] 3 possible optimisationss
Date: Thu, 2 Feb 2012 15:17:41 +0000
To: Marek Rouchal via RT <bug-Pod-Parser [...] rt.cpan.org>
From: Nicholas Clark <nick [...] ccl4.org>
On Sat, Jan 21, 2012 at 02:48:12PM -0500, Marek Rouchal via RT wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=71139 > > > Applied...thanks for the sleuthing. Will be released with > Pod-Parser-1.50 soon.
Thanks. Show quoted text
> Regarding the last comment: I had to change the regexp like this: > > ## Look for the beginning of a sequence > if ( /^([A-Z])(<(?:<+(?:\r?\n|[ \t]))?)$/ ) { > > Would it be faster if written like this: > > ## Look for the beginning of a sequence > if ( /^([A-Z])(<(?:<+(?:\r?\n|[ \t])|))$/ ) {
I tried benchmarking this change, and it doesn't seem to help enough to be properly measurable. Superficially the change is 0.03s faster: 247154 1.08s 247154 264ms if ( /^([A-Z])(<(?:<+(?:\r?\n|[ \t])|))$/ ) { # spent 264ms making 247154 calls to Pod::Parser::CORE:match, avg 1µs/call 247154 1.05s 247154 269ms if ( /^([A-Z])(<(?:<+(?:\r?\n|[ \t]))?)$/ ) { # spent 269ms making 247154 calls to Pod::Parser::CORE:match, avg 1µs/call but that's 1 run vs 1 run, and the noise is more than that. The other changes I found were at least 0.3s, ie 10 times better. I don't think that this change is worth making - I'm not even confident it is actually a speedup, and my opinion is that the original code is a bit easier to read and understand. ('?' for "zero or one of these" being clearer to my mind that '|' for "alternatively match nothing") Show quoted text
> And no, I do not have more optimization ideas... and the future is > Pod::Simple, anyway :-)
Yes, and I'm not sure what I can do to help the future arrive sooner. Nicholas Clark
Implemented as suggested in Pod-Parser-1.51.