
This queue is for tickets about the Pod-Parser CPAN distribution.

Report information
The Basics
Id: 12327
Status: resolved
Worked: 30 min
Priority: 0/
Queue: Pod-Parser

People
Owner: Marek.Rouchal [...] gmx.net
Requestors: itub [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: (no value)
Fixed in: 1.31



Subject: parse_from_filehandle is very slow (N^2) for large files
I noticed that Pod::Parser takes a long time (~10 min) to parse files such as http://search.cpan.org/~areibens/PDF-API2-0.41/lib/PDF/API2/Resource/CIDFont/CMap/japanese.pm . This is a ~4 MB file with only a minimal amount of POD at the end. The problem is that most of the file is just one paragraph. If we look at the code for parse_from_filehandle, around line 1069:

    ## Read paragraphs line-by-line
    while (defined ($textline = $tied_fh ? <$in_fh> : $in_fh->getline)) {
        $textline = $self->preprocess_line($textline, ++$nlines);
        next unless ((defined $textline) && (length $textline));
        $_ = $paragraph;  ## save previous contents

The last line shown copies the paragraph for every line, which results in N^2 behavior: for N lines, the first line is copied N times, the second N-1 times, and so on.

Since I couldn't find any use for the paragraph saved in $_, I just deleted that line, which reduced the runtime for japanese.pm to about 1 s. I haven't found any problem with converting various pods to HTML with my modified version, so I'm wondering if that line is really necessary. There is a small possibility that $_ could be used in another subroutine, but I hope that this kind of "action at a distance" is not the case here.

Note: from looking at previous versions on CPAN, I see that the line in question was introduced with version 1.06.
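
To make the N^2 claim in the report concrete, here is a rough cost model written in Python. It is not the Pod::Parser code: the function name `bytes_copied`, the `save_previous` flag, and the 80-byte line length are assumptions made for illustration, and it only counts the bytes that would be moved if each `$_ = $paragraph` copied the whole paragraph accumulated so far.

```python
def bytes_copied(line_lengths, save_previous=True):
    """Count bytes moved while accumulating one paragraph line by line.

    Hypothetical model: save_previous=True stands in for the
    `$_ = $paragraph;` statement quoted in the report.
    """
    paragraph_len = 0
    copied = 0
    for length in line_lengths:
        if save_previous:
            # The whole paragraph read so far is copied once per line.
            copied += paragraph_len
        # Appending the new line moves only that line's bytes.
        copied += length
        paragraph_len += length
    return copied


# Assumed input: a single 1000-line paragraph of 80-byte lines.
lines = [80] * 1000
print(bytes_copied(lines, save_previous=True))   # quadratic in line count
print(bytes_copied(lines, save_previous=False))  # linear in line count
```

Under these assumptions the saved copy dominates the total: the count with the save grows with the square of the number of lines, while the count without it grows linearly, which matches the drop from minutes to about a second described in the report.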