Subject: parse_from_filehandle is very slow (N^2) for large files
I noticed that Pod::Parser takes a long time (~10 min) to parse files such as http://search.cpan.org/~areibens/PDF-API2-0.41/lib/PDF/API2/Resource/CIDFont/CMap/japanese.pm . This is a ~4 MB file with only a minimal amount of POD at the end. The problem is that most of the file is a single paragraph. Here is the relevant code in parse_from_filehandle, around line 1069:
    ## Read paragraphs line-by-line
    while (defined ($textline = $tied_fh ? <$in_fh> : $in_fh->getline)) {
        $textline = $self->preprocess_line($textline, ++$nlines);
        next unless ((defined $textline) && (length $textline));
        $_ = $paragraph; ## save previous contents
The last line shown copies the accumulated paragraph once for every input line, which results in N^2 behavior: for a paragraph of N lines, the first line is copied N times, the second N-1 times, and so on. Since I couldn't find any use for the copy saved in $_, I simply deleted that line, which reduced the runtime for japanese.pm to about 1 s. I haven't found any problems converting various PODs to HTML with my modified version, so I'm wondering whether that line is really necessary. There is a small possibility that $_ is used in another subroutine, but I hope that this kind of "action at a distance" is not the case here.
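To make the quadratic cost concrete, here is a minimal standalone sketch. It is not Pod::Parser code: the helper name bytes_copied, the 10-byte line, and the line counts are all made up for illustration. It grows one paragraph line by line and, optionally, saves a copy of it on every line (the same effect as `$_ = $paragraph`), then reports how many bytes those copies moved in total. With lines of length L, the copies add up to L*N*(N-1)/2 bytes, so doubling N roughly quadruples the work.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Pod::Parser): append $n_lines copies of
# $line to one growing paragraph. If $save_copy is true, copy the paragraph
# before each append, as `$_ = $paragraph` does, and count the bytes copied.
sub bytes_copied {
    my ($n_lines, $line, $save_copy) = @_;
    my $paragraph = '';
    my $copied    = 0;
    for my $i (1 .. $n_lines) {
        if ($save_copy) {
            my $saved = $paragraph;     # full copy of everything read so far
            $copied += length $saved;
        }
        $paragraph .= $line;
    }
    return $copied;
}

my $line = ("x" x 9) . "\n";            # assumed 10-byte line
for my $n (1_000, 2_000, 4_000) {
    printf "%5d lines: %d bytes copied with the save, %d without\n",
        $n, bytes_copied($n, $line, 1), bytes_copied($n, $line, 0);
}
```

For a ~4 MB file that is mostly one paragraph, the same arithmetic gives a total copy volume many orders of magnitude larger than the file itself, which would be consistent with the runtime dropping from minutes to about a second once the copy is removed.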
Note: from looking at previous versions on CPAN, I see that the line in question was introduced with version 1.06.