Bug #12327 for Pod-Parser: parse_from_filehandle is very slow (N^2) for large files

Mon Apr 18 11:52:51 2005 itub [...] cpan.org - Ticket created

Subject:

parse_from_filehandle is very slow (N^2) for large files

I noticed that Pod::Parser takes a long time (~10 min) to parse files such as http://search.cpan.org/~areibens/PDF-API2-0.41/lib/PDF/API2/Resource/CIDFont/CMap/japanese.pm . This is a ~4 MB file with only a minimal amount of POD at the end.

The problem is that most of the file is just one paragraph. If we look at the code for parse_from_filehandle, around line 1069:

    ## Read paragraphs line-by-line
    while (defined ($textline = $tied_fh ? <$in_fh> : $in_fh->getline)) {
        $textline = $self->preprocess_line($textline, ++$nlines);
        next unless ((defined $textline) && (length $textline));
        $_ = $paragraph; ## save previous contents

The last line shown copies the paragraph for every line, which results in N^2 behavior: for N lines, the first line is copied N times, the second N-1 times, and so on.

Since I couldn't find any use for the paragraph saved in $_, I just deleted that line, which reduced the runtime for japanese.pm to about 1 s. I haven't found any problem with converting various pods to HTML with my modified version, so I'm wondering if that line is really necessary. There is a small possibility that $_ could be used in another subroutine, but I hope that this kind of "action at a distance" is not the case here.

Note: from looking at previous versions on CPAN, I see that the line in question was introduced with version 1.06.
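
To illustrate the quadratic growth described in this report, here is a minimal sketch that models the loop rather than calling Pod::Parser itself. The helper bytes_copied and its arguments are hypothetical names invented for this illustration. It counts how many bytes a per-line save of the growing paragraph would copy, assuming every line has the same length and all lines belong to one paragraph.

```perl
use strict;
use warnings;

# Hypothetical model of the loop in the report (not Pod::Parser code):
# a paragraph grows one line at a time and, optionally, its whole current
# contents are copied into a scratch variable before each append.
sub bytes_copied {
    my ($nlines, $line, $save_previous) = @_;
    my $paragraph = '';
    my $copied    = 0;
    for my $i (1 .. $nlines) {
        if ($save_previous) {
            my $saved = $paragraph;      # stands in for `$_ = $paragraph`
            $copied += length $saved;    # bytes duplicated on this iteration
        }
        $paragraph .= $line;             # paragraph keeps growing
    }
    return $copied;
}

my $line = ("x" x 9) . "\n";             # assumed fixed line length: 10 bytes
for my $n (1000, 2000, 4000) {
    printf "%5d lines: %d bytes copied with save, %d without\n",
        $n, bytes_copied($n, $line, 1), bytes_copied($n, $line, 0);
}
```

For N lines of length L, the saved copies add up to L*N*(N-1)/2 bytes, so doubling the number of lines roughly quadruples the copying. Without the save, the model copies nothing beyond the appends themselves. This counts bytes logically and does not measure wall-clock time, which also depends on the Perl version and its string handling.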

Tue May 31 15:20:02 2005 Marek.Rouchal [...] gmx.net - Taken

Tue May 31 15:21:38 2005 Marek.Rouchal [...] gmx.net - Status changed from 'new' to 'resolved'

Thu Jan 15 02:49:27 2009 Marek.Rouchal [...] gmx.net - Fixed in 1.31 added