Bug #120347 for PDF-Extract: Very slow on big files

This queue is for tickets about the PDF-Extract CPAN distribution.

Report information

The Basics

Id:	120347
Status:	new
Priority:	0/
Queue:	PDF-Extract

People

Owner:	Nobody in particular
Requestors:	'spro^^%^6ut# [...] &$%*c
Cc:
AdminCc:

Bug Information

Severity:	(no value)
Broken in:	(no value)
Fixed in:	(no value)

History

Fri Feb 24 00:34:20 2017 $_ = 'spro^^*%*^6ut# [...] &$%*c>#!^!#&!pan.org'; y/a-z.@//cd; print - Ticket created

Subject:

Very slow on big files

PDF::Extract reads the entire file into memory and then uses regular expressions to find what it needs. As a result, extracting a single page from a 160 MB PDF takes 60 seconds. If fixing this is not feasible (I imagine it would mean rewriting most of the code), could the limitation at least be documented?
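
To illustrate the kind of access pattern that avoids loading the whole file, here is a minimal sketch in Python, not PDF::Extract's code and not a proposed patch. It relies on the fact that a PDF ends with a `startxref` keyword followed by the byte offset of the cross-reference data and an `%%EOF` marker, so that offset can be found by reading only the tail of the file. The function name `find_startxref` and the `tail_size` default are assumptions made for this example.

```python
import os
import re


def find_startxref(path, tail_size=1024):
    """Return the byte offset recorded after the last 'startxref' keyword.

    Only the final tail_size bytes are read, so the cost does not grow
    with the size of the file. (Hypothetical helper for illustration.)
    """
    with open(path, "rb") as fh:
        # Jump to the end to learn the file size without reading the body.
        fh.seek(0, os.SEEK_END)
        file_size = fh.tell()
        # Step back at most tail_size bytes and read just that tail.
        fh.seek(max(0, file_size - tail_size))
        tail = fh.read()

    # A PDF trailer ends with: startxref <offset> %%EOF
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref entry in the last %d bytes" % tail_size)
    # Incrementally updated files can carry several trailers; the last wins.
    return int(matches[-1])
```

From that offset a reader can seek to the cross-reference data and then to individual objects, instead of running regular expressions over the full 160 MB. Whether PDF::Extract could be restructured this way without a rewrite is an open question.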