Bug #86077 for CAM-PDF: CAM::PDF Error Extracting Text From a PDF Page

Subject:	CAM::PDF Error Extracting Text From a PDF Page
Date:	Tue, 11 Jun 2013 23:41:21 -0400
To:	bug-CAM-PDF [...] rt.cpan.org
From:	Hal Weitzman <haroldweitzman [...] gmail.com>

Hi Chris,

I am using CAM::PDF to loop page by page through a PDF document, looking to exclude the PDF if any one of a set of specific phrases is present. Many PDFs give no trouble. However, I recently encountered several PDFs where the following error occurs and halts my script:

    "DecodeParms must be a dictionary"

I used the eval function to trap the error and allow the script to continue. Here is the PDF scan section of my code:

    my $pdf = CAM::PDF->new($path_to_temp . $title);   # Init an object
    my $pages = $pdf->numPages;                        # get pages to search
    $ii = 1;                                           # init the index to PDF pages
    while ($ii < $pages) {                             # loop while more pages
        print "\n Get page $ii text ";                 # for testing
        eval {                                         # catch the error
            $PageText = $pdf->getPageText($ii);        # get the current page text
        };                                             # end of eval block
        if ($@) {                                      # check for error
            print "\n Get page $ii text failed -> $@ ";  # inform the log
            next;                                      # skip to the next page
        }                                              # end of error check

The rest of the code searches the current page text for an exclude phrase and performs the required action.

Here is the log output:

    Row 1 SPE4A713Q5923.PDF
    Get page 1 text
    Get page 1 text failed -> DecodeParms must be a dictionary.
    Get page 2 text
    Get page 3 text
    Get page 4 text
    Get page 5 text
    Get page 6 text
    Get page 7 text
    Get page 8 text
    Get page 9 text
    Get page 10 text
    Get page 11 text
    Get page 12 text
    Get page 13 text
    Get page 14 text
    Get page 15 text
    Get page 16 text found INSPECTION POINT: ORIGIN --> Skipped

All these PDFs fail on page 1. I have attached the PDF that generated this log. I hope it is not too large.

I am using Padre 0.98, Perl 5.14.2 and CAM::PDF 1.59. My OS is Win7 Ultimate. The PDF is downloaded from the web using WWW::Mechanize::Firefox (version 0.74) to a temp directory, then loaded into a new CAM::PDF object. (It would be nice to be able to download the PDF directly into CAM::PDF.)

I don't know if this is a PDF version issue (is there a way to get the version of a PDF?), a bug or, maybe, a preferences setting.

Thank you for making this module available and for your continued support.

--
Regards
Hal Weitzman
haroldweitzman@gmail.com
Cell: 609-217-0088
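For reference, the scan loop and the PDF version question above can be sketched as follows. This is only an illustration under stated assumptions, not a fix for the "DecodeParms must be a dictionary" error itself: `scan_pages` and `pdf_header_version` are hypothetical helper names, not part of CAM::PDF. The first only assumes an object that provides `numPages` and `getPageText`, as a CAM::PDF object does. The second reads the `%PDF-x.y` header at the start of the file; a PDF may also override that value in its document catalog, which this sketch ignores.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CAM::PDF): scan every page of $pdf for any
# of the phrases in @$phrases. $pdf is assumed to provide numPages() and
# getPageText($n), as a CAM::PDF object does.
sub scan_pages {
    my ($pdf, $phrases) = @_;
    my (%found, %failed);

    # Pages are numbered from 1, so the last page is numPages itself.
    for my $page (1 .. $pdf->numPages) {
        my $text = eval { $pdf->getPageText($page) };
        if (!defined $text) {
            # Record the failure (for example "DecodeParms must be a
            # dictionary") and carry on with the next page.
            my $err = $@ || 'no text returned';
            chomp $err;
            $failed{$page} = $err;
            next;
        }
        for my $phrase (@{$phrases}) {
            push @{ $found{$page} }, $phrase if index($text, $phrase) >= 0;
        }
    }
    return (\%found, \%failed);
}

# Hypothetical helper: return the version from the "%PDF-x.y" file header,
# or undef if no header is found in the first 1024 bytes.
sub pdf_header_version {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    read $fh, my $head, 1024;
    close $fh;
    return (defined($head) && $head =~ /%PDF-(\d+\.\d+)/) ? $1 : undef;
}
```

With CAM::PDF installed, the call would look something like `my ($found, $failed) = scan_pages(CAM::PDF->new($file), ['INSPECTION POINT: ORIGIN']);`. Using a `for` loop over `1 .. numPages` includes the last page and cannot skip the index increment when a page fails.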

Download SPE4A713Q5923.PDF
application/pdf 321.8k
