
This queue is for tickets about the CAM-PDF CPAN distribution.

Report information
The Basics
Id: 39214
Status: resolved
Worked: 15 hours (900 min)
Priority: 0/
Queue: CAM-PDF

People
Owner: Nobody in particular
Requestors: linuxgeek [...] yahoo.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Problem parsing PDF files
Date: Fri, 12 Sep 2008 04:42:47 -0700 (PDT)
To: bug-CAM-PDF [...] rt.cpan.org
From: Eric Harlow <linuxgeek [...] yahoo.com>
I'm trying to get the text out of a PDF file. When I try, I get the following error.

Could not find PDF cross-ref table at location 1152553/-1/0 0 (empty)

I am using Perl 5.8.8 on Linux Fedora 9.

This same file had an issue with Xpdf/pdftotext. The developer of that product said

"The issue is that there are references to object #1046, with generation 1 -- but object #1046 is a compressed object, and compressed objects are always generation 0, by definition. (Short version: there's a bug in the PDF generation software.) It looks like Acrobat ignores the generation number entirely for compressed objects, so I'll change Xpdf to do the same."

I can provide a copy of the PDF file if you want to take a look.
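The lenient lookup described in the quoted Xpdf explanation can be sketched roughly as follows. This is only an illustration in Python, not the actual code of Xpdf or CAM::PDF; the `XrefEntry` type, the `resolve` function and the sample table are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class XrefEntry:
    """Hypothetical cross-reference entry for one object number."""
    compressed: bool  # True if the object is stored inside an object stream
    generation: int   # compressed objects are always generation 0


def resolve(xref: Dict[int, XrefEntry], obj_num: int, gen: int) -> Optional[XrefEntry]:
    """Look up an indirect reference of the form "obj_num gen R".

    For compressed objects the requested generation is ignored, which is
    the tolerant behaviour attributed to Acrobat in the message above.
    """
    entry = xref.get(obj_num)
    if entry is None:
        return None
    if entry.compressed:
        return entry  # tolerate a wrong generation number
    return entry if entry.generation == gen else None


# Assumed sample table: object 1046 is compressed, object 7 is not.
xref = {
    1046: XrefEntry(compressed=True, generation=0),
    7: XrefEntry(compressed=False, generation=0),
}
```

With this sketch, a reference to object 1046 with generation 1 still resolves, while the same mismatch on the uncompressed object 7 does not.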
Subject: Re: [rt.cpan.org #39214] Problem parsing PDF files
Date: Fri, 12 Sep 2008 20:32:28 -0500
To: bug-CAM-PDF [...] rt.cpan.org
From: Chris Dolan <chris [...] chrisdolan.net>
Yeah, that's a known issue with CAM::PDF: compressed objects are a PDF 1.5 feature, and I've still got several PDF 1.4 features to do... I started writing support for compressed objects over the summer, but it's still very buggy, so I didn't release it. With any luck, the next CAM::PDF release will help, but I can't predict when that will be.

If you are financially motivated to get a solution quicker than "someday", I would accept a contribution to rearrange priorities. But otherwise, I would recommend looking for a setting in your PDF generation software that says something like "save as PDF 1.4-compatible".

When I finish support for compressed objects, I'll be sure to handle the generation number correctly -- I appreciate that part of your feedback, because I probably would have stumbled on that same detail.

Chris

On Sep 12, 2008, at 6:43 AM, Eric Harlow via RT wrote:
> Fri Sep 12 07:43:15 2008: Request 39214 was acted upon.
> Transaction: Ticket created by linuxgeek@yahoo.com
> Queue: CAM-PDF
> Subject: Problem parsing PDF files
> Broken in: (no value)
> Severity: (no value)
> Owner: Nobody
> Requestors: linuxgeek@yahoo.com
> Status: new
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=39214 >
>
> I'm trying to get the text out of a PDF file. When I try, I get
> the following error.
>
> Could not find PDF cross-ref table at location 1152553/-1/0 0 (empty)
>
> I am using Perl 5.8.8 on Linux Fedora 9.
>
> This same file had an issue with Xpdf/pdftotext. The developer of
> that product said
>
> "The issue is that there are references to object #1046, with generation
> 1 -- but object #1046 is a compressed object, and compressed objects are
> always generation 0, by definition. (Short version: there's a bug in
> the PDF generation software.) It looks like Acrobat ignores the
> generation number entirely for compressed objects, so I'll change Xpdf
> to do the same."
>
> I can provide a copy of the PDF file if you want to take a look.
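Since the reply above ties the failure to PDF 1.5 compressed objects, one rough way to guess whether a file is affected is to read the version from its `%PDF-M.N` header and look for an object-stream marker. The following is a heuristic sketch in Python, not part of CAM::PDF; both function names are made up for the example, and a document catalog can declare a later version than the header, which this check ignores.

```python
import re
from typing import Optional, Tuple


def pdf_header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from a '%PDF-M.N' header, or None if absent."""
    # The header is expected near the start of the file, so only the
    # first 1024 bytes are searched.
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def may_use_compressed_objects(data: bytes) -> bool:
    """Heuristic: the header says 1.5 or later, or '/ObjStm' appears in the file."""
    version = pdf_header_version(data)
    return (version is not None and version >= (1, 5)) or b"/ObjStm" in data
```

For a real file, the bytes could be read with `open(path, "rb").read()` and passed to `may_use_compressed_objects`; a `True` result only suggests that a parser without PDF 1.5 support may fail.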
This is solved as of CAM::PDF 1.51. Please try this release with any PDF v1.5 documents you have on hand, and file a new bug if you encounter parse problems.