This queue is for tickets about the PDF-API2 CPAN distribution.

Report information
The Basics
Id: 113290
Status: resolved
Priority: 0
Queue: PDF-API2

People
Owner: Nobody in particular
Requestors: melmothx [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 2.031



Subject: Crash with Objind 1 does not exist at index 316 at lib/PDF/API2/Basic/PDF/File.pm line 725.
Date: Wed, 23 Mar 2016 14:31:55 +0100
To: bug-PDF-API2 [...] rt.cpan.org
From: Marco Pessotto <melmothx [...] gmail.com>
See the attached patch:
diff --git a/lib/PDF/API2/Basic/PDF/File.pm b/lib/PDF/API2/Basic/PDF/File.pm
index a9bba4b..e0b6cc8 100644
--- a/lib/PDF/API2/Basic/PDF/File.pm
+++ b/lib/PDF/API2/Basic/PDF/File.pm
@@ -711,7 +711,7 @@ sub read_objnum {
         my $src = $self->read_objnum($object_location->[0], 0, %opts);
         die 'Cannot find the compressed object stream' unless $src;
 
-        $src->read_stream if $src->{' nofilt'};
+        $src->read_stream(1) if $src->{' nofilt'};
 
         my $map = substr($src->{' stream'}, 0, $src->{'First'}->val);
         my $objects = substr($src->{' stream'}, $src->{'First'}->val);

From the doc, it looks like read_stream without a true argument empties the ' stream' content in some cases, storing it on disk instead. But the code here unconditionally assumes that ' stream' is always set to a string.

WARNING: I'm not sure the patch does the right thing, but it appears to work.

I've attached a sample PDF which triggers the bug. I couldn't strip the file down to a more reasonable size, sorry.

perl -Ilib -MPDF::API2 -e 'PDF::API2->open("large-compressed.pdf");'
Objind 1 does not exist at index 316 at lib/PDF/API2/Basic/PDF/File.pm line 725.
Attachment: large-compressed.pdf (application/pdf, 385.5k)

Cheers -- Marco
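
The failure mode described in the report can be illustrated with a small, self-contained Perl sketch. This is a toy model only, not the PDF::API2 code: the function toy_read_stream, its $limit parameter, and the way it spills to a temporary file are all assumptions made for illustration. It shows why code that later calls substr on ' stream' breaks when the content has been moved to disk, and why forcing the in-memory path (as the patch does with read_stream(1)) avoids that.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for a stream object's reader; NOT the PDF::API2
# implementation. When $force_memory is false and the data exceeds $limit,
# the content is written to a temp file and ' stream' is left unset.
sub toy_read_stream {
    my ($obj, $data, $force_memory, $limit) = @_;
    if (!$force_memory && length($data) > $limit) {
        my ($fh, $path) = tempfile(UNLINK => 1);
        binmode $fh;
        print {$fh} $data;
        close $fh or die "close failed: $!";
        $obj->{' streamfile'} = $path;
        delete $obj->{' stream'};
    }
    else {
        $obj->{' stream'} = $data;
    }
    return $obj;
}

my $data      = 'x' x 100;
my $spilled   = toy_read_stream({}, $data, 0, 10);   # like read_stream()
my $in_memory = toy_read_stream({}, $data, 1, 10);   # like read_stream(1)

# A caller that assumes ' stream' is always a string only works in the
# second case.
printf "spilled has ' stream': %s\n", defined $spilled->{' stream'}   ? 'yes' : 'no';
printf "forced  has ' stream': %s\n", defined $in_memory->{' stream'} ? 'yes' : 'no';
```

The trade-off is that forcing the in-memory path keeps the whole stream in RAM, which is presumably what the disk spill was meant to avoid.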
I've applied the patch. Better would be to change read_objnum to read from the file if $src->{' streamfile'} is set. Do you want to try implementing that?

If so, I'm not entirely sure that the way Dict.pm creates the streamfile is entirely correct -- if you run into problems with it, let me know, preferably with a large file I can use to troubleshoot.
On Sat Mar 26 09:16:57 2016, SSIMMS wrote:
> I've applied the patch. Better would be to change read_objnum to read
> from the file if $src->{' streamfile'} is set. Do you want to try
> implementing that?
>
> If so, I'm not entirely sure that the way Dict.pm creates the
> streamfile is entirely correct -- if you run into problems with it,
> let me know, preferably with a large file I can use to troubleshoot.

... assuming that it's a different file than the one that's already attached to this ticket.
Subject: Re: [rt.cpan.org #113290] Crash with Objind 1 does not exist at index 316 at lib/PDF/API2/Basic/PDF/File.pm line 725.
Date: Sat, 26 Mar 2016 15:01:04 +0100
To: bug-PDF-API2 [...] rt.cpan.org
From: Marco Pessotto <melmothx [...] gmail.com>
"Steve Simms via RT" <bug-PDF-API2@rt.cpan.org> writes:

> <URL: https://rt.cpan.org/Ticket/Display.html?id=113290 >
>
> On Sat Mar 26 09:16:57 2016, SSIMMS wrote:
>> I've applied the patch. Better would be to change read_objnum to read
>> from the file if $src->{' streamfile'} is set. Do you want to try
>> implementing that?
>>
>> If so, I'm not entirely sure that the way Dict.pm creates the
>> streamfile is entirely correct -- if you run into problems with it,
>> let me know, preferably with a large file I can use to troubleshoot.
>
> ... assuming that it's a different file than the one that's already attached to this ticket.

Hi Steve!

Sure, I can give it a try over the next few days. I'll let you know.

-- Marco
Subject: Re: [rt.cpan.org #113290] Crash with Objind 1 does not exist at index 316 at lib/PDF/API2/Basic/PDF/File.pm line 725.
Date: Mon, 28 Mar 2016 17:38:31 +0200
To: bug-PDF-API2 [...] rt.cpan.org
From: Marco Pessotto <melmothx [...] gmail.com>
"Steve Simms via RT" <bug-PDF-API2@rt.cpan.org> writes:

> <URL: https://rt.cpan.org/Ticket/Display.html?id=113290 >
>
> I've applied the patch. Better would be to change read_objnum to read
> from the file if $src->{' streamfile'} is set. Do you want to try
> implementing that?
>
> If so, I'm not entirely sure that the way Dict.pm creates the
> streamfile is entirely correct -- if you run into problems with it,
> let me know, preferably with a large file I can use to troubleshoot.

Please see: https://github.com/ssimms/pdfapi2/pull/4

I'm not sure it's ready to be merged, as testing it effectively is a bit complicated, so I'd appreciate it if you would take a look at it.

-- Marco
Subject: Re: [rt.cpan.org #113290] Crash with Objind 1 does not exist at index 316 at lib/PDF/API2/Basic/PDF/File.pm line 725.
Date: Sat, 30 Apr 2016 15:34:29 +0200
To: bug-PDF-API2 [...] rt.cpan.org
From: Marco Pessotto <melmothx [...] gmail.com>
"Steve Simms via RT" <bug-PDF-API2@rt.cpan.org> writes:

> <URL: https://rt.cpan.org/Ticket/Display.html?id=113290 >
>
> On Sat Mar 26 09:16:57 2016, SSIMMS wrote:
>> I've applied the patch. Better would be to change read_objnum to read
>> from the file if $src->{' streamfile'} is set. Do you want to try
>> implementing that?

Hello Steve, and sorry to bother you.

Besides the patch sitting in the PR (which I understand could require some mulling over), could we have a release with the fix in this ticket, i.e. at this commit?

https://github.com/ssimms/pdfapi2/commit/e576b85de116e6ef2e475adb6d5ad68261a84b83

It's too bad that CPAN is delivering a version which breaks when the fix is already in. Maybe not optimal, but at least working.

Please let me know if I can be of any help.

Best wishes

-- Marco
Subject: Re: [rt.cpan.org #113290] Crash with Objind 1 does not exist at index 316 at lib/PDF/API2/Basic/PDF/File.pm line 725.
Date: Thu, 30 Jun 2016 16:38:54 +0000
To: "bug-PDF-API2 [...] rt.cpan.org" <bug-PDF-API2 [...] rt.cpan.org>
From: Branko Krznaric <bkrzno [...] hotmail.com>
Thank you for fixing this bug. I have installed the latest release, PDF-API2-2.028. I can now open some PDF files that I was not able to open before, but I get a similar error when opening some encrypted PDFs, e.g.

"Objind 807 does not exist at index 0 at lib/PDF/API2/Basic/PDF/File.pm line 722."

This has happened with several PDFs. What they all have in common is that they are encrypted. I have attached a sample PDF which triggers the bug.

Tech details:
perl v5.20.1 built for MSWin32-x86-multi-thread-64int
OS Win 7, SP 1, 32bit

Thank you in advance for looking into this! I highly appreciate your work.

Branko
Attachment: 46006606.pdf (application/pdf, 1.6m)

On Thu Jun 30 12:39:32 2016, bkrzno@hotmail.com wrote:
> Thank you for fixing this bug. I have installed the latest release
> PDF-API2-2.028. I can now open some PDF files I was not able before,
> but I get similar error when opening some encrypted PDFs, e.g. "Objind
> 807 does not exist at index 0 at lib/PDF/API2/Basic/PDF/File.pm line
> 722."
>
> This has happened with several PDFs. They all have in common that they
> are encrypted. I have attached a sample PDF which triggers the bug.

I just took a look at this, and it appears to be a separate issue -- PDF::API2 doesn't know how to read encrypted PDFs. That would be a nice wishlist item, and doesn't appear to be especially difficult to implement. Feel free to create a new ticket for it, particularly if you'd like to try to write the code (I can provide pointers if so).

I've just rewritten some of the code in Dict.pm and File.pm:

Dict.pm: Previously, read_stream was only creating a stream cache file when a given 4kB block uncompressed to over 16kB, whereas it was supposed to do so whenever the uncompressed stream was more than 32kB. I've fixed that, and increased the max in-memory stream size from 32kB to 16MB when $force_memory isn't set.

File.pm: read_objnum will now read from a stream cache file if one exists, rather than requiring that the entire object stream be read into memory first.

I've also added comments and used more descriptive variable names in the hope of improving maintainability.

This should resolve the issue without increasing memory consumption when there's a large object stream. To test it, you can set $mincache to a small number (e.g. 8192) to ensure that the cache files get created.

Can you confirm that things are working properly for you using the latest code on GitHub?
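
As a rough illustration of the File.pm change described above, the following self-contained Perl sketch reads a byte range either from a cache file or from an in-memory string. It is a toy model under stated assumptions, not the code that was committed: the helper toy_stream_slice, the sample payload, and the $first offset (standing in for the object stream's First value) are all hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper, NOT part of PDF::API2: return $length bytes starting
# at $offset, from the cache file when one exists, otherwise from the
# in-memory string.
sub toy_stream_slice {
    my ($obj, $offset, $length) = @_;
    if (defined $obj->{' streamfile'}) {
        open my $fh, '<', $obj->{' streamfile'} or die "open failed: $!";
        binmode $fh;
        seek $fh, $offset, 0 or die "seek failed: $!";
        my $buf = '';
        my $got = read $fh, $buf, $length;
        die "read failed: $!" unless defined $got;
        close $fh;
        return $buf;
    }
    return substr($obj->{' stream'}, $offset, $length);
}

# A payload shaped loosely like an object stream: an index ("map") of
# number/offset pairs, followed by the object data.
my $payload = "1 0 2 5 " . "AAAAABBBBB";
my $first   = 8;    # plays the role of the First offset

my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} $payload;
close $fh or die "close failed: $!";

my $cached = { ' streamfile' => $path };      # content lives on disk
my $memory = { ' stream'     => $payload };   # content lives in memory

for my $obj ($cached, $memory) {
    my $map     = toy_stream_slice($obj, 0, $first);
    my $objects = toy_stream_slice($obj, $first, length($payload) - $first);
    print "map=[$map] objects=[$objects]\n";
}
```

Both objects yield the same map and object data, so a caller does not need to care where the stream is stored. Lowering a size threshold such as $mincache, as suggested above, would be one way to push real files down the cache-file branch during testing.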