
This queue is for tickets about the PDF-API2 CPAN distribution.

Report information
The Basics
Id: 48683
Status: resolved
Priority: 0/
Queue: PDF-API2

People
Owner: Nobody in particular
Requestors: cherdt [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.73
Fixed in: 2.026



Subject: PDF-API2 throws error "Malformed xref in PDF file" for newer PDFs (PDF v1.5+)
Although PDF-API2 works well with older PDF versions (1.4), newer versions (1.5, 1.6) cause it to throw an error: "Malformed xref in PDF file at [path to File.pm] line 1198"

I can reproduce the error with the following Perl script:

use PDF::API2;
$pdf = PDF::API2->open($ARGV[0]);

The PDFs I am using were produced by Adobe Acrobat Pro 9 (the attached file is an example). I saw the possibly related bug report submitted by abhinavk, but his fix did not work for me.

Environment: Perl v5.10.0 (ActiveState) on WinXP Pro 2002 SP3.
Subject: HowToArgueEffectively.pdf
application/pdf 14.9k

Message body not shown because it is not plain text.

Subject: [rt.cpan.org #48683]
Date: Fri, 14 Aug 2009 11:41:47 -0500
To: bug-PDF-API2 [...] rt.cpan.org
From: Chris Herdt <cherdt [...] gmail.com>
I have since successfully used PDF::API2 to modify version 1.6 PDFs, so I'm not certain why the particular PDF in question produced the error. Perhaps that file in particular has an unfriendly xref value, which is apparently modified or removed when saved as an earlier PDF version.
On Fri Aug 14 12:42:10 2009, cherdt wrote:
> I have since successfully used PDF::API2 to modify version 1.6 PDFs,
> so I'm not certain why the particular PDF in question produced the
> error. Perhaps that file in particular has an unfriendly xref value,
> which is apparently modified or removed when saved as an earlier PDF
> version.
As of PDF 1.5, the spec allows xref information to be stored in streams instead of tables. This isn't supported by PDF::API2 (though I'll be very happy if someone beats me to fixing that and sends a patch!). Acrobat 9 started using cross-reference streams by default, so this error is more common with newer files. PDF::API2 will work fine if you generate a PDF in Acrobat 9 without using cross-reference streams, however. The easiest way to do this is to make it compatible with Acrobat 5.0 and later when you save.
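For anyone who wants to check which kind of cross-reference data a file uses before handing it to PDF::API2, here is a minimal sketch. It assumes a well-formed file and only inspects the section that the final startxref offset points to. The helper names xref_kind and xref_kind_of_file are hypothetical and are not part of PDF::API2.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDF::API2): classify the cross-reference
# section that the final "startxref" offset points to.
# Returns 'table', 'stream', or 'unknown'.
sub xref_kind {
    my ($bytes) = @_;

    # The last "startxref" keyword is followed by the byte offset of the
    # most recently written cross-reference section.
    my $pos = rindex($bytes, 'startxref');
    return 'unknown' if $pos < 0;

    my ($offset) = substr($bytes, $pos) =~ /\Astartxref\s+(\d+)/
        or return 'unknown';
    return 'unknown' if $offset >= length($bytes);

    # A classic table begins with the keyword "xref"; a cross-reference
    # stream (PDF 1.5+) is an ordinary indirect object: "N G obj".
    my $head = substr($bytes, $offset, 64);
    return 'table'  if $head =~ /\A\s*xref\b/;
    return 'stream' if $head =~ /\A\s*\d+\s+\d+\s+obj\b/;
    return 'unknown';
}

# Hypothetical convenience wrapper: read a file in raw mode and classify it.
sub xref_kind_of_file {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    local $/;    # slurp the whole file
    my $bytes = <$fh>;
    close($fh);
    return xref_kind($bytes);
}

# When run as a script, report the kind for each file named on the command line.
if (!caller) {
    print "$_: ", xref_kind_of_file($_), "\n" for @ARGV;
}
```

A result of 'stream' suggests the file will trigger the error described in this ticket on releases without cross-reference stream support. The sketch does not follow /Prev chains or the /XRefStm entry used by hybrid-reference files, so a 'table' result does not rule out streams elsewhere in the file.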
From: pwomack [...] papermule.co.uk
> Acrobat 9 started using cross-reference streams by default, so this
> error is more common with newer files. PDF::API2 will work fine if you
> generate a PDF in Acrobat 9 without using cross-reference streams,
> however. The easiest way to do this is to make it compatible with
> Acrobat 5.0 and later when you save.
With the passage of time, these are becoming more common. I looked into adding cross-reference stream support myself, but it is too complex to be a "patch"; it's a significant piece of implementation. So can I just "+1" the importance of this?

BugBear
Version 2.020, just released, contains two updates relevant to this issue:

1) If PDF::API2 encounters a cross-reference stream, it will now give a more appropriate error message rather than saying that the cross-reference table is malformed.

2) The Known Issues section of the POD contains pointers to the PDF specification, which describes how both the old cross-reference tables and the new cross-reference streams work.
From: don.huettl [...] grantstreet.com
I have attached three patches to implement read-only support for cross-reference streams and compressed objects. Saving the results will still write a v1.4 document.

Patches should be applied in the following order:

PDF-API2-2.023-XRefStm.patch
PDF-API2-2.023-Predictor.patch
PDF-API2-2.023-XRef-test.patch

The unit test that I added needs the example document attached to this ticket to be saved as t/resources/HowToArgueEffectively.pdf.

If this is applied, please credit my employer, Grant Street Group <gsg@cpan.org>, in addition to myself.
Subject: PDF-API2-2.023-XRefStm.patch

Message body is not shown because it is too large.

Subject: PDF-API2-2.023-XRef-test.patch
commit c198a9745c7a
Author: Don Huettl <don.huettl@grantstreet.com>
Date:   Thu Mar 19 15:28:53 2015 -0400

    tests to validate cross-reference stream logic

    Adds a reference PDF document containing XRef streams, and the
    associated unit tests.

diff --git a/PDF-API2/t/resources/HowToArgueEffectively.pdf b/PDF-API2/t/resources/HowToArgueEffectively.pdf
new file mode 100644
index 000000000000..8bfd9482b940
Binary files /dev/null and b/PDF-API2/t/resources/HowToArgueEffectively.pdf differ
diff --git a/PDF-API2/t/xref.t b/PDF-API2/t/xref.t
new file mode 100644
index 000000000000..0280251f76b3
--- /dev/null
+++ b/PDF-API2/t/xref.t
@@ -0,0 +1,27 @@
+use Test::More tests => 2;
+
+use warnings;
+use strict;
+
+use PDF::API2;
+
+my $pdf = eval {
+    PDF::API2->open('t/resources/HowToArgueEffectively.pdf');
+};
+
+isa_ok($pdf, 'PDF::API2', q{doc containing an XRef stream});
+
+my $file = $pdf->{pdf};
+my $pass = 1;
+
+while (my($id, $xref) = each %{$file->{' xref'}}) {
+    my $obj = $file->read_objnum($id, $xref->[1]);
+
+    unless (ref($obj)) {
+        $pass = 0;
+        last;
+    }
+}
+
+ok($pass, 'all XRef entries point to an object');
+
Subject: PDF-API2-2.023-Predictor.patch

Message body is not shown because it is too large.

From: don.huettl [...] grantstreet.com
I have one more patch that does a little clean-up, attached.
Subject: PDF-API2-Predictor-pt2.patch
diff --git PDF-API2/lib/PDF/API2/Basic/PDF/Filter/Predictor.pm PDF-API2/lib/PDF/API2/Basic/PDF/Filter/Predictor.pm
index 7d2c388dcfc0..813951d0f6fc 100644
--- PDF-API2/lib/PDF/API2/Basic/PDF/Filter/Predictor.pm
+++ PDF-API2/lib/PDF/API2/Basic/PDF/Filter/Predictor.pm
@@ -22,8 +22,7 @@ sub new {
 
 sub outfilt {
     my ($self) = @_;
-    warn 'The "outfilt" method is not implemented';
-    return;
+    die 'The "outfilt" method is not implemented';
 }
 
 sub infilt {
@@ -44,7 +43,7 @@ sub infilt {
     } elsif ($predictor >= 10 && $predictor <= 15) {
         $self->_depredict_png;
     } else {
-        warn "Invalid predictor: $predictor";
+        die "Invalid predictor: $predictor";
     }
 
     return $obj->{' stream'};
@@ -133,7 +132,7 @@ sub _depredict_png {
 
 sub _depredict_tiff {
     my ($self) = @_;
-    warn "The TIFF predictor logic has not been implemented";
+    die "The TIFF predictor logic has not been implemented";
 }
 
 1;
diff --git PDF-API2/lib/PDF/API2/Resource/XObject/Image/PNG.pm PDF-API2/lib/PDF/API2/Resource/XObject/Image/PNG.pm
index bdf3356a9f8d..3fd5832cb675 100644
--- PDF-API2/lib/PDF/API2/Resource/XObject/Image/PNG.pm
+++ PDF-API2/lib/PDF/API2/Resource/XObject/Image/PNG.pm
@@ -8,6 +8,7 @@ use POSIX qw(ceil);
 
 use IO::File;
 use PDF::API2::Util;
+use PDF::API2::Basic::PDF::Filter::Predictor;
 use PDF::API2::Basic::PDF::Utils;
 
 no warnings qw[ deprecated recursion uninitialized ];
@@ -31,7 +32,10 @@ sub new {
     open($fh,$file);
     binmode($fh);
     seek($fh,8,0);
+    $self->{Length}=PDFNum(-s $file);
     $self->{' stream'}='';
+    $self->{' streamloc'}=0;
+    $self->{' streamsrc'}=$fh;
     $self->{' nofilt'}=1;
     while(!eof($fh)) {
         read($fh,$buf,4);
I've tried out the patch as found in the xref-streams branch on https://github.com/ssimms/pdfapi2 but I get this failure with a PDF compiled by XeLaTeX:

Objind 1 does not exist at index 46 at lib/PDF/API2/Basic/PDF/File.pm line 710.

Pull request for the xref-streams branch issued at https://github.com/ssimms/pdfapi2/pull/3
On Thu Dec 24 04:08:40 2015, MELMOTHX wrote:
> I've tried out the patch as found in the xref-streams branch on
> https://github.com/ssimms/pdfapi2 but I get this failure with a PDF
> compiled by XeLaTeX:
>
> Objind 1 does not exist at index 46 at lib/PDF/API2/Basic/PDF/File.pm
> line 710.
>
> Pull request for the xref-streams branch issued at
> https://github.com/ssimms/pdfapi2/pull/3
As stated in the third commit, trying it out against PDF::Cropmarks leads to some memory-hungry (possibly endless) recursion. I'm way out of my depth here, though. Anyway, I hope this helps.
I've merged the xref-streams branch after making a few fixes, including compatibility for older versions of Perl and a fix for the issue that MELMOTHX discovered with object streams. Many thanks! This will be included in the upcoming 2.026 release.