
This queue is for tickets about the PDF-API2 CPAN distribution.

Report information
The Basics
Id: 120397
Status: resolved
Priority: 0/
Queue: PDF-API2

People
Owner: Nobody in particular
Requestors: SPROUT
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 2.032



Subject: Can’t handle newlines in references
In PDF syntax, an indirect reference consists of three distinct tokens that can be separated by any PDF whitespace, and even comments. For example, this is a syntactically valid indirect reference:

    1 %eieio
    0
    R

PDF::API2 does not allow comments at all (based on reading the code; that is not a problem for my PDFs). But it does choke on newlines if the object is long enough that it has not all been read from the file yet.

This happens with:

    1895 0
    obj<</Count
    253/Kids[1896
    0
    R
    1
    0
    R
    7
    0
    R
    13
    0
    R
    ...

etc., with 253 entries.

PDF::API2::Basic::PDF::File::readval needs to read more data if it finds what could be a partial reference.
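To make the request above concrete, here is a minimal Perl sketch of the idea: recognise an indirect reference whose tokens are separated by whitespace or comments, and report when the buffered data ends at a point where a reference could still be continuing. The helper name classify_reference and its return values are hypothetical; this is not the code in PDF::API2::Basic::PDF::File, only an illustration of the check that readval would need before deciding to read more data.

```perl
use strict;
use warnings;

# PDF whitespace: NUL, tab, line feed, form feed, carriage return, space.
my $ws    = qr/[\x00\t\n\f\r ]/;
# PDF delimiter characters, any of which ends a token such as R.
my $delim = qr/[()<>\[\]{}\/%]/;
# A gap between tokens: one or more whitespace characters or whole comments.
# A comment runs from % to the end of its line.
my $gap   = qr/(?:$ws|%[^\r\n]*[\r\n])+/;

# Classify the start of $buf (hypothetical helper, not part of PDF::API2):
#   ('complete', $obj, $gen)  a full "obj gen R" reference was found
#   ('partial')               the buffer ends where a reference could still continue
#   ('none')                  the buffer cannot be the start of a reference
sub classify_reference {
    my ($buf) = @_;
    if ($buf =~ /\A(\d+)$gap(\d+)${gap}R(?=$ws|$delim|\z)/) {
        return ('complete', $1, $2);
    }
    # One or two integers, optionally followed by a gap and/or an
    # unterminated comment, and then the end of the buffered data.
    if ($buf =~ /\A\d+(?:$gap\d+)?(?:$gap)?(?:%[^\r\n]*)?\z/) {
        return ('partial');
    }
    return ('none');
}

# Simulate a reader that holds only part of the object and appends another
# chunk whenever the buffer ends in a possible partial reference.
my @chunks = ("1896\n", "0\n", "R\n1\n0\nR");
my $buf = shift @chunks;
my ($state, $obj, $gen) = classify_reference($buf);
while ($state eq 'partial' && @chunks) {
    $buf .= shift @chunks;
    ($state, $obj, $gen) = classify_reference($buf);
}
print "$state: object $obj, generation $gen\n" if $state eq 'complete';
```

The 'partial' result is deliberately generous: a lone integer at the end of the buffer may turn out to be just a number, but the reader cannot know that until more data has been read.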
Thanks for the bug report.

I think this should now be working as expected. See t/rt120397.t for the cases that are now being tested -- if anything is missing, add a test and let me know.

On Sun Feb 26 17:41:13 2017, SPROUT wrote:
> In PDF syntax, an indirect reference consists of three distinct tokens
> that can be separated by any PDF whitespace, and even comments. For
> example, this is a syntactically valid indirect reference:
>
> 1 %eieio
> 0
> R
>
> PDF::API2 does not allow comments at all (based on reading the code;
> that is not a problem for my PDFs). But it does choke on newlines if
> the object is long enough that it has not all been read from the file
> yet.
>
> This happens with:
>
> 1895 0
> obj<</Count
> 253/Kids[1896
> 0
> R
> 1
> 0
> R
> 7
> 0
> R
> 13
> 0
> R
> ...
>
> etc., with 253 entries.
>
> PDF::API2::Basic::PDF::File::readval needs to read more data if it
> finds what could be a partial reference.
On Sat Jun 24 11:52:14 2017, SSIMMS wrote:
> Thanks for the bug report.
>
> I think this should now be working as expected. See t/rt120397.t for
> the cases that are now being tested -- if anything is missing, add a
> test and let me know.
Thank you.

I’m afraid it is still not working. Attached is a sample 400-page PDF that it fails on. This PDF may not actually be valid. To keep the file size small, I made the pages Kids array reference the same page 400 times. Adobe Reader does not like this file, but there is nothing in the specification to suggest that the same page object cannot be referenced multiple times in the Kids array.

In any case, it makes a good test. I am not sure where you would put this in the repository, but a simple

    ok eval { PDF::API2->open("t/newlines.pdf") }

will suffice.
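For anyone who wants to reproduce this kind of input without the attachment, the following Perl sketch builds a small PDF along the lines described: a page tree whose Kids array refers to the same page object many times, with every token of each reference on its own line. It is an assumption-laden illustration, not the attached newlines.pdf; the function name build_newline_pdf, the object numbering, and the page dictionary contents are all made up for the example.

```perl
use strict;
use warnings;

# Build a PDF (as a string) whose page tree lists the same page object
# $count times, one token per line. Hypothetical helper for illustration.
sub build_newline_pdf {
    my ($count) = @_;

    # "3\n0\nR" repeated $count times: each reference spans three lines.
    my $kids = join "\n", ("3\n0\nR") x $count;

    my @objects = (
        "<</Type/Catalog/Pages 2 0 R>>",
        sprintf("<</Type/Pages/Count %d/Kids[%s]>>", $count, $kids),
        "<</Type/Page/Parent 2 0 R/MediaBox[0 0 612 792]>>",
    );

    my $pdf = "%PDF-1.4\n";
    my @offsets;
    for my $i (0 .. $#objects) {
        push @offsets, length $pdf;    # byte offset of "N 0 obj"
        $pdf .= sprintf "%d 0 obj\n%s\nendobj\n", $i + 1, $objects[$i];
    }

    # Cross-reference table: one 20-byte entry per object, plus the free entry.
    my $xref_pos = length $pdf;
    $pdf .= sprintf "xref\n0 %d\n", scalar(@objects) + 1;
    $pdf .= "0000000000 65535 f \n";
    $pdf .= sprintf "%010d 00000 n \n", $_ for @offsets;
    $pdf .= sprintf "trailer\n<</Size %d/Root 1 0 R>>\nstartxref\n%d\n%%%%EOF\n",
        scalar(@objects) + 1, $xref_pos;

    return $pdf;
}

my $pdf  = build_newline_pdf(400);
my $refs = () = $pdf =~ /3\n0\nR/g;
printf "%d bytes, %d newline-separated references\n", length($pdf), $refs;
```

Writing the returned string to a file in binary mode would give an input for the one-line test suggested above; whether a given PDF::API2 version opens it is exactly what such a test would check.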
Subject: newlines.pdf
application/pdf 2.7k


On Sat Jun 24 14:41:07 2017, SPROUT wrote:
> Thank you.
>
> I’m afraid it is still not working. Attached is a sample 400-page PDF
> that it fails on. This PDF may not actually be valid. To keep the
> file size small, I made the pages Kids array reference the same page
> 400 times. Adobe Reader does not like this file, but there is nothing
> in the specification to suggest that the same page object cannot be
> referenced multiple times in the Kids array.
>
> In any case, it makes a good test. I am not sure where you would put
> this in the repository, but a simple
>
> ok eval { PDF::API2->open("t/newlines.pdf") }
>
> will suffice.
BTW, the error I get is:

    Can't parse `R 3 0 R 3 0 ... many times over ... 3 0 R' near 1000 length 313. at /Library/Perl/5.12/PDF/API2/Basic/PDF/File.pm line 682.
Can you update to HEAD and try again, please? I made a couple more fixes an hour or so after updating this ticket, and I'm guessing you don't have that commit.

The PDF you attached opens fine for me on HEAD (but not on the original fix).
On Sat Jun 24 15:03:07 2017, SSIMMS wrote:
> Can you update to HEAD and try again, please? I made a couple more
> fixes an hour or so after updating this ticket, and I'm guessing you
> don't have that commit.
>
> The PDF you attached opens fine for me on HEAD (but not on the
> original fix).
Yes, it works now. Thank you.