
This queue is for tickets about the PDF-API2 CPAN distribution.

Report information
The Basics
Id: 121832
Status: resolved
Priority: 0/
Queue: PDF-API2

People
Owner: Nobody in particular
Requestors: futuramedium [...] yandex.ru
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 2.032



Subject: Invalid PDF file in test suite
The "sample-xrefstm.pdf" can't be read with any PDF viewer I tried. Line 12 should contain "/N 2" (not "1"): /N is the number of objects in the object stream. (Ehm, what does the test that uses this PDF actually test, if the file is broken? :) With that change the file can be viewed OK with Ghostscript, Firefox, etc. It is still not good enough for Adobe Reader, but that looks like a Reader issue: it refuses uncompressed object streams and/or xref streams (? - doesn't matter). Also, MediaBox is required, and though most software forgives its absence, it's probably better if the test suite contains 100% valid PDFs.
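For illustration, a minimal sketch of the /N check described above. It assumes an uncompressed object stream whose dictionary and data are already available as strings; check_objstm_count and the sample data are made up for this example and are not part of PDF::API2.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDF::API2): compare the /N entry of an
# uncompressed object stream with the number of "objnum offset" pairs
# actually present in the stream header (the first /First bytes).
sub check_objstm_count {
    my ( $dict, $data ) = @_;

    my ($n)     = $dict =~ m{/N\s+(\d+)}     or die "no /N entry\n";
    my ($first) = $dict =~ m{/First\s+(\d+)} or die "no /First entry\n";

    # The header is a run of integers: objnum offset objnum offset ...
    my $header = substr( $data, 0, $first );
    my @ints   = $header =~ /(\d+)/g;
    my $pairs  = int( @ints / 2 );

    return { declared => $n, actual => $pairs, ok => ( $n == $pairs ? 1 : 0 ) };
}

# Made-up stream that holds two objects but declares only one.
my $dict = '<< /Type /ObjStm /N 1 /First 8 /Length 30 >>';
my $data = "1 0 2 11 << /A 1 >> << /B 2 >>";

my $r = check_objstm_count( $dict, $data );
printf "declared=%d actual=%d %s\n",
    $r->{declared}, $r->{actual}, $r->{ok} ? 'ok' : 'MISMATCH';
```

On the made-up sample this prints "declared=1 actual=2 MISMATCH", which is the same kind of discrepancy as in the test file.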
On Tue May 23 09:40:43 2017, vadimr wrote:
> The "sample-xrefstm.pdf" can't be read with any PDF viewer I tried. Line 12 should contain "/N 2" (not "1") -- number of objects in object stream. (Ehm, what does test, which uses this PDF, test? :) If file is broken).
Thanks -- I've updated the PDF. To answer your question, this file is used to test parts of the object stream handling code, which is still pretty new. It's not a perfect or complete test, but I figure some tests are better than no tests, which is where we started. If it opens in PDF readers, that's a bonus, but isn't the most important thing.
> it's probably better if test suite contains 100% valid PDFs.
Agreed. Do you know of any software or site that provides minimal, ASCII-charset sample PDFs that are suitable for testing individual parts of the PDF specification? I haven't been able to find anything, so I'm writing/modifying them by hand to get the features I need for testing purposes. I'd be happy not to have to do that. :-)
On Tue May 23 22:41:32 2017, SSIMMS wrote:
> On Tue May 23 09:40:43 2017, vadimr wrote:
> > The "sample-xrefstm.pdf" can't be read with any PDF viewer I tried. Line 12 should contain "/N 2" (not "1") -- number of objects in object stream. (Ehm, what does test, which uses this PDF, test? :) If file is broken).
> Thanks -- I've updated the PDF. To answer your question, this file is used to test parts of the object stream handling code, which is still pretty new. It's not a perfect or complete test, but I figure some tests are better than no tests, which is where we started. If it opens in PDF readers, that's a bonus, but isn't the most important thing.
> > it's probably better if test suite contains 100% valid PDFs.
> Agreed. Do you know of any software or site that provides minimal, ASCII-charset sample PDFs that are suitable for testing individual parts of the PDF specification? I haven't been able to find anything, so I'm writing/modifying them by hand to get the features I need for testing purposes. I'd be happy not to have to do that. :-)
I am doing exactly the same thing with PDF::Tiny. Feel free to steal my test PDFs.
From: futuramedium [...] yandex.ru
> Do you know of any software or site that provides minimal, ASCII-charset sample PDFs that are suitable for testing individual parts of the PDF specification?
No, sorry, I haven't seen such a collection.
The reason I opened "sample-xrefstm.pdf" at all was that I was looking for a minimal PDF 1.5 file to report another issue. I'll describe it here instead of creating a new ticket. Strictly speaking, it's an Adobe issue, not PDF::API2's. But, unfortunately, in the bubble I exist in, files that Acrobat rejects are automatically considered invalid. Whether it should concern PDF::API2, you decide :).
The (fixed) "sample-xrefstm.pdf" is not well suited for investigating this, because Acrobat rejects it "as is", from the beginning (which, like I said, is probably Adobe's issue as well). But consider any other 1.5 file with an xref stream. Opening it with PDF::API2, making changes, and saving to a file produces an incrementally updated PDF, with the original xref stream intact and the xref section appended by PDF::API2 being "classical". Nowhere in the specification is this prohibited, and other viewers are happy to open such files. But Acrobat either rejects them or tries to "fix" them, breaking them completely. I think this should be mentioned in "known issues", at least.
One workaround would be to have "save as" rebuild the PDF completely instead of appending, i.e. not update it incrementally; I mean something like CAM::PDF::cleanoutput vs CAM::PDF::output. Then there would be a single "classical" xref table. Going further, an incremental update could append either a "classical" or a streamed xref section. I understand such changes can be difficult to implement.
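To make the mixed layout described above easier to spot, here is a rough sketch that walks the cross-reference chain of a file held in memory and reports the kind of each section. xref_section_kinds and the tiny fake file are assumptions made up for this example; the code does not use PDF::API2 and ignores many details of real files (hybrid /XRefStm entries, nested dictionaries in trailers, and so on).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDF::API2): follow the chain that starts
# at the last "startxref" and return, newest first, 'table' for a classic
# xref table or 'stream' for a cross-reference stream.
sub xref_section_kinds {
    my ($pdf) = @_;
    my ( @kinds, %seen );

    # Greedy .* makes this pick up the LAST startxref in the file.
    my ($pos) = $pdf =~ /.*startxref\s+(\d+)/s or return;

    while ( defined $pos and not $seen{$pos}++ ) {
        my $rest = substr( $pdf, $pos );
        if ( $rest =~ /\Axref\b/ ) {
            push @kinds, 'table';
            # For a table, /Prev lives in the trailer dictionary.
            my ($trailer) = $rest =~ /\btrailer\b(.*?)\bstartxref\b/s;
            ($pos) = defined $trailer ? $trailer =~ m{/Prev\s+(\d+)} : ();
        }
        elsif ( $rest =~ /\A\d+\s+\d+\s+obj\b/ ) {
            push @kinds, 'stream';
            # For a stream, /Prev lives in the stream dictionary.
            my ($dict) = $rest =~ /\A(.*?)\bstream\b/s;
            ($pos) = defined $dict ? $dict =~ m{/Prev\s+(\d+)} : ();
        }
        else {
            last;    # offset does not point at an xref section
        }
    }
    return @kinds;
}

# Fake, non-viewable file: an xref stream followed by an appended classic
# table, with offsets computed from string lengths.
my $base    = "%PDF-1.5\n";
my $stm_pos = length $base;
$base .= "5 0 obj\n<< /Type /XRef /Size 6 /W [1 2 1] /Length 0 >>\n"
       . "stream\nendstream\nendobj\nstartxref\n$stm_pos\n%%EOF\n";
my $tab_pos = length $base;
my $pdf = $base
        . "xref\n0 1\n0000000000 65535 f \n"
        . "trailer\n<< /Size 7 /Prev $stm_pos >>\n"
        . "startxref\n$tab_pos\n%%EOF\n";

my @found = xref_section_kinds($pdf);
print "@found\n";
```

A result that contains both 'table' and 'stream' is the combination that, per the report above, Acrobat rejects or mangles.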
From: futuramedium [...] yandex.ru
Sorry, I see now that this is a known issue, already discussed (117184).
From: futuramedium [...] yandex.ru
Here's a method to add to API2.pm to re-build a PDF::API2 instance and save it to a file with a single "classical" xref table.

    sub not_very_clean_output {
        my ( $self, $fn ) = @_;

        # Forget that the file was opened for incremental update.
        delete $self->{reopened};
        delete $self->{pdf}{' update'};
        delete $self->{pdf}{' loc'};

        # Object numbers that are already queued for output.
        my %done;
        $done{ $self->{pdf}{' objects'}{ $_->uid }[0] }++
            for @{ $self->{pdf}{' outlist'} };

        # Walk every xref section, newest first, collecting live objects.
        my %h;    # obj_num => gen_num
        my $tdict = $self->{pdf};
        while ( defined $tdict ) {
            my $sect = $tdict->{' xref'};
            for ( keys %$sect ) {
                next unless /./;          # skip empty keys
                next if $done{$_};
                my $ary = $sect->{$_};
                next if $#$ary == 2 and $ary->[2] eq 'f';    # free entry
                $h{$_} = $#$ary == 2 ? $ary->[1] : 0;
            }
            $tdict = $tdict->{' prev'};
        }

        # Read each remaining object and queue it, except the old
        # cross-reference streams and object streams.
        for my $objnum ( sort { $a <=> $b } keys %h ) {
            my $obj = $self->{pdf}->read_objnum( $objnum, $h{$objnum} );
            $obj->realise;
            $self->{pdf}->out_obj($obj)
                unless $obj->{Type}
                and $obj->{Type}->val =~ /^(XRef|ObjStm)$/;    # skip
        }

        $self->saveas($fn);
    }

"Not very clean" because it does the bare minimum: objects are not renumbered (i.e. the range is not consolidated and holes are not eliminated), and unused objects are not discarded. It seems to work but has not been tested extensively; it may serve as a base, or help anyone in desperate need. The stability of the instance after calling this is also untested; maybe it should be destroyed or the file re-opened. The "next unless /./;" is for line 815 of PDF::API2::Basic::PDF::File.