Thanks for looking into this, Steve. Good point that it's not necessarily
linked to cross-reference stream support; sorry, I should have thought of
that possibility.
I have a couple of dozen files generated in the same way and, having now
checked all of them, this is the only one with this error. Unfortunately
they are from a print job for a fairly sensitive mailing from a 3rd
party that I'm doing some processing on and, much as I'd like to,
I really can't share them.
If I add 'use Data::Dumper;' and 'print Dumper($self->{'pages'});'
after line 200 of PDF/API2.pm, i.e.
198 $self->{'pdf'}->{'Root'}->realise();
199 $self->{'pages'} = $self->{'pdf'}->{'Root'}->{'Pages'}->realise();
200 $self->{'pdf'}->{' version'} ||= 3;
201 use Data::Dumper;
-> 202 print Dumper($self->{'pages'});
203 my @pages = proc_pages($self->{'pdf'}, $self->{'pages'});
204 $self->{'pagestack'} = [sort { $a->{' pnum'} <=> $b->{' pnum'} } @pages];
205 $self->{'catalog'} = $self->{'pdf'}->{'Root'};
206 $self->{'reopened'} = 1;
For the working files the structure is dumped, but for this broken one
I get '$VAR1 = undef;'. That's what I meant by "Tracing it back, in
open_scalar, this is returning undef: $self->{'pages'} = [...]".
So it doesn't look like a missing Kids element; rather, the Pages
dictionaries aren't getting processed correctly. I'm not sure which
code is responsible for {'pdf'}->{'Root'}->{'Pages'}->realise(),
otherwise I would have looked there as well for differences
between the problem file and the working ones.
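In case it's useful, a generic way to find out which package's realise()
would actually run is to ask Perl itself. This is only a sketch: My::Base
and My::Objind are made-up stand-ins rather than PDF::API2 classes, and
where_is_method is a helper name I've invented for illustration.

```perl
use strict;
use warnings;
use B ();
use Scalar::Util qw(blessed);

# Hypothetical stand-ins for the real classes; the names are illustrative only.
package My::Base   { sub new { return bless {}, shift } sub realise { return $_[0] } }
package My::Objind { our @ISA = ('My::Base'); }

package main;

# Report the object's class and the package that defines the given method.
sub where_is_method {
    my ($obj, $method) = @_;
    return 'not an object' unless blessed($obj);
    my $code = $obj->can($method)
        or return blessed($obj) . " has no $method()";
    my $gv = B::svref_2object($code)->GV;
    return sprintf '%s -> %s::%s', blessed($obj), $gv->STASH->NAME, $gv->NAME;
}

my $pages_ref = My::Objind->new;
print where_is_method($pages_ref, 'realise'), "\n";
```

Called on $self->{'pdf'}->{'Root'}->{'Pages'} just before line 199, the
same helper should show which module to read next, or report
'not an object' if the Pages reference itself is already undefined.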
I also looked at all of the objects with "mutool show", grepping for
any that mention /Pages without /Kids, and I don't see any problems.
I had a look through with "itext rups" too, which was also happy with
the file, and I didn't spot any unusual Pages dictionaries. There it
looks like
http://junkpile.org/pdfstructure-112456-1.png: the basic
file structure is a couple of levels of Pages, each of them having
a Kids element with usually 10 entries, with Page at the deepest
level.
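For reference, the check for /Pages without /Kids amounts to roughly the
following. It's a sketch that assumes each input line holds one complete
object dictionary; the sample objects are made up for illustration and
pages_without_kids is my own helper name.

```perl
use strict;
use warnings;

# Return the lines that describe a /Pages node but carry no /Kids entry.
# Assumes each line holds one complete object dictionary.
sub pages_without_kids {
    my @lines = @_;
    return grep { m{/Type\s*/Pages\b} && !m{/Kids\b} } @lines;
}

# Made-up sample objects, for illustration only.
my @sample = (
    '1 0 obj <</Type/Pages/Count 2/Kids[2 0 R 3 0 R]>>',
    '2 0 obj <</Type/Page/Parent 1 0 R>>',
    '4 0 obj <</Type /Pages /Count 0>>',
);
print "$_\n" for pages_without_kids(@sample);
```

On the sample data only the last object is reported; on the problem file
the equivalent check found nothing.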
Stuart
On 2016/02/27 17:12, Steve Simms via RT wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=112456 >
>
> Are you able to send the file to me privately? If so, that will let me help you troubleshoot the problem and figure out if it's a problem with PDF::API2 or a problem with the PDF file not following the spec (which may or may not be something that I can have the module work around).
>
> If not, the problem would seem to be that the PDF has a Pages dictionary (which contains information about a set of pages) that doesn't have the required Kids element (which contains an array of Page or Pages nodes).
>
> At a glance, the problem wouldn't necessarily be linked to adding support for cross-reference streams (other than it being possible to read that file now), but anything is possible.
>
> Steve
>
>
> On Fri Feb 26 10:11:25 2016, stu@spacehopper.org wrote:
> > I'm running PDF::API2 2.026 on perl 5.20.2 on OpenBSD. I have some
> > awkward input files with xref streams which I've been pre-processing
> > with mutool clean to get them into a format usable with PDF::API2,
> > but thought I'd try them directly using the new xref stream support.
> >
> > Some such files now seem to be working OK but I have one that fails
> > at open - if I do this:
> >
> > use PDF::API2;
> > my $pdf = PDF::API2->open('letter.pdf');
> >
> > I get:
> >
> > Can't call method "elementsof" on an undefined value at
> > /usr/local/libdata/perl5/site_perl/PDF/API2.pm line 870.
> >
> > Tracing it back, in open_scalar, this is returning undef:
> >
> > $self->{'pages'} = $self->{'pdf'}->{'Root'}->{'Pages'}->realise();
> >
> > My knowledge of perl OO and the PDF::API2 code is limited so I'm not
> > sure where to go next, can you give me any pointers to help track
> > it down further please?
> >
> > Unfortunately I can't make the file itself available.
> >
> > A bit more information about the file:
> >
> > $ pdfinfo file.pdf
> > Title: <redacted>
> > Author: Compiled Xerox JDL file.
> > Creator: Paris
> > Producer: Normalizer demonorm
> > CreationDate: Tue Feb 16 08:04:11 2016
> > ModDate: Tue Feb 16 09:38:37 2016
> > Tagged: no
> > UserProperties: no
> > Suspects: no
> > Form: none
> > JavaScript: no
> > Pages: 7276
> > Encrypted: no
> > Page size: 595 x 842 pts (A4)
> > Page rot: 0
> > File size: 9422943 bytes
> > Optimized: yes
> > PDF version: 1.6
> >
> > I have other files generated by approximately the same procedure
> > (certainly the same Producer etc) which I am now able to open
> > with 2.026.
> >
> > Thanks,
> > Stuart
>
>
>