
This queue is for tickets about the PDF-API2 CPAN distribution.

Report information
The Basics
Id: 112456
Status: resolved
Priority: 0/
Queue: PDF-API2

People
Owner: Nobody in particular
Requestors: stu [...] spacehopper.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 2.027



Subject: xref stream file in 2.026: Can't call method "elementsof" on an undefined value
Date: Fri, 26 Feb 2016 15:10:46 +0000
To: bug-PDF-API2 [...] rt.cpan.org
From: Stuart Henderson <stu [...] spacehopper.org>
I'm running PDF::API2 2.026 on perl 5.20.2 on OpenBSD. I have some awkward input files with xref streams which I've been pre-processing with mutool clean to get them into a format usable with PDF::API2, but thought I'd try them directly using the new xref stream support.

Some such files now seem to be working OK but I have one that fails at open - if I do this:

    use PDF::API2;
    my $pdf = PDF::API2->open('letter.pdf');

I get:

    Can't call method "elementsof" on an undefined value at /usr/local/libdata/perl5/site_perl/PDF/API2.pm line 870.

Tracing it back, in open_scalar, this is returning undef:

    $self->{'pages'} = $self->{'pdf'}->{'Root'}->{'Pages'}->realise();

My knowledge of perl OO and the PDF::API2 code is limited so I'm not sure where to go next, can you give me any pointers to help track it down further please?

Unfortunately I can't make the file itself available.

A bit more information about the file:

    $ pdfinfo file.pdf
    Title: <redacted>
    Author: Compiled Xerox JDL file.
    Creator: Paris
    Producer: Normalizer demonorm
    CreationDate: Tue Feb 16 08:04:11 2016
    ModDate: Tue Feb 16 09:38:37 2016
    Tagged: no
    UserProperties: no
    Suspects: no
    Form: none
    JavaScript: no
    Pages: 7276
    Encrypted: no
    Page size: 595 x 842 pts (A4)
    Page rot: 0
    File size: 9422943 bytes
    Optimized: yes
    PDF version: 1.6

I have other files generated by approximately the same procedure (certainly the same Producer etc) which I am now able to open with 2.026.

Thanks,
Stuart
Subject: Can't call method "elementsof" on an undefined value
Are you able to send the file to me privately? If so, that will let me help you troubleshoot the problem and figure out if it's a problem with PDF::API2 or a problem with the PDF file not following the spec (which may or may not be something that I can have the module work around).

If not, the problem would seem to be that the PDF has a Pages dictionary (which contains information about a set of pages) that doesn't have the required Kids element (which contains an array of Page or Pages nodes).

At a glance, the problem wouldn't necessarily be linked to adding support for cross-reference streams (other than it being possible to read that file now), but anything is possible.

Steve
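To illustrate the Pages/Kids relationship described above, here is a minimal sketch of walking a page tree. It uses plain Perl hashes as a hypothetical stand-in for the parsed PDF objects; `collect_pages` and the `Num` key are made up for the example and are not part of PDF::API2.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a PDF page tree: plain hashes, not PDF::API2 objects.
# A node of Type 'Pages' must carry a Kids array; a node of Type 'Page' is a leaf.
sub collect_pages {
    my ($node, $path) = @_;
    $path //= 'root';

    return ($node) if $node->{Type} eq 'Page';

    # An intermediate node without Kids cannot be traversed any further.
    die "Pages node at $path has no Kids array\n"
        unless ref($node->{Kids}) eq 'ARRAY';

    my @pages;
    my $i = 0;
    for my $kid (@{ $node->{Kids} }) {
        push @pages, collect_pages($kid, "$path/Kids[$i]");
        $i++;
    }
    return @pages;
}

# Two levels of Pages nodes with Page leaves at the deepest level.
my $tree = {
    Type => 'Pages',
    Kids => [
        { Type => 'Pages', Kids => [ { Type => 'Page', Num => 1 }, { Type => 'Page', Num => 2 } ] },
        { Type => 'Pages', Kids => [ { Type => 'Page', Num => 3 } ] },
    ],
};

my @pages = collect_pages($tree);
print scalar(@pages), " pages found\n";
```

A Pages node lacking Kids makes the walk die with the path of the offending node, which is roughly the kind of failure being hypothesised here.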
Subject: Re: [rt.cpan.org #112456] Can't call method "elementsof" on an undefined value
Date: Sun, 28 Feb 2016 01:50:11 +0000
To: Steve Simms via RT <bug-PDF-API2 [...] rt.cpan.org>
From: Stuart Henderson <stu [...] spacehopper.org>
Thanks for looking into this Steve. Good point that it's not necessarily linked to cross-reference stream support, sorry I should have thought of that possibility.

I have a couple of dozen files generated in the same way and having now checked all of them this is the only one with this error. Unfortunately they are from a print job for a fairly sensitive mailing from a 3rd party that I'm doing some processing on and, much as I'd like to, I really can't share them.

If I add 'use Data::Dumper;' and 'print Dumper($self->{'pages'});' after line 200 of PDF/API2.pm, i.e.

       198 $self->{'pdf'}->{'Root'}->realise();
       199 $self->{'pages'} = $self->{'pdf'}->{'Root'}->{'Pages'}->realise();
       200 $self->{'pdf'}->{' version'} ||= 3;
       201 use Data::Dumper;
    -> 202 print Dumper($self->{'pages'});
       203 my @pages = proc_pages($self->{'pdf'}, $self->{'pages'});
       204 $self->{'pagestack'} = [sort { $a->{' pnum'} <=> $b->{' pnum'} } @pages];
       205 $self->{'catalog'} = $self->{'pdf'}->{'Root'};
       206 $self->{'reopened'} = 1;

for the working files the structure is dumped, but for this broken one I get '$VAR1 = undef;', that's what I meant by "Tracing it back, in open_scalar, this is returning undef: $self->{'pages'} = [...]", so it doesn't seem like lack of a Kids element, rather that the Pages dictionaries aren't getting processed correctly, but I'm not sure which code is responsible for {'pdf'}->{'Root'}->{'Pages'}->realise() otherwise I would have looked there as well to look for differences between the problem file and the working ones.

Also looking at all of the objects with "mutool show" and grepping for any mentioning /Pages without /Kids, I don't see any problems, I also had a look through with "itext rups" which was also happy with the file and I didn't spot any unusual Pages dictionaries. (there it looks like http://junkpile.org/pdfstructure-112456-1.png, basic file structure is a couple of levels of Pages, each of them having a Kids element with usually 10 entries, with Page at the deepest level).

Stuart
Try commenting out line 492 in lib/PDF/API2/Basic/PDF/File.pm:

    $result->{' streamloc'}-- if $fh->eof;

Does that let you open the file?
Subject: Re: [rt.cpan.org #112456] xref stream file in 2.026: Can't call method "elementsof" on an undefined value
Date: Wed, 2 Mar 2016 21:14:09 +0000
To: Steve Simms via RT <bug-PDF-API2 [...] rt.cpan.org>
From: Stuart Henderson <stu [...] spacehopper.org>
On 2016/03/02 15:56, Steve Simms via RT wrote:
> Try commenting out line 492 in lib/PDF/API2/Basic/PDF/File.pm:
>
>     $result->{' streamloc'}-- if $fh->eof;
>
> Does that let you open the file?

No change with this.
You may also need to change line 1147 from:

    @index = (0, $tdict->{Size}->val - 1);

to

    @index = (0, $tdict->{Size}->val);

The last entry in a cross reference table is getting skipped when Index isn't present.
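The off-by-one can be shown with a short standalone sketch. It assumes only what the PDF specification says about cross-reference streams: Index is a list of (first object number, entry count) pairs and defaults to [0 Size] when absent. The helper `objects_covered` and the sample `$size` are hypothetical and are not taken from File.pm.

```perl
use strict;
use warnings;

# Expand an Index array of (first object number, entry count) pairs into
# the list of object numbers a cross-reference stream describes.
sub objects_covered {
    my @index = @_;
    my @objects;
    while (my ($first, $count) = splice(@index, 0, 2)) {
        push @objects, $first .. $first + $count - 1;
    }
    return @objects;
}

my $size = 5;    # hypothetical /Size value: objects 0 through 4

my @buggy = objects_covered(0, $size - 1);    # count is one too small
my @fixed = objects_covered(0, $size);        # default when Index is absent

print "buggy: @buggy\n";    # buggy: 0 1 2 3
print "fixed: @fixed\n";    # fixed: 0 1 2 3 4
```

With the smaller count, the highest-numbered object never gets a cross-reference entry, so any reference to it would fail to resolve.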
And one more fix -- change line 1132 from:

    $tdict->read_stream();

to

    $tdict->read_stream(1);

Given the number of pages in your file, the stream may be large enough that it got output to a file rather than staying in memory. Adding "1" forces the stream to be read into memory, which is what the subsequent code assumes.
Subject: Re: [rt.cpan.org #112456] xref stream file in 2.026: Can't call method "elementsof" on an undefined value
Date: Wed, 2 Mar 2016 23:23:25 +0000
To: Steve Simms via RT <bug-PDF-API2 [...] rt.cpan.org>
From: Stuart Henderson <stu [...] spacehopper.org>
Aha - this changes things. I now have this instead:

    Invalid XRefStm entry type: 205 at /usr/local/libdata/perl5/site_perl/PDF/API2/Basic/PDF/File.pm line 1166.

And I have received additional files from the same source, which did not hit the original problem, but some do also see the "Invalid XRefStm entry type", with different type numbers (100, 153). Those files are smaller (2.5MB).

On 2016/03/02 16:33, Steve Simms via RT wrote:
> And one more fix -- change line 1132 from:
>
>     $tdict->read_stream();
>
> to
>
>     $tdict->read_stream(1);
Subject: Re: [rt.cpan.org #112456] xref stream file in 2.026: Can't call method "elementsof" on an undefined value
Date: Wed, 2 Mar 2016 23:35:13 +0000
To: Steve Simms via RT <bug-PDF-API2 [...] rt.cpan.org>
From: Stuart Henderson <stu [...] spacehopper.org>
.. If I disable the "Invalid XrefStm entry type" check, I then get

Cannot find the compressed object stream at /usr/local/libdata/perl5/site_perl/PDF/API2/Basic/PDF/File.pm line 696.

As it can't read the file in at all, I can't give output from your new pdf-debug.pl. If output from any other tools would be useful, there are:

"mutool show $file xref" https://pbot.rmdir.de/XeJ-kZtBSK6Q5WMwC9ph6w
"qpdf --show-xref $file" https://pbot.rmdir.de/pqZ6vuMp-61hQJltIhp1jQ

On 2016/03/02 23:23, Stuart Henderson wrote:
> Aha - this changes things. I now have this instead: > > Invalid XRefStm entry type: 205 at /usr/local/libdata/perl5/site_perl/PDF/API2/Basic/PDF/File.pm line 1166. > > And I have received additional files from the same source, which did > not hit the original problem, but some do also see the "Invalid XrefStm > entry type", with different type numbers (100, 153). Those files are > smaller (2.5MB).
On Wed Mar 02 18:23:58 2016, stu@spacehopper.org wrote:
> Aha - this changes things. I now have this instead: > > Invalid XRefStm entry type: 205 at > /usr/local/libdata/perl5/site_perl/PDF/API2/Basic/PDF/File.pm line > 1166.
That means the module isn't reading or decoding the stream correctly -- entry types should only be 0, 1, or 2. Can you update to the latest code on GitHub and try again? That error should now give you the object number and generation of the XRef stream that's not being read properly.

You should then be able to find that "# # obj" string in the PDF (using a text editor), which will be followed by a dictionary between << and >> characters. Can you paste that dictionary here, please? It might give some hints as to what's happening.

If you have an editor that won't mangle binary characters, can you also attach everything from the "# # obj" until the next occurrence of "endstream"? As long as the dictionary has a Type of XRef, the stream only contains an encoded set of object numbers and locations -- there won't be any sensitive data (see PDF 1.7 section 7.5.8.3 if you want to double-check).
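For readers without a binary-safe editor, the extraction described above can be approximated with a few lines of Perl. This is only a sketch: `extract_object` is a hypothetical helper, not part of PDF::API2, and it assumes the object header appears literally as "num gen obj" and that the first "endstream" after it belongs to that object.

```perl
use strict;
use warnings;

# Hypothetical helper (not a PDF::API2 function): return the raw bytes from
# "<num> <gen> obj" through the next "endstream", or undef if not found.
sub extract_object {
    my ($data, $num, $gen) = @_;

    # \b stops "12 0 obj" from matching inside "112 0 obj".
    return unless $data =~ /\b$num\s+$gen\s+obj\b/;
    my $start = $-[0];

    my $end = index($data, 'endstream', $start);
    return if $end < 0;

    return substr($data, $start, $end + length('endstream') - $start);
}

# Small in-memory stand-in for a PDF so the sketch runs on its own.
# To use a real file, read it in raw mode so no bytes are translated:
#   open my $fh, '<:raw', $path or die "Cannot open $path: $!";
#   my $pdf = do { local $/; <$fh> };
my $pdf = "%PDF-1.6\n112 0 obj\n<</Type/Catalog>>\nendobj\n"
        . "12 0 obj\n<</Type/XRef/W[1 3 1]/Length 5>>\nstream\n"
        . "\x01\x00\x00\x10\x00\nendstream\nendobj\n";

my $obj = extract_object($pdf, 12, 0);
print length($obj), " bytes extracted\n";
```

Note the limitation: if the requested object has no stream of its own, this sketch runs on to the "endstream" of a later object.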
Subject: Re: [rt.cpan.org #112456] xref stream file in 2.026: Can't call method "elementsof" on an undefined value
Date: Thu, 3 Mar 2016 04:11:44 +0000
To: Steve Simms via RT <bug-PDF-API2 [...] rt.cpan.org>
From: Stuart Henderson <stu [...] spacehopper.org>
> You should then be able to find that "# # obj" string in the PDF (using a text editor), which will be followed by a dictionary between << and >> characters. Can you paste that dictionary here, please? It might give some hints as to what's happening.
Here's the dictionary:

<</DecodeParms<</Columns 5/Predictor 12>>/Filter/FlateDecode/ID[<0382CD380585BB83E6E114E743A2319F><1B7FABFCA277174AB014EEE3459651E0>]/Info 29923 0 R/Length 21377/Root 29925 0 R/Size 29924/Type/XRef/W[1 3 1]>>

The rest is attached and also at https://junkpile.org/xrefstm.bin in case the attachment gets mangled.
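To make the W[1 3 1] entry concrete: once the stream has been inflated and the predictor undone, each row is 5 bytes, split into a 1-byte entry type, a 3-byte second field, and a 1-byte third field, all big-endian. The sketch below is illustrative only. `parse_xref_rows` is a hypothetical helper, not the code in Basic/PDF/File.pm, and the sample bytes are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: split already-decoded xref stream data into rows
# using the /W field widths (here 1, 3, 1, so each row is 5 bytes).
sub parse_xref_rows {
    my ($decoded, @w) = @_;
    my $row_len = $w[0] + $w[1] + $w[2];
    die "Data length is not a multiple of the row length\n"
        if length($decoded) % $row_len;

    my @entries;
    for (my $pos = 0; $pos < length($decoded); $pos += $row_len) {
        my @fields;
        my $offset = $pos;
        for my $width (@w) {
            # Each field is a big-endian unsigned integer of $width bytes.
            my $value = 0;
            $value = ($value << 8) | ord(substr($decoded, $offset + $_, 1))
                for 0 .. $width - 1;
            $offset += $width;
            push @fields, $value;
        }
        # Only types 0 (free), 1 (uncompressed), and 2 (compressed) are valid.
        die "Invalid entry type: $fields[0]\n" if $fields[0] > 2;
        push @entries, \@fields;
    }
    return @entries;
}

my $data = "\x00\x00\x00\x00\xff"    # type 0: free entry
         . "\x01\x00\x00\x74\x00"    # type 1: byte offset 0x74 (116), generation 0
         . "\x02\x00\x00\x0c\x03";   # type 2: object stream 12, index 3

printf "type=%d field2=%d field3=%d\n", @$_ for parse_xref_rows($data, 1, 3, 1);
```

A type value such as 205 in the first column therefore suggests that the bytes reaching this step were already wrong, which points at the decoding stage and not at the row parsing.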
Attachment: xrefstm.bin (application/octet-stream, 21.1k)

I just fixed another bug that could have resulted in a corrupt stream. Could you update again and give it a shot, please? You're getting a different error than I was, but the cause could've been the same.

On Wed Mar 02 23:12:18 2016, stu@spacehopper.org wrote:
> > You should then be able to find that "# # obj" string in the PDF > > (using a text editor), which will be followed by a dictionary between > > << and >> characters. Can you paste that dictionary here, please? > > It might give some hints as to what's happening.
> > Here's the dictionary: > > <</DecodeParms<</Columns 5/Predictor
> 12>>/Filter/FlateDecode/ID[<0382CD380585BB83E6E114E743A2319F><1B7FABFCA277174AB014EEE3459651E0>]/Info
> 29923 0 R/Length 21377/Root 29925 0 R/Size 29924/Type/XRef/W[1 3 1]>> > > The rest is attached and also at https://junkpile.org/xrefstm.bin in > case the attachment gets mangled.
Subject: Re: [rt.cpan.org #112456] xref stream file in 2.026: Can't call method "elementsof" on an undefined value
Date: Thu, 3 Mar 2016 11:57:36 +0000
To: Steve Simms via RT <bug-PDF-API2 [...] rt.cpan.org>
From: Stuart Henderson <stu [...] spacehopper.org>
Hi Steve, sorry that's not the bug I'm hitting - but the good news is I've dumped the stream at various points in the code and tracked it down further.

In Basic/PDF/File.pm around line 1151 there's $tdict->read_stream(1). Looking at the contents of $tdict->{' stream'} around there:

Before read_stream: bad+good files both match the encoded stream as dumped by "mutool show -e -b".

After: good file matches the uncompressed stream as dumped by "mutool show -b", *but* the bad file does not match.

So I'm now looking at Basic/PDF/Dict.pm.

One difference between working/broken files is that the compressed stream in the working one is <4096 bytes and the broken one is >4096. So as a hack I've bumped everything in Dict.pm to read in 32K chunks instead; I'm now able to read these files.

So if you go the other way and reduce these from 4096 to, say, 512 bytes, I expect you will be able to reproduce the corrupted stream.

$ ./pdf-debug.pl $file xref
XRef at 116
-----------
Filter: FlateDecode
ID: [ <0382cd380585bb83e6e114e743a2319f> <1b7fabfca277174ab014eee3459651e0> ]
Index: [ 29924 264 ]
Info: <Object 29923>
Length: 479
Prev: 9401296
Root: <Object 29925>
Size: 30188
Type: XRef
W: [ 1 3 1 ]
DecodeParms:
Columns: 5
Predictor: 12

Stream
------
[Stream contains non-printable characters]
That sounds promising. Thanks for digging into it.

On Thu Mar 03 06:58:09 2016, stu@spacehopper.org wrote:
> Hi Steve, sorry that's not the bug I'm hitting - but the good news > is I've dumped the stream at various points in the code and tracked > it down further. > > In Basic/PDF/File.pm around line 1151 there's $tdict->read_stream(1). > Looking at the contents of $tdict->{' stream'} around there: > > Before read_stream: bad+good files both match the encoded stream as > dumped by "mutool show -e -b". > > After: good file matches the uncompressed stream as dumped by "mutool > show -b", *but* the bad file does not match. > > So I'm now looking at Basic/PDF/Dict.pm. > > One difference between working/broken files is that the compressed > stream in the working one is <4096 bytes and the broken one is >4096. > So as a hack I've bumped everything in Dict.pm to read in 32K chunks > instead; I'm now able to read these files. > > So if you go the other way and reduce these from 4096 to, say, 512 > bytes, I expect you will be able to reproduce the corrupted stream. > > $ ./pdf-debug.pl $file xref > XRef at 116 > ----------- > Filter: FlateDecode > ID: [ <0382cd380585bb83e6e114e743a2319f> > <1b7fabfca277174ab014eee3459651e0> ] > Index: [ 29924 264 ] > Info: <Object 29923> > Length: 479 > Prev: 9401296 > Root: <Object 29925> > Size: 30188 > Type: XRef > W: [ 1 3 1 ] > DecodeParms: > Columns: 5 > Predictor: 12 > > Stream > ------ > [Stream contains non-printable characters]
On Thu Mar 03 06:58:09 2016, stu@spacehopper.org wrote:
> One difference between working/broken files is that the compressed > stream in the working one is <4096 bytes and the broken one is >4096. > So as a hack I've bumped everything in Dict.pm to read in 32K chunks > instead; I'm now able to read these files. > > So if you go the other way and reduce these from 4096 to, say, 512 > bytes, I expect you will be able to reproduce the corrupted stream.
As it turned out, it also required that the buffer size not be a multiple of the row length, but I was able to reproduce it. There's a fix on GitHub now -- can you confirm that it works for your file? Thanks again for helping troubleshoot this bug!
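The class of bug described here can be illustrated with a small sketch. It is not the actual fix committed to PDF::API2. It assumes Predictor 12 selects the PNG "Up" filter and that an encoded row is the Columns value plus one filter-type byte (6 bytes for Columns 5). `new_up_decoder` is a hypothetical name. The point is that a decoder fed in fixed-size chunks has to carry both the previous row and any partial row across chunk boundaries.

```perl
use strict;
use warnings;

# Hypothetical chunk-safe decoder for the PNG "Up" filter (row tag byte 2).
# State kept between chunks: the previous decoded row and any partial row.
sub new_up_decoder {
    my ($columns) = @_;
    my $prior   = "\x00" x $columns;   # the row above the first row is all zeros
    my $pending = '';                  # bytes of an incomplete row from earlier chunks

    return sub {
        my ($chunk) = @_;
        $pending .= $chunk;
        my $out = '';
        # Each encoded row is one filter-type byte followed by $columns data bytes.
        while (length($pending) >= $columns + 1) {
            my $row = substr($pending, 0, $columns + 1, '');
            my ($tag, @bytes) = unpack('C*', $row);
            die "This sketch only handles the Up filter (got $tag)\n" unless $tag == 2;
            my @above = unpack('C*', $prior);
            # Up filter: decoded byte = encoded byte + byte above, modulo 256.
            $prior = pack('C*', map { ($bytes[$_] + $above[$_]) % 256 } 0 .. $#bytes);
            $out .= $prior;
        }
        return $out;
    };
}

# Three made-up encoded rows, decoded whole and in 4-byte chunks. Since 4 is
# not a multiple of the 6-byte row length, rows straddle chunk boundaries.
my $encoded = "\x02\x01\x00\x00\x10\x00"
            . "\x02\x00\x00\x01\x00\x00"
            . "\x02\x00\x00\x00\xf5\x00";

my $whole = new_up_decoder(5)->($encoded);

my $decode  = new_up_decoder(5);
my $chunked = join '', map { $decode->($_) } unpack('(a4)*', $encoded);

print $whole eq $chunked ? "chunked decode matches\n" : "mismatch\n";
```

A decoder that dropped or misaligned the leftover bytes at the end of each chunk would only go wrong when the chunk size is not a multiple of the row length, which is consistent with the reproduction condition described above.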
Subject: Re: [rt.cpan.org #112456] xref stream file in 2.026: Can't call method "elementsof" on an undefined value
Date: Wed, 9 Mar 2016 23:11:55 +0000
To: Steve Simms via RT <bug-PDF-API2 [...] rt.cpan.org>
From: Stuart Henderson <stu [...] spacehopper.org>
On 2016/03/09 10:12, Steve Simms via RT wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=112456 > > > On Thu Mar 03 06:58:09 2016, stu@spacehopper.org wrote:
> > One difference between working/broken files is that the compressed > > stream in the working one is <4096 bytes and the broken one is >4096. > > So as a hack I've bumped everything in Dict.pm to read in 32K chunks > > instead; I'm now able to read these files. > > > > So if you go the other way and reduce these from 4096 to, say, 512 > > bytes, I expect you will be able to reproduce the corrupted stream.
> > As it turned out, it also required that the buffer size not be a multiple of the row length, but I was able to reproduce it. There's a fix on GitHub now -- can you confirm that it works for your file? > > Thanks again for helping troubleshoot this bug!
Great, I'm glad you were able to reproduce and track it down! Yes, I can confirm this is now working for my files.