
This queue is for tickets about the PDF-API2 CPAN distribution.

Report information
The Basics
Id: 112546
Status: resolved
Priority: 0/
Queue: PDF-API2

People
Owner: Nobody in particular
Requestors: melmothx [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 2.026
Fixed in: 2.027



Subject: PDF::API2 version
Date: Sun, 28 Feb 2016 22:03:47 +0100
To: bug-PDF-API2 [...] rt.cpan.org
From: Marco Pessotto <melmothx [...] gmail.com>
The attached file, created 3 years ago by pdflatex (it's the classic sample2e), previously caused PDF::API2 to raise an exception that could be caught. Now it hangs indefinitely with the CPU at 100%.

    #!perl
    use strict;
    use warnings;
    use Test::More tests => 1;
    use PDF::API2;
    diag $PDF::API2::VERSION;
    my $pdf = PDF::API2->open('t/resources/sample2e.pdf');
    # ... Hangs
    ok ($pdf);
Attachment: sample2e.pdf (application/pdf, 133.5k)


Original report: https://rt.cpan.org/Public/Bug/Display.html?id=112461

Please let me know if I can be of more help.

Best wishes
-- 
Marco
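A hang is harder to deal with in a test suite than an exception. As a minimal sketch, assuming a Unix-like platform with SIGALRM, a test can wrap the suspect call in Perl's alarm so a runaway loop becomes a catchable error again. with_timeout is a hypothetical helper (not part of PDF::API2 or Test::More), and the no-progress loop below only stands in for the hanging PDF::API2->open call.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, but give up after $seconds by turning
# SIGALRM into an ordinary Perl exception.
sub with_timeout {
    my ($seconds, $code) = @_;
    my @result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout after ${seconds}s\n" };
        alarm $seconds;
        @result = $code->();
        alarm 0;
        1;
    };
    alarm 0;    # never leave an alarm pending if $code died early
    die $@ unless $ok;
    return wantarray ? @result : $result[0];
}

# Stand-in for the hang: a loop whose body never consumes its input,
# because the pattern cannot match the leading "\n".
my $err;
eval {
    with_timeout(1, sub {
        my $str = "\n/Type /Catalog\n>>";
        while ($str !~ m/^>>/) {
            $str =~ s|^/(\w+)\s?||;
        }
    });
    1;
} or $err = $@;

print "caught: $err";
```

In the test from the report, the code reference would wrap PDF::API2->open('t/resources/sample2e.pdf'), so the test fails with a message instead of blocking the whole run.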
Subject: Re: [rt.cpan.org #112546]
Date: Sun, 28 Feb 2016 22:07:22 +0100
To: bug-PDF-API2 [...] rt.cpan.org
From: Marco Pessotto <melmothx [...] gmail.com>
And sorry for the mangled subject...
-- 
Marco
Subject: Infinite loop when opening certain PDFs
I'm seeing the same behavior with your test file and script.

The PDF uses a cross-reference stream, so it wouldn't have been openable by previous versions of PDF::API2. Since you're getting the error as soon as you open the file, the infinite loop is very likely to be somewhere in the PDF/API2/Basic/PDF directory, probably File.pm or maybe Dict.pm.

If you have a chance to do some triage before I get to it, start adding some "got here" debugging statements to the subs starting with the word "read" in both of those files. The loop is likely in one of them.

Steve

On Sun Feb 28 16:04:08 2016, melmothx@gmail.com wrote:
> The attached file, created 3 years by pdflatex (it's the classic
> sample2e), which previously caused PDF::API2 to raise an exception which
> could be caught, now hangs indefinitively with CPU at 100%.
> [...]
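The "got here" approach can be as simple as a helper that reports where it was called from. The sketch below is a hypothetical aid (got_here and readval_demo are made-up names); it uses Perl's built-in caller to report the calling sub, file and line, so a loop that never exits shows up as the same message repeating.

```perl
use strict;
use warnings;

# Hypothetical debugging aid: report which sub called it, and from where.
sub got_here {
    my ($label) = @_;
    my (undef, $file, $line) = caller;     # the call site
    my $sub = (caller(1))[3] // 'main';    # the sub containing that call
    warn sprintf "got here: %s (%s line %d)%s\n",
        $sub, $file, $line, defined $label ? " [$label]" : '';
}

# Collect the messages instead of letting them go straight to STDERR.
my @seen;
local $SIG{__WARN__} = sub { push @seen, $_[0] };

# Stand-in for a read* sub whose loop runs more often than expected.
sub readval_demo { got_here('dict loop') for 1 .. 3 }

readval_demo();
print @seen;
```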
Subject: Re: [rt.cpan.org #112546] Infinite loop when opening certain PDFs
Date: Mon, 29 Feb 2016 09:49:40 +0100
To: bug-PDF-API2 [...] rt.cpan.org
From: Marco Pessotto <melmothx [...] gmail.com>
"Steve Simms via RT" <bug-PDF-API2@rt.cpan.org> writes: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=112546 > > > I'm seeing the same behavior with your test file and script. > > The PDF uses a cross-reference stream, so it wouldn't have been > openable by previous versions of PDF::API2. Since you're getting the > error as soon as you open the file, the infinite loop is very likely > to be somewhere in the PDF/API2/Basic/PDF directory, probably File.pm > or maybe Dict.pm. > > If you have a chance to do some triage before I get to it, start > adding some "got here" debugging statements to the subs starting with > the word "read" in both of those files. The loop is likely in one of > them.
Infinite loop is in readval.

    diff --git a/lib/PDF/API2/Basic/PDF/File.pm b/lib/PDF/API2/Basic/PDF/File.pm
    index 1f46d46..fa70cb3 100644
    --- a/lib/PDF/API2/Basic/PDF/File.pm
    +++ b/lib/PDF/API2/Basic/PDF/File.pm
    @@ -479,6 +479,9 @@ sub readval {
                     ($value, $str) = $self->readval($str, %opts);
                     $result->{'null'} = $value;
                 }
    +            else {
    +                die "None of the above changes: $reg_char or $ws_char\n";
    +            }
                 $str = update($fh, $str) if $update; # thanks gareth.jones@stud.man.ac.uk
             }
             $str =~ s/^>>//;

Output with some debug:

    Reading value $VAR1 = '<<
    /Type /Catalog
    /Pages 12 0 R
    >>
    ';
    $VAR2 = {
              'update' => 0,
              'objnum' => '64',
              'objgen' => '0'
            };
    It's a dict
    Looping on
    /Type /Catalog
    /Pages 12 0 R
    >>

    None of the above changes: [^][<>{}()/% \t\r\n\f\0] or [ \t\r\n\f\0]

And now I'm kind of lost. I hope this helps. If you need further help, please just let me know.

-- 
Marco
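The reason the loop never ends can be checked directly. Each branch of the dictionary loop anchors its pattern at ^, so a string that starts with whitespace matches none of them, nothing is consumed, and while ($str !~ m/^>>/) keeps testing the same string. The sketch below reuses the $reg_char and $ws_char classes printed above, the /Catalog string from the trace, and only the two branch patterns visible in the diffs in this thread; the remaining branches of readval are not shown here and are left out.

```perl
use strict;
use warnings;

# Character classes as printed by the debug output above.
my $ws_char  = '[ \t\r\n\f\0]';
my $reg_char = '[^][<>{}()/% \t\r\n\f\0]';

# Body of the /Catalog dictionary after "<<" was consumed (from the trace).
my $str = "\n/Type /Catalog\n/Pages 12 0 R\n>>\n";

# Each dictionary branch only looks at the very start of the string.
my $named_key = ($str =~ m|^/($reg_char+)$ws_char?|) ? 1 : 0;
my $empty_key = ($str =~ m|^/$ws_char+|)             ? 1 : 0;
my $at_end    = ($str =~ m/^>>/)                     ? 1 : 0;

printf "named key: %d, empty key: %d, end of dict: %d\n",
    $named_key, $empty_key, $at_end;

# No branch matches and $str is unchanged, so the while condition is
# re-tested on the same string forever. Stripping leading whitespace
# first lets the named-key branch match again:
(my $fixed = $str) =~ s/^$ws_char+//;
print "after strip: ",
    ($fixed =~ m|^/($reg_char+)$ws_char?| ? "key $1\n" : "no match\n");
```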
Subject: Re: [rt.cpan.org #112546] Infinite loop when opening certain PDFs
Date: Mon, 29 Feb 2016 10:26:50 +0100
To: bug-PDF-API2 [...] rt.cpan.org
From: Marco Pessotto <melmothx [...] gmail.com>
More info. I tried removing the leading whitespace, but now testing PDF::Imposition dies with:

    String is $VAR1 = "/F7 8 0 R /F14 9 0 R /F27 10 0 R >>\n/ProcSet [ /PDF /Text ]\n>>\n<<\n/Type /Page\n/Contents 15 0 R\n/Resources 13 0 R\n/MediaBox [0 0 595.276 841.89]\n/Parent 12 0 R\n>";
    F7 is PDF::API2::Basic::PDF::Objind=HASH(0x3fd3a60)
    String is $VAR1 = "/F14 9 0 R /F27 10 0 R >>\n/ProcSet [ /PDF /Text ]\n>>\n<<\n/Type /Page\n/Contents 15 0 R\n/Resources 13 0 R\n/MediaBox [0 0 595.276 841.89]\n/Parent 12 0 R\n>";
    F14 is PDF::API2::Basic::PDF::Objind=HASH(0x3fd3bb0)
    String is $VAR1 = "/F27 10 0 R >>\n/ProcSet [ /PDF /Text ]\n>>\n<<\n/Type /Page\n/Contents 15 0 R\n/Resources 13 0 R\n/MediaBox [0 0 595.276 841.89]\n/Parent 12 0 R\n>";
    F27 is PDF::API2::Basic::PDF::Objind=HASH(0x3fcdb88)
    Font is PDF::API2::Basic::PDF::Dict=HASH(0x4066a30)
    String is $VAR1 = "/ProcSet [ /PDF /Text ]\n>>\n<<\n/Type /Page\n/Contents 15 0 R\n/Resources 13 0 R\n/MediaBox [0 0 595.276 841.89]\n/Parent 12 0 R\n>";
    Can't parse ` /PDF /Text ]
    >>
    <<
    /Type /Page
    /Contents 15 0 R
    /Resources 13 0 R
    /MediaBox [0 0 595.276 841.89]
    /Parent 12 0 R
    >' near 136332 length 114. at /home/melmoth/perl5/lib/perl5/PDF/API2/Basic/PDF/File.pm line 665.

It looks like a different problem, though, since opening the file with the whitespace removal in place works (or at least appears to).

    diff --git a/lib/PDF/API2/Basic/PDF/File.pm b/lib/PDF/API2/Basic/PDF/File.pm
    index 1f46d46..84e6e1b 100644
    --- a/lib/PDF/API2/Basic/PDF/File.pm
    +++ b/lib/PDF/API2/Basic/PDF/File.pm
    @@ -139,6 +139,7 @@ is in PDF which contains the location of the previous cross-reference table.
     =cut
     
     use strict;
    +use Data::Dumper;
     no strict "refs";
     
     use Scalar::Util qw(blessed);
    @@ -464,9 +465,14 @@ sub readval {
             $result = PDFDict();
     
             while ($str !~ m/^>>/) {
    +            # remove a leading newline if present
    +            $Data::Dumper::Useqq = 1;
    +            print "String is " . Dumper($str);
    +            $str =~ s/\A$ws_char//;
                 if ($str =~ s|^/($reg_char+)$ws_char?||) {
                     my $key = PDF::API2::Basic::PDF::Name::name_to_string($1, $self);
                     ($value, $str) = $self->readval($str, %opts);
    +                print "$key is $value\n";
                     $result->{$key} = $value;
                 }
                 elsif ($str =~ s|^/$ws_char+||) {
    @@ -479,6 +485,9 @@ sub readval {
                     ($value, $str) = $self->readval($str, %opts);
                     $result->{'null'} = $value;
                 }
    +            else {
    +                die "None of the above changes $str: $reg_char or $ws_char\n";
    +            }
                 $str = update($fh, $str) if $update; # thanks gareth.jones@stud.man.ac.uk
             }
             $str =~ s/^>>//;

-- 
Marco
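The $Data::Dumper::Useqq = 1 line in the debug patch above is what makes the newlines show up as \n in this output. A minimal illustration using only the core Data::Dumper module:

```perl
use strict;
use warnings;
use Data::Dumper;

my $str = "\n/Type /Catalog\n>>";

# Default output uses single quotes, so the newlines are printed raw and
# a leading newline is easy to miss.
my $plain = Dumper($str);

# Useqq switches to double-quoted output with escapes, so leading
# whitespace becomes a visible "\n".
my $escaped = do { local $Data::Dumper::Useqq = 1; Dumper($str) };

print $plain, $escaped;
```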
Subject: Re: [rt.cpan.org #112546] Infinite loop when opening certain PDFs
Date: Mon, 29 Feb 2016 10:48:33 +0100
To: bug-PDF-API2 [...] rt.cpan.org
From: Marco Pessotto <melmothx [...] gmail.com>
As a side note: as the author of two modules that rely on PDF::API2 (PDF::Cropmarks and PDF::Imposition), I would be grateful if you could give me a heads-up before a release, so I can test it and give you feedback (I believe this would be a win-win situation).

https://metacpan.org/requires/distribution/PDF-API2?sort=[[2,1]]

-- 
Marco
Subject: Infinite loop when opening certain PDFs
Try adding the following line at the beginning of the while loop in readval:

    --- a/lib/PDF/API2/Basic/PDF/File.pm
    +++ b/lib/PDF/API2/Basic/PDF/File.pm
    @@ -465,6 +465,7 @@ sub readval {
             $result = PDFDict();
     
             while ($str !~ m/^>>/) {
    +            $str =~ s/^$ws_char+//;
                 if ($str =~ s|^/($reg_char+)$ws_char?||) {
                     my $key = PDF::API2::Basic::PDF::Name::name_to_string($1, $self);
                     ($value, $str) = $self->readval($str, %opts);

Does that fix it on your end?

On Mon Feb 29 03:49:54 2016, melmothx@gmail.com wrote:
> "Steve Simms via RT" <bug-PDF-API2@rt.cpan.org> writes: >
> > <URL: https://rt.cpan.org/Ticket/Display.html?id=112546 > > > > > I'm seeing the same behavior with your test file and script. > > > > The PDF uses a cross-reference stream, so it wouldn't have been > > openable by previous versions of PDF::API2. Since you're getting the > > error as soon as you open the file, the infinite loop is very likely > > to be somewhere in the PDF/API2/Basic/PDF directory, probably File.pm > > or maybe Dict.pm. > > > > If you have a chance to do some triage before I get to it, start > > adding some "got here" debugging statements to the subs starting with > > the word "read" in both of those files. The loop is likely in one of > > them.
> > Infinite loop is in readval. > > diff --git a/lib/PDF/API2/Basic/PDF/File.pm > b/lib/PDF/API2/Basic/PDF/File.pm > index 1f46d46..fa70cb3 100644 > --- a/lib/PDF/API2/Basic/PDF/File.pm > +++ b/lib/PDF/API2/Basic/PDF/File.pm > @@ -479,6 +479,9 @@ sub readval { > ($value, $str) = $self->readval($str, %opts); > $result->{'null'} = $value; > } > + else { > + die "None of the above changes: $reg_char or > $ws_char\n"; > + } > $str = update($fh, $str) if $update; # thanks > gareth.jones@stud.man.ac.uk > } > $str =~ s/^>>//; > > Output with some debug. > > Reading value $VAR1 = '<< > /Type /Catalog > /Pages 12 0 R
> >>
> '; > $VAR2 = { > 'update' => 0, > 'objnum' => '64', > 'objgen' => '0' > }; > It's a dict > Looping on > /Type /Catalog > /Pages 12 0 R
> >>
> > None of the above changes: [^][<>{}()/% \t\r\n\f\0] or [ \t\r\n\f\0] > > And now I'm kind of lost. I hope this helps. If you need further help, > please just let me know.
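The effect of the proposed one-line fix can be sketched with a toy dictionary reader. This is a hypothetical, heavily simplified stand-in for the dictionary branch of readval (read_dict is not a PDF::API2 function, and values are limited to names, integers and "N G R" references), but the loop has the same shape: skip leading whitespace, try each branch, and refuse to continue if no branch made progress.

```perl
use strict;
use warnings;

# Character classes as printed by the debug output in this ticket.
my $ws_char  = '[ \t\r\n\f\0]';
my $reg_char = '[^][<>{}()/% \t\r\n\f\0]';

# Hypothetical, simplified dictionary reader; $str is the text after "<<".
# Returns the parsed hash and whatever follows the closing ">>".
sub read_dict {
    my ($str) = @_;
    my %dict;
    while ($str !~ m/^>>/) {
        $str =~ s/^$ws_char+//;    # the proposed fix: skip leading whitespace
        if ($str =~ s|^/($reg_char+)$ws_char?||) {
            my $key = $1;
            $str =~ s/^$ws_char+//;
            if ($str =~ s|^(\d+)$ws_char+(\d+)$ws_char+R||) {
                $dict{$key} = "$1 $2 R";    # indirect reference
            }
            elsif ($str =~ s|^/($reg_char+)||) {
                $dict{$key} = "/$1";        # name value
            }
            elsif ($str =~ s|^(-?\d+)||) {
                $dict{$key} = $1;           # integer value
            }
            else {
                die "Can't parse value for /$key near '$str'\n";
            }
        }
        elsif ($str !~ m/^>>/) {
            # Progress guard: without it, unexpected input spins forever.
            die "No progress near '$str'\n";
        }
    }
    $str =~ s/^>>//;
    return (\%dict, $str);
}

# The /Catalog dictionary from the trace, including its leading newline.
my ($dict, $rest) = read_dict("\n/Type /Catalog\n/Pages 12 0 R\n>>\n");
print "$_ => $dict->{$_}\n" for sort keys %$dict;
```

The progress guard plays the same role as the temporary else { die ... } branch in the debug patches above: it turns a silent hang into an error message that shows the unparsed text.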
Subject: Re: [rt.cpan.org #112546] Infinite loop when opening certain PDFs
Date: Wed, 02 Mar 2016 22:42:48 +0100
To: bug-PDF-API2 [...] rt.cpan.org
From: Marco Pessotto <melmothx [...] gmail.com>
"Steve Simms via RT" <bug-PDF-API2@rt.cpan.org> writes: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=112546 > > > Try adding the following line at the beginning of the while loop in readval: > > --- a/lib/PDF/API2/Basic/PDF/File.pm > +++ b/lib/PDF/API2/Basic/PDF/File.pm > @@ -465,6 +465,7 @@ sub readval { > $result = PDFDict(); > > while ($str !~ m/^>>/) { > + $str =~ s/^$ws_char+//; > if ($str =~ s|^/($reg_char+)$ws_char?||) { > my $key = PDF::API2::Basic::PDF::Name::name_to_string($1, $self); > ($value, $str) = $self->readval($str, %opts); > > Does that fix it on your end?
I tried something similar:

https://rt.cpan.org/Public/Bug/Display.html?id=112546#txn-1601995

and that fixes the opening. But then, running the tests in

https://metacpan.org/source/MELMOTHX/PDF-Imposition-0.21/t/accessors.t

I get:

    Font is PDF::API2::Basic::PDF::Dict=HASH(0x4678118)
    String is $VAR1 = "/ProcSet [ /PDF /Text ]\n>>\n<<\n/Type /Page\n/Contents 15 0 R\n/Resources 13 0 R\n/MediaBox [0 0 595.276 841.89]\n/Parent 12 0 R\n>";
    Can't parse ` /PDF /Text ]
    >>
    <<
    /Type /Page
    /Contents 15 0 R
    /Resources 13 0 R
    /MediaBox [0 0 595.276 841.89]
    /Parent 12 0 R
    >' near 136332 length 114. at /home/melmoth/perl5/lib/perl5/PDF/API2/Basic/PDF/File.pm line 665.
    # Looks like you planned 16 tests but ran 15.

So I'd venture that this particular issue is fixed, but something else breaks. I can try to reproduce that in another test as well, if you like.

-- 
Marco
Add the same line inside the while loop for arrays starting at line 627. I think that will do it, though I'm also going to check the other cases to see if anything is necessary there as well.

On Wed Mar 02 16:42:58 2016, melmothx@gmail.com wrote:
> "Steve Simms via RT" <bug-PDF-API2@rt.cpan.org> writes: >
> > <URL: https://rt.cpan.org/Ticket/Display.html?id=112546 > > > > > Try adding the following line at the beginning of the while loop in > > readval: > > > > --- a/lib/PDF/API2/Basic/PDF/File.pm > > +++ b/lib/PDF/API2/Basic/PDF/File.pm > > @@ -465,6 +465,7 @@ sub readval { > > $result = PDFDict(); > > > > while ($str !~ m/^>>/) { > > + $str =~ s/^$ws_char+//; > > if ($str =~ s|^/($reg_char+)$ws_char?||) { > > my $key = PDF::API2::Basic::PDF::Name::name_to_string($1, > > $self); > > ($value, $str) = $self->readval($str, %opts); > > > > Does that fix it on your end?
> > I tried something similar: > > https://rt.cpan.org/Public/Bug/Display.html?id=112546#txn-1601995 > > and that fixes the opening. > > But then, running the tests on > > https://metacpan.org/source/MELMOTHX/PDF-Imposition-0.21/t/accessors.t > > I get: > > Font is PDF::API2::Basic::PDF::Dict=HASH(0x4678118) > String is $VAR1 = "/ProcSet [ /PDF /Text ]\n>>\n<<\n/Type > /Page\n/Contents 15 0 R\n/Resources 13 0 R\n/MediaBox [0 0 595.276 > 841.89]\n/Parent 12 0 R\n>"; > Can't parse ` /PDF /Text ]
> >>
> << > /Type /Page > /Contents 15 0 R > /Resources 13 0 R > /MediaBox [0 0 595.276 841.89] > /Parent 12 0 R
> > ' near 136332 length 114. at > > /home/melmoth/perl5/lib/perl5/PDF/API2/Basic/PDF/File.pm line 665.
> # Looks like you planned 16 tests but ran 15. > > So, I'd dare to say that this particular issue is fixed, but something > else breaks. > > I can try to reproduce that as well on another test, if you want so.
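The array loop needs the same treatment: once "[" is consumed from "/ProcSet [ /PDF /Text ]", the remaining text starts with a space, which is exactly what the "Can't parse" error above shows. A sketch in the same spirit, again a hypothetical stand-in (read_array is not a PDF::API2 function) that only handles names and numbers:

```perl
use strict;
use warnings;

# Character classes as printed by the debug output in this ticket.
my $ws_char  = '[ \t\r\n\f\0]';
my $reg_char = '[^][<>{}()/% \t\r\n\f\0]';

# Hypothetical, simplified array reader; $str is the text after "[".
# Returns the elements and whatever follows the closing "]".
sub read_array {
    my ($str) = @_;
    my @array;
    while ($str !~ m/^\]/) {
        $str =~ s/^$ws_char+//;    # same fix as in the dictionary loop
        last if $str =~ m/^\]/;    # whitespace just before "]" in this sketch
        if ($str =~ s|^/($reg_char+)||) {
            push @array, "/$1";                        # name element
        }
        elsif ($str =~ s/^(-?(?:\d+(?:\.\d*)?|\.\d+))//) {
            push @array, $1;                           # integer or real
        }
        else {
            die "Can't parse `$str'\n";
        }
    }
    $str =~ s/^\]//;
    return (\@array, $str);
}

# The /ProcSet value from the failing resources dictionary, after "[".
my ($procset, $rest) = read_array(" /PDF /Text ]\n>>");
print join(', ', @$procset), "\n";
```

The same reader handles the MediaBox value, for example read_array("0 0 595.276 841.89]").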
Subject: Re: [rt.cpan.org #112546] Infinite loop when opening certain PDFs
Date: Wed, 02 Mar 2016 23:51:31 +0100
To: bug-PDF-API2 [...] rt.cpan.org
From: Marco Pessotto <melmothx [...] gmail.com>
"Steve Simms via RT" <bug-PDF-API2@rt.cpan.org> writes: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=112546 > > > Add the same line inside the while loop for arrays starting at line > 627. I think that will do it, though I'm also going to check the other > cases to see if anything is necessary there as well.
Looks so, yes. This seems to fix the issue. Thanks!

    diff --git a/lib/PDF/API2/Basic/PDF/File.pm b/lib/PDF/API2/Basic/PDF/File.pm
    index 1f46d46..d33d9cb 100644
    --- a/lib/PDF/API2/Basic/PDF/File.pm
    +++ b/lib/PDF/API2/Basic/PDF/File.pm
    @@ -464,6 +464,8 @@ sub readval {
             $result = PDFDict();
     
             while ($str !~ m/^>>/) {
    +            # remove a leading newline if present
    +            $str =~ s/\A$ws_char+//;
                 if ($str =~ s|^/($reg_char+)$ws_char?||) {
                     my $key = PDF::API2::Basic::PDF::Name::name_to_string($1, $self);
                     ($value, $str) = $self->readval($str, %opts);
    @@ -479,6 +481,9 @@ sub readval {
                     ($value, $str) = $self->readval($str, %opts);
                     $result->{'null'} = $value;
                 }
    +            else {
    +                die "None of the above changes $str: $reg_char or $ws_char\n";
    +            }
                 $str = update($fh, $str) if $update; # thanks gareth.jones@stud.man.ac.uk
             }
             $str =~ s/^>>//;
    @@ -625,6 +630,7 @@ sub readval {
             $str = update($fh, $str) if $update;
             $result = PDFArray();
             while ($str !~ m/^\]/) {
    +            $str =~ s/\A$ws_char+//;
                 ($value, $str) = $self->readval($str, %opts);
                 $result->add_elements($value);
                 $str = update($fh, $str) if $update; # str might just be exhausted!

-- 
Marco
I've committed these two changes, plus a fix for one other potential breakage point. The problem only occurs inside an object stream (which requires a cross-reference stream). In normal files, leading white space (and comments) get removed while the file is being read, but object streams have already been read into memory by the time they're parsed, so any extra white space needs to be removed separately.
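This explanation is why the fix belongs in the parser loops: text taken from an object stream never passes through the file reader's own skipping. A minimal sketch of skipping both whitespace and comments in an in-memory buffer, assuming the $ws_char class printed earlier in this ticket and the PDF rule that a comment runs from % to the end of the line; skip_ws_and_comments is a hypothetical helper, not the code that was committed.

```perl
use strict;
use warnings;

my $ws_char = '[ \t\r\n\f\0]';

# Hypothetical helper: drop leading whitespace and "%" comments from a
# buffer that is already in memory (e.g. decoded object stream data).
# It only looks at the start of the buffer, where a token would begin,
# so a "%" inside a string literal later on is never touched.
sub skip_ws_and_comments {
    my ($str) = @_;
    # Each repetition eats a run of whitespace or one comment up to (but
    # not including) its end-of-line; the next repetition eats the EOL.
    $str =~ s/\A(?:$ws_char+|%[^\r\n]*)+//;
    return $str;
}

my $buffer = "\n  % objects follow\n<< /Type /Page >>";
print skip_ws_and_comments($buffer), "\n";
```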
On Mon Feb 29 04:48:51 2016, melmothx@gmail.com wrote:
> As a side note, being the author of two modules which rely on PDF::API2 > (PDF::Cropmarks and PDF::Imposition), I would be grateful if you give me > an head-up before a release, so I can test it and give you feedback (I > believe this would be a win-win situation).
Sure. I've added this to my release checklist. I think it may be possible to have GitHub notify you when changes get committed to the repository as well.