Skip Menu |

This queue is for tickets about the PDF-API2 CPAN distribution.

Report information
The Basics
Id: 131657
Status: open
Priority: 0/
Queue: PDF-API2

People
Owner: Nobody in particular
Requestors: winter [...] bfw-online.de
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: pdfapi2 regression 0ab55cd53 [Testcase]
Date: Mon, 3 Feb 2020 11:49:28 +0100
To: bug-PDF-API2 [...] rt.cpan.org
From: Leon Winter <winter [...] bfw-online.de>
Hi, we are using your PDF::API2 module (via Debian) and just upgraded to a more recent version where we noticed a bug: Deep recursion on subroutine "PDF::API2::Basic::PDF::Objind::release" at /usr/share/perl5/PDF/API2/Basic/PDF/Objind.pm line 123. at /usr/share/perl5/PDF/API2/Basic/PDF/Objind.pm line 122. PDF::API2::Basic::PDF::Objind::release(PDF::API2::Outline=HASH(0x583c1788)) called at /usr/share/perl5/PDF/API2/Basic/PDF/Objind.pm line 122 I could identify the commit 0ab55cd535b3b3ac7f616589d00ff00864626030 as culprit. Using the current version (at master) with this commit reverted we do not run into this bug. We use the module to modify a 200+-page PDF to add outlines and titles. See my attached test case (Perl script) which I run on a empty 200+ page PDF file which I also attached. I created the PDF file this way: convert xc:none -page A4 a.pdf pdftk A=blank.pdf cat A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A output blank217.pdf Then run the script with a current pdfapi2 in path, I did: PERL5LIB=pdfapi2/lib perl -w testcase.pl Output after regression: Deep recursion on subroutine "PDF::API2::Basic::PDF::Objind::release" at pdfapi2/lib/PDF/API2/Basic/PDF/Objind.pm line 123. [...] Output before regression (0ab55cd535b3b3ac7f616589d00ff00864626030): (nothing) Thanks, Leon

Message body is not shown because sender requested not to inline it.

Download blank217.pdf
application/pdf 223.3k

Message body not shown because it is not plain text.

Well, I had some time today, what follows is maybe not a patch "as is", rather invitation for a review. Plus, I didn't investigate why there were no warnings before mentioned commit -- just looking at current code (though I suspect it is related to efforts to prevent memory leaks, circular refs, etc., therefore there's no way back to "before that commit"). Reduced to SSCCE, the test would be: use strict; use warnings; use PDF::API2; my $pdf = PDF::API2-> open( 'blank217.pdf' ); my $outlines = $pdf-> outlines; $outlines-> outline-> dest( $pdf-> openpage( $_ )) for 0 .. 100; $pdf-> { pdf }-> release; There are from 0 to quite a few warnings then, different from run to run, and only reason for that is "values" returns random order in line 115 https://metacpan.org/release/PDF-API2/source/lib/PDF/API2/Basic/PDF/Objind.pm#L115 Hm-m, OK. The Outlines tree is quite inter-connected, with Prev/Next links for intermediate/leaf, First/Last for root/intermediate, and Parent for all but root. I suspect, because of that, with just 101 bookmarks we easily (but not always) hit the deep recursion limit, then. Actually, many PDF structures are similarly inter-connected -- e.g. consider a pages tree. I think, though didn't try to confirm, that a relatively (maybe a bit more) complex pages tree can trigger the same warnings when "released", because of Parent member. I think, with Outlines tree, we don't have to follow any of Parent/First/Last/Prev/Next links to manually "release" them. For simple reason, that leafs/intermediate nodes are members of ' children' arrays. In fact, matter-of-fact autovivification of these arrays (https://metacpan.org/release/PDF-API2/source/lib/PDF/API2/Outline.pm#L131) is somewhat ...disturbing. OK, what I'm suggesting, is that said "Parent/First/Last/Prev/Next" values are weakened, and "weak" refs ARE NOT added to queue to be manually released. For Outlines, they are released anyway because there are stored in ' children'. For Pages, any "Parent" is already weakened if I'm reading the code right. With patches applied, there are no warnings however many times I run the test. I hope there are no memory leaks then. For structures other than Outlines or Pages, if PDF::API2 ever deals with them, the solution would be then to weaken refs which are released through other links -- then we don't hit "deep recursion" for trees of any complexity. What do you think? :) ------------- --- PDF/API2/Outline_old.pm Thu Feb 6 01:08:40 2020 +++ PDF/API2/Outline.pm Tue Feb 11 20:16:25 2020 @@ -32,36 +32,43 @@ $self->{'Prev'} = $prev if defined $prev; $self->{' api'} = $api; weaken $self->{' api'}; + weaken $self-> {'Parent'} if defined $parent; + weaken $self-> {'Prev'} if defined $prev; return $self; } sub parent { my $self = shift(); $self->{'Parent'} = shift() if defined $_[0]; + weaken $self-> {'Parent'}; return $self->{'Parent'}; } sub prev { my $self = shift(); $self->{'Prev'} = shift() if defined $_[0]; + weaken $self-> {'Prev'}; return $self->{'Prev'}; } sub next { my $self = shift(); $self->{'Next'} = shift() if defined $_[0]; + weaken $self-> {'Next'}; return $self->{'Next'}; } sub first { my $self = shift(); $self->{'First'} = $self->{' children'}->[0] if defined $self->{' children'} and defined $self->{' children'}->[0]; + weaken $self-> {'First'}; return $self->{'First'}; } sub last { my $self = shift(); $self->{'Last'} = $self->{' children'}->[-1] if defined $self->{' children'} and defined $self->{' children'}->[-1]; + weaken $self-> {'Last'}; return $self->{'Last'}; } --------------------------------- --- PDF/API2/Basic/PDF/Objind_old.pm Thu Feb 6 01:08:40 2020 +++ PDF/API2/Basic/PDF/Objind.pm Tue Feb 11 20:17:34 2020 @@ -14,6 +14,7 @@ use strict; use warnings; +use Scalar::Util 'isweak'; our $VERSION = '2.037'; # VERSION @@ -112,7 +113,7 @@ sub release { my ($self) = @_; - my @tofree = values %$self; + my @tofree = grep { !isweak $_ } values %$self; %$self = (); while (my $item = shift @tofree) { ---------------------------------
Sorry, it looks I added "fixed in 2.037"; I tried to add "my line numbers are for 2.037".
Subject: Re: [rt.cpan.org #131657] pdfapi2 regression 0ab55cd53 [Testcase]
Date: Thu, 13 Feb 2020 15:30:09 +0100
To: Vadim Repin via RT <bug-PDF-API2 [...] rt.cpan.org>
From: Leon Winter <winter [...] bfw-online.de>
I can confirm that your patch will silence the warnings (also in our production code). Thanks
I see that the recommended changes were NOT put in PDF::API2 2.038. I tried running the latest PDF::Builder (3.020 prerelease) and got quite a few of the "Deep recursion" warnings. OK, to be expected. Then I made the recommended changes to the two files and ran again. No "Deep recursion" messages (or any other messages), but the PDF file produced will not load (damaged and not repairable). I took a look through the PDF (out.pdf) and it looks quite strange. Tons and tons of 3 byte Filter objects, and the cross reference table looks very odd: lots of repeated groups of '0000000000 65535 f' followed by 7 copies of '0000079535 00000 n'. The starting addresses vary, of course. I'll try to attach a copy of out.pdf. Any ideas? The OP said it stopped the error messages, but didn't say if it produced a good PDF. I did the same thing with PDF::API2 2.038, and got the same results.
Subject: out.pdf
Download out.pdf
application/pdf 299.1k

Message body not shown because it is not plain text.

Show quoted text
> the PDF file produced will not load
Looks like you forgot to binmode output handle under Windows. Show quoted text
> I took a look through the PDF (out.pdf) and it looks quite strange.
Is "blank217.pdf" different? I think the command it was generated with and periodic patterns you observe are linked somehow. Things strange, odd, or (perceived as) ugly or inefficient disturb humans. Machines don't care as long as formal syntax is OK.
On Sat Nov 14 17:22:46 2020, vadimr wrote: Show quoted text
> > the PDF file produced will not load
> > Looks like you forgot to binmode output handle under Windows.
Interesting. The original testcase.pl at the very end is (minus the binmode statement): $out = $pdf->stringify (); $pdf->end; open my $pdf_out, '>', 'out.pdf'; binmode $pdf_out; # added per Vadim Repin print $pdf_out $out; close $pdf_out; The resulting PDF loads cleanly now, and appears (unlike blank217) 2-up with an outline (bookmarks). Is that the intended result? If so, I think this can be marked "solved"!
Subject: out.pdf
Download out.pdf
application/pdf 299.1k

Message body not shown because it is not plain text.