Skip Menu |

This queue is for tickets about the PDF-API2 CPAN distribution.

Report information
The Basics
Id: 131147
Status: resolved
Priority: 0/
Queue: PDF-API2

People
Owner: Nobody in particular
Requestors: Klaus [...] Ethgen.ch
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 2.037



Subject: Bug with missing properties including patch
Date: Fri, 6 Dec 2019 18:30:14 +0100
To: bug-PDF-API2 [...] rt.cpan.org
From: Klaus Ethgen <Klaus [...] Ethgen.ch>
Hello, the attached pdf will fail in importPageIntoForm as it does not have a "Rotate" property. The patch also attached will fix that. Regards Klaus -- Klaus Ethgen http://www.ethgen.ch/ pub 4096R/4E20AF1C 2011-05-16 Klaus Ethgen <Klaus@Ethgen.ch> Fingerprint: 85D4 CA42 952C 949B 1753 62B3 79D0 B06F 4E20 AF1C
Download gaforkarte_schweiz.pdf
application/pdf 1.5m

Message body not shown because it is not plain text.

Message body is not shown because sender requested not to inline it.

Download signature.asc
application/pgp-signature 688b

Message body not shown because it is not plain text.

1. Can you give some PDF::API2 code that shows the problem? 2. How does this compare to RT 130722? Might it be the same problem?
There's "Parent" key in pages root with value of "null". Failing code: Show quoted text
>perl -MPDF::API2 -e "PDF::API2->new->importPageIntoForm(PDF::API2->open('gaforkarte_schweiz.pdf'),1)"
With patch applied, e.g. adding a new page still fails: Show quoted text
>perl -MPDF::API2 -e "PDF::API2->open('gaforkarte_schweiz.pdf')->page"
Failing line number is in familiar from couple years ago method "add_page", which I don't want to investigate again :) Further, there are nulls as values of properties of outline item in the file. This PDF producer seems to add nulls in unexpected places, hard to tell if any other PDF::API2 methods will fail. PDF is still valid, because the Reference says: "Specifying the null object as the value of a dictionary entry ... is equivalent to omitting the entry entirely." I'd suggest patching with this in mind, i.e. ignore such entries on read and therefore don't preserve them on update. Replace line 480 (version 2.036) PDF/API2/Basic/PDF/File.pm: # $result->{$key} = $value; $result->{$key} = $value unless ref($value) eq 'PDF::API2::Basic::PDF::Null'; Won't help in (hypothetical) case of "null" being value of indirect object, but such files are yet to be seen.
On Mon Dec 09 09:41:13 2019, vadimr wrote: Show quoted text
> Failing line number is in familiar from couple years ago method > "add_page", which I don't want to investigate again :)
Was that RT 121911 by any chance? That bug has been fixed in both API2 and Builder, so I'm wondering (if and) how this one is related. Show quoted text
> I'd suggest patching with this in mind, i.e. ignore such entries on > read and therefore don't preserve them on update.
Vadim, I applied your patch (File.pm) and it appears to fix the problem (both Perl one-liners you gave above). It also doesn't break any t-tests or examples in PDF::Builder. Note that I did NOT add Klaus's patch to Pages.pm -- is his patch made unnecessary by yours? As he has not provided any test code, I can't be absolutely sure that your patch alone fixes the problem. Per my earlier question about RT 130722, this patch does not seem to affect it. If Klaus gives his blessing to your patch (alone), I will release it in PDF::Builder. It's up to Steve as to what he wants to do with PDF::API2.
Subject: Re: [rt.cpan.org #131147] Bug with missing properties including patch
Date: Mon, 16 Dec 2019 18:18:09 +0100
To: "Phil M. Perry via RT" <bug-PDF-API2 [...] rt.cpan.org>
From: Klaus Ethgen <Klaus [...] Ethgen.ch>
Hi, Am Mo den 16. Dez 2019 um 18:11 schrieb Phil M. Perry via RT: Show quoted text
> > I'd suggest patching with this in mind, i.e. ignore such entries on > > read and therefore don't preserve them on update.
> Vadim, I applied your patch (File.pm) and it appears to fix the problem (both Perl one-liners you gave above). It also doesn't break any t-tests or examples in PDF::Builder. Note that I did NOT add Klaus's patch to Pages.pm -- is his patch made unnecessary by yours? As he has not provided any test code, I can't be absolutely sure that your patch alone fixes the problem.
Hmm, didn't I provide a (broken) PDF? I thought I had. I had the trouble in the following part: sub import_pdf { my $dest = shift; my $source = shift; my $urlprefs = shift; my $pages = $source->pages; for (my $i = 1; $i <= $pages; $i++) { my $pref = {}; $pref = $urlprefs->[0] if exists($urlprefs->[0]); $pref = $urlprefs->[$i] if exists($urlprefs->[$i]); my $rot = $pref->{rotate} || 0; my $displayrot = $pref->{displayrotate} || $rot; my $trans = $transform{$rot} || [0, 0]; my $page = $dest->page; $page->rotate($displayrot); my $gfx = $page->gfx; my $xo = $dest->importPageIntoForm($source, $i); $gfx->rotate($rot); $gfx->formimage($xo, $trans->[0] + ($pref->{x} || 0), $trans->[1] + ($pref->{y} || 0), $pref->{scale} || 1); } ## end for (my $i = 1; $i <= $pages...) return $pages; } ## end sub import_pdf It is part of a dirty hack to collect pdf as well as plain text or images and put them together into a single pdf. I will have a look at Vadims patch. Regards Klaus -- Klaus Ethgen http://www.ethgen.ch/ pub 4096R/4E20AF1C 2011-05-16 Klaus Ethgen <Klaus@Ethgen.ch> Fingerprint: 85D4 CA42 952C 949B 1753 62B3 79D0 B06F 4E20 AF1C
Download signature.asc
application/pgp-signature 688b

Message body not shown because it is not plain text.

On Mon Dec 16 12:18:18 2019, Klaus@Ethgen.ch wrote: Show quoted text
> Hmm, didn't I provide a (broken) PDF? I thought I had.
You provided a PDF (map of Switzerland) that loads correctly. That's the one that Vadim used in his two one-line examples. You provided a patch, but I didn't see any example code showing the problem you're reporting. That's what I was asking to see. Let me play with the code you just listed, with and without Vadim's patch, and perhaps with and without your patch, and see what happens.
I added the following code to drive your import_pdf() routine ---------------------------------------------------- use strict; use warnings; use PDF::Builder; my @prefs; # array of hashes, one per page # rotate (default 0) # displayrotate (default 'rotate' value) # x (default 0) # y (default 0) # scale (default 1) my $source = PDF::Builder->open('gaforkarte_schweiz.pdf'); my $dest = PDF::Builder->new(); my %transform; # => rotate (default [0,0]) $dest->mediabox('A4'); #$prefs[0] = {'scale' => 0.35}; $prefs[0] = {'rotate'=>-90, 'x'=>-600,'y'=>-100, 'scale' => 0.35}; import_pdf($dest, $source, \@prefs); $dest->saveas("output.pdf"); ---------------------------------------------- with Vadim's patch, and it seems to produce a good PDF. Without his patch I got the error about the missing 'find_prop'. rotate() is CLOCKWISE and must be a multiple of +/-90 degrees. The x and y offsets need to be adjusted to keep the image on the page.
Show quoted text
> How does this compare to RT 130722? Might it be the same problem? > Was that RT 121911 by any chance? ...how this one is related. > ...Klaus's patch to Pages.pm -- is his patch made unnecessary by yours?
Phil, RT 130722 was about malformed PDF with broken pages tree, and I remain of the same opinion: no correction was required on PDF::API2 side; that RT can not (and should not) be compared to others concerning valid PDF files; error message and/or source line number being the same (or close) is co-incidental and of no importance. Klaus's patch would fix the issue with "importPageIntoForm" with attached PDF, but my 2nd one-liner was to show that, following the route of applying a fix close to exposed line number, the 2nd patch would be required to fix failing "page" call, with line number somewhere close to RT 121911. That's the only reason I mentioned it. No other connection. Instead, my idea was to find more universal solution. The suggested patch should practically work OK, but I think some considerations are to be documented at least. The "null" object itself is not a scarecrow, it's very common in, e.g., explicit destination arrays. But, as value of dictionary entry, the only place in the Reference I found is quoted above. Does it mean such entries can be safely discarded? Can any entry, which is strongly "required", somehow finish having value "null"? I don't think so, but, what if, per quote above, we discard this entry, and "required" entry disappears from updated file? PDF ideology, which I agree with and which is NOT strictly adhered to by, e.g. PDF::API2, is don't touch entries you don't know what to do about. They can be from some future PDF specification :). I don't like the idea of throwing things away. But that's pure speculation. Ideally, PDF::API2::Basic::PDF::Dict should access dict entries with e.g. "key_exists", which checks if value of this key resolves to "null" and would return "false", and a getter "get_value", which similarly returns same value (e.g. undef) for non-existing keys or if value is "null". In practice, it will never happen. Given the way PDF::API2 was designed to traverse pages tree, I think the cheapest (and rather safe, regardless of what said above) way to fix the present RT would be with my patch.
Vadim, I asked about 130722, as both bugs were referencing 'find_prop' in their error messages. It sounds like your opinion is that they are completely unrelated, and that 130722 is strictly due to corruption of the PDF produced by someone's editing of it. My invitation still stands for the OP to show that some commercial product produced the bad PDF, in which case it might be appropriate to put something in PDF::API2/Builder to deal with it. And of course, if PDF::API2/Builder created the corruption in the first place, it would have to be patched! 121911 was just a red herring ("scarecrow"?) in having a similar line number? I was thrown off the trail by your mentioning it. If I read you correctly, you feel that your patch for this (131147) bug is more general than Klaus's patch, and there is no need to also apply his patch (harmless but now superfluous). Is that correct, or should his patch also be applied for some reason? I should probably copy-paste your explanation into the documentation somewhere, while I'm at it. Finally, thank you to both of you for making the effort to track this down and create patches. Good job!
Show quoted text
> If I read you correctly, you feel that your patch for this (131147) bug is more general than Klaus's patch, and there is no need to also apply his patch
Correct, mine can possibly address other issues (unknown yet), and it's either of the two, but if you decide to choose more cautious way (Klaus's patch) rather than more radical, then _also_ search where to (cautiously) patch so that "page" call (2nd one-liner) won't fail. As to documentation, maybe a link to this RT in a comment in source is more than enough. Show quoted text
> ...about 130722, as both bugs were referencing 'find_prop' in their error messages. It sounds like your opinion is that they are completely unrelated
They are related in a sense that "PDF::API2::Basic::PDF::Pages-> find_prop" method is called on object of unsuitable class -- PDF::API2::Basic::PDF::Objind (130722) or PDF::API2::Basic::PDF::Null (131147). But objects became blessed into these classes for unrelated reasons. When PDF is broken (130722) we can't exclude possibility of similar mismatch anywhere. In fact, it's a good thing that PDF::API2 fails early and noisily. Acrobat doesn't report any issue, but when I try to import a new page in UI (it was some time ago, IIRC insert one before the last... doesn't matter) then there were bizarre results in both visual feedback ("Pages" docker) and cryptic error message. Though at first file was opened as valid! Returning to 131147, I think with my patch, PDF::API2::Basic::PDF::Null is prevented from appearing in pages tree when bubbling up from a leaf to root. As to my previous concerns (is suggested patch too radical, can it break anything)... After some thought, I can imagine only some weird steganography would be disrupted, if any is ever used in PDF :) Otherwise, should be safe.
Subject: Re: [rt.cpan.org #131147] Bug with missing properties including patch
Date: Thu, 19 Dec 2019 08:47:37 +0100
To: Vadim Repin via RT <bug-PDF-API2 [...] rt.cpan.org>
From: Klaus Ethgen <Klaus [...] Ethgen.ch>
Hi, Am Do den 19. Dez 2019 um 0:02 schrieb Vadim Repin via RT: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=131147 > >
> > If I read you correctly, you feel that your patch for this (131147) bug is more general than Klaus's patch, and there is no need to also apply his patch
> > Correct, mine can possibly address other issues (unknown yet), and it's either of the two, but if you decide to choose more cautious way (Klaus's patch) rather than more radical, then _also_ search where to (cautiously) patch so that "page" call (2nd one-liner) won't fail. As to documentation, maybe a link to this RT in a comment in source is more than enough.
I have to say, preventing the Null value in the first place sounds reasonable for me. But as Vadim told, there might still be situations where the Null happens. So another, maybe supplement patch would be to patch PDF::API2::Basic::PDF::Null class to have NULL getters for several methods (maybe even done with autoload). That would mean to return always undef (or the class itself, depending on the logic). You can even implement a warning here and a configurable switch to croak instead of carp when that happens. Regards Klaus -- Klaus Ethgen http://www.ethgen.ch/ pub 4096R/4E20AF1C 2011-05-16 Klaus Ethgen <Klaus@Ethgen.ch> Fingerprint: 85D4 CA42 952C 949B 1753 62B3 79D0 B06F 4E20 AF1C
Download signature.asc
application/pgp-signature 688b

Message body not shown because it is not plain text.

Agreed with Vadim that dictionary entries with null values should be ignored, per the spec. I've updated the code accordingly. As Vadim noted, neither his patch nor my version will handle a dictionary with an indirect object pointing to a null value, but I'm okay with waiting for that particular case to come up, hopefully with an associated patch. Thanks for the bug report and the patch.