Skip Menu |

This queue is for tickets about the PDF-API2 CPAN distribution.

Report information
The Basics
Id: 132844
Status: open
Priority: 0/
Queue: PDF-API2

People
Owner: Nobody in particular
Requestors: jffry [...] posteo.net
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 2.037
Fixed in: (no value)



Subject: PDF::API2 corrupts certain PNMs when embedding them
#!/usr/bin/perl -w use strict; use PDF::API2; my $pdf = PDF::API2-> new(-file => 'test.pdf'); my $page = $pdf-> page; $page->mediabox(2480/300*72*5, 3506/300*72*5); my $gfx = $page->gfx; my $img = $pdf->image_pnm(shift); $gfx->image($img); $pdf->save; $pdf->end(); Run the above script with the pnm filename as an argument, the right hand side of the resulting PDF is shifted to the left. Investigating this, the corruption depends upon the geometry of the image - it is possible to crop the image to a size that is not corrupted in the PDF. e.g. the pnm after this cropping option is not corrupted: convert corrupt_no_compression.pnm -crop 790x400+500+900 corrupt_no_compression_cropped.pnm but cropping only the width, the corruption is still there, but less: convert corrupt_no_compression.pnm -crop 790x3506+500+0 corrupt_no_compression_cropped.pnm equally, cropping only the height, the corruption is also still there, in a similar magnitude to the original: convert corrupt_no_compression.pnm -crop 2480x400+0+900 corrupt_no_compression_cropped.pnm
Subject: corrupt_no_compression.pnm
Download corrupt_no_compression.pnm
image/x-portable-anymap 8.2m
corrupt_no_compression.pnm
Sending the previous mail has failed. Please contact your admin, they can find more details in the logs.
I can see a problem with "insufficient data" for the image when the PDF tries to display it. This is both PDF::API2 and PDF::Builder. I am looking at it. There are a couple odd things in your Perl script: 1) you use "shift" for the name fed to image_pnm, even though @_ is empty at this point. I can avoid that error by replacing "shift" with "$ARGV[0]". Can you explain what you are doing there? 2) normally you give the lower left corner x,y in the image() call. There's no telling for sure what it's using for x and y coordinates if you omit them. However, no improvement even if I add ",0,0" to the call. I also tried adding the dimensions ",2480,3506" but that didn't help. The PNM file itself appears to be OK -- I can display it as 2480x3506 in GIMP (grayscale 8 bit/pixel). Those are the dimensions stored in the PDF object describing the image, although the compressed stream (Flate compress) is 1047348 bytes rather than the original 8694880. Possibly something is going wrong with the compression itself -- I will have to try adding the image as UNcompressed data and see what happens. The page (media box) appears to be sufficiently large to hold the image. Let me know if you make any headway with this, as I will be looking at it from this end.
Subject: Re: [rt.cpan.org #132844] PDF::API2 corrupts certain PNMs when embedding them
Date: Fri, 19 Jun 2020 19:45:53 +0200
To: bug-PDF-API2 [...] rt.cpan.org
From: Jeff <jffry [...] posteo.net>
Subject: Re: [rt.cpan.org #132844] PDF::API2 corrupts certain PNMs when embedding them
To: bug-PDF-API2 [...] rt.cpan.org
From: Jeff <jffry [...] posteo.net>
On 19/06/2020 17:39, Phil M. Perry via RT wrote: Show quoted text
Show quoted text
> There are a couple odd things in your Perl script: 1) you use "shift" for the name fed to image_pnm, even though @_ is empty at this point. I can avoid that error by replacing "shift" with "$ARGV[0]". Can you explain what you are doing there? 2) normally you give the lower left corner x,y in the image() call. There's no telling for sure what it's using for x and y coordinates if you omit them. However, no improvement even if I add ",0,0" to the call. I also tried adding the dimensions ",2480,3506" but that didn't help.
"shift" used like that is the same as $ARGV[0], as it is a shortcut for shift @ARGV. Show quoted text
> The PNM file itself appears to be OK -- I can display it as 2480x3506 in GIMP (grayscale 8 bit/pixel). Those are the dimensions stored in the PDF object describing the image, although the compressed stream (Flate compress) is 1047348 bytes rather than the original 8694880. Possibly something is going wrong with the compression itself -- I will have to try adding the image as UNcompressed data and see what happens. The page (media box) appears to be sufficiently large to hold the image. > > Let me know if you make any headway with this, as I will be looking at it from this end.
I tried out various different filters with no success. The header of the pnm says 2480x3506, which implies a filesize of 8694919, which is what the file is (on my system). I would expect the flate to compress it quite well, as there quite a bit of white there. So 8x smaller is no surprise.
Download signature.asc
application/pgp-signature 833b

Message body not shown because it is not plain text.

I think I've got it fixed (in PDF::Builder, PhilterPaper/Perl-PDF-Builder). It's two lines to add in PNM.pm (see GitHub entry for RT 132844 fix), and should be easy enough to add to PDF::API2. Let me know if the fix clears things up for you. This may be a code path that was never exercised for PNM before. The error I was getting was that there was insufficient data for the image... I'm not sure what you were seeing.
Subject: Re: [rt.cpan.org #132844] PDF::API2 corrupts certain PNMs when embedding them
Date: Sat, 20 Jun 2020 12:11:15 +0200
To: bug-PDF-API2 [...] rt.cpan.org
From: Jeff <jffry [...] posteo.net>
Subject: Re: [rt.cpan.org #132844] PDF::API2 corrupts certain PNMs when embedding them
To: bug-PDF-API2 [...] rt.cpan.org
From: Jeff <jffry [...] posteo.net>
On 20/06/2020 04:38, Phil M. Perry via RT wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=132844 > > > I think I've got it fixed (in PDF::Builder, PhilterPaper/Perl-PDF-Builder). It's two lines to add in PNM.pm (see GitHub entry for RT 132844 fix), and should be easy enough to add to PDF::API2. Let me know if the fix clears things up for you. This may be a code path that was never exercised for PNM before. The error I was getting was that there was insufficient data for the image... I'm not sure what you were seeing.
Thanks for the patch, Phil. I confirm that this fixes things for me too. For reference, I've attached the patch adapted for PDF::API2.

Message body is not shown because sender requested not to inline it.

Download signature.asc
application/pgp-signature 833b

Message body not shown because it is not plain text.

Great! The patch is in PDF::Builder; I don't know when Steve will put it into PDF::API2. Note that I have not looked at any other PNM types/subtypes, so something similar could happen with other PNM files, too. BTW, the defaulting of x and y to 0,0 in image() and form_image() wasn't really part of the PNM fix, but it doesn't hurt to put it in. Being undefined in your example code, they apparently were being treated as 0,0, but I don't like relying on such behavior. If someone used, e.g., the -e flag on Perl, that might be flagged as an error.
Hi, the specification http://netpbm.sourceforge.net/doc/pgm.html mentions (see #8) "single whitespace character (usually a newline)" between Maxval and raster data; that whitespace character was already consumed by "readppmheader"; the purpose of this line: https://metacpan.org/source/SSIMMS/PDF-API2-2.037/lib/PDF/API2/Resource/XObject/Image/PNM.pm#L134 eludes me completely; was reading a PGM ever successful with PDF::API2, and was it even tested? No similar line for PPM full-color images, as I see. Patch, as suggested, fixes the issue, but, in absence of what that line was supposed to do (such as well-known and tolerated application which produced broken PGMs, perhaps? but then what about others?), I'd simply remove it. Keep this remark documented, at least; the module has its share of issues, if whoever is going to untangle them, maybe this will alleviate the task :)
Yeah, I'm not sure just what it's supposed to be doing. What I did find was that this one read sucked in 857kB of data (well into the raster data), leaving the next read way short. The patch I gave works for THIS particular PNM file (the ONLY one I have to test with), but it shouldn't be surprising if it breaks on another file. That's not even considering other types/subtypes. Maybe it was expecting an empty or dummy line to simply eat, leaving the next read at the start of the raster data? As PNM is apparently pretty rare (I had never seen one before this ticket), I'm not inclined to put a lot of effort into fixing it at this point. If you or someone else wants to go over it with a fine-toothed comb and fix up the code, I'll be happy to incorporate it into PDF::Builder. Otherwise, if it proves to be really buggy, I don't think it would be a tragedy to remove PNM support entirely.
Subject: Re: [rt.cpan.org #132844] PDF::API2 corrupts certain PNMs when embedding them
Date: Sat, 27 Jun 2020 11:37:44 +0200
To: bug-PDF-API2 [...] rt.cpan.org
From: Jeff <jffry [...] posteo.net>
Subject: Re: [rt.cpan.org #132844] PDF::API2 corrupts certain PNMs when embedding them
To: bug-PDF-API2 [...] rt.cpan.org
From: Jeff <jffry [...] posteo.net>
It's a really simple format, no compression and shouldn't be hard to fix properly: https://en.wikipedia.org/wiki/Netpbm You could cover the basics with six test images. I could create them for you, if you like. I first noticed that all was not good with PDF::API2's handling of PNM ages ago. Here is another image I found in 2009: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514380#5
Download signature.asc
application/pgp-signature 833b

Message body not shown because it is not plain text.

Well.... if it's a format that people are actually still using, and depend on, and the format is that simple, it might be worth fixing. After all, I did go and fix the BDF font support. It sounds like there are more than just six variations that would need to be handled, and tested, but if six would take care of the vast majority of users, we could at least start with that. How about all the related formats similar to .pnm? Are you looking at those too, or just PNM itself? I just don't want to go down a rabbit hole with "please add this additional format/variation", or there are all sorts of informal variations, and it never ends. Also please pay attention to all the formats/variations that PDF::API2 currently claims to support -- would any of those be dropped? If you're willing to contribute a list of vital formats/variations to support, and test cases for them, I can work with you on PDF::Builder's PNM support. I can't do anything about PDF::API2, although the code for Builder will *probably* work OK on API2 (no guarantees). If you have a GitHub account, please open a ticket on PDF::Builder (PhilterPaper/Perl-PDF-Builder), or open an account on catskilltech.com/forum. I don't want to clog up PDF::API2 discussions with PDF::Builder support.