
This queue is for tickets about the PDF-API2 CPAN distribution.

Report information
The Basics
Id: 66341
Status: resolved
Priority: 0/
Queue: PDF-API2

People
Owner: Nobody in particular
Requestors: jmcgowan [...] inch.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 2.026



Subject: BUGs in PDF-API2/Filter.pm
Date: Wed, 2 Mar 2011 20:54:35 -0500
To: bug-PDF-API2 [...] rt.cpan.org
From: John McGowan <jmcgowan [...] inch.com>
Filter.pm is pretty bad.

The Run Length Encoder does no run length encoding (wrong back reference).

The Run Length Decoder does no run length decoding (misinterprets the counter byte).

The Base85 encoder outputs the base85 digits in the wrong order. The Base85 decoder has problems cleaning up the end for padded data.

The LZW decompressor is set for NO early-change (the default in many Adobe files is WITH early-change). There is an infilt2 filter (early-change) not mentioned in the doc, but it expects a 13-bit reset when the dictionary is full instead of a 12-bit reset. It can handle a very short early-change file but quite quickly gets out of sync.

Etc.

In short ... Filter.pm is pretty bad.

I had sent a report and my working version before, but apparently it was not noticed.
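For reference, the counter-byte rule that a RunLengthDecode filter has to follow can be sketched as below. This is only an illustration under the usual reading of the PDF filter definition (0 to 127 means "copy the next count+1 bytes", 129 to 255 means "repeat the next byte 257-count times", 128 means end of data). The name run_length_decode is hypothetical and is not the PDF::API2 code.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not the PDF::API2 implementation.
sub run_length_decode {
    my ($data) = @_;
    my ($out, $pos) = ('', 0);

    while ($pos < length($data)) {
        my $count = ord(substr($data, $pos++, 1));
        last if $count == 128;                  # 128 marks end of data

        if ($count < 128) {
            # 0..127: the next count+1 bytes are copied literally
            $out .= substr($data, $pos, $count + 1);
            $pos += $count + 1;
        }
        else {
            # 129..255: the next single byte is repeated 257-count times
            $out .= substr($data, $pos++, 1) x (257 - $count);
        }
    }
    return $out;
}

# Literal "abc" (counter 2), then "x" three times (counter 254), then EOD.
print run_length_decode("\x02abc\xFEx\x80"), "\n";   # prints "abcxxx"
```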
Hi John, I recently started maintaining PDF::API2 after a couple of years of the project not having a maintainer. I'm working to create a test suite covering as much of the code as possible, and to update the code to be more consistent and follow some best practices (the current code was written by multiple people over many years).
> The Run Length Encoder does no run length encoding (wrong back
> reference).
>
> The Run Length Decoder does no run length decoding (misinterprets
> the counter byte).
>
> The Base85 encoder outputs the base85 digits in the wrong order.
> The Base85 decoder has problems cleaning up the end for padded data.
>
> The LZW decompressor is set for NO early-change (the default in many
> Adobe files is WITH early-change). There is an infilt2 filter
> (early-change) not mentioned in the doc, but it expects a 13-bit
> reset when the dictionary is full instead of a 12-bit reset. It can
> handle a very short early-change file but quite quickly gets out of
> sync.
Would you be willing to write some test cases demonstrating where the current code is broken, along with your fixed version? That would be tremendously helpful.

If you're comfortable with Mercurial, you can get the most up-to-date code here:

http://deefs.net/hg/pdfapi2
or
http://bitbucket.org/ssimms/pdfapi2

Otherwise, attachments to this ticket will also be fine.

Thanks,
Steve Simms
Subject: Re: [rt.cpan.org #66341] BUGs in PDF-API2/Filter.pm
Date: Fri, 4 Mar 2011 14:52:47 -0500
To: bug-PDF-API2 [...] rt.cpan.org
From: John McGowan <jmcgowan [...] inch.com>

Message body is not shown because it is too large.

Download Filter_pm_txt.zip
application/zip 9.6k

Message body not shown because it is not plain text.

Subject: Re: [rt.cpan.org #66341] BUGs in PDF-API2/Filter.pm
Date: Fri, 4 Mar 2011 14:59:58 -0500
To: bug-PDF-API2 [...] rt.cpan.org
From: John McGowan <jmcgowan [...] inch.com>
Re: the base85 encoder. Looking at my code for the case when the data has to be padded:

    $b = unpack("N", substr($str, $i) . "\000\000\000")

If only one or two nulls should be added to pad, this may make the string too long and throw off the value of $b (too large by a factor of 256^n).

Well ... as I said, I am not a programmer!
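A small sketch of how the final, short group can be handled. It pads the group to exactly four bytes, emits the five base85 digits most significant first, and keeps only n+1 of them for an n-byte group. The name ascii85_encode is hypothetical and this is not the code from the attachment or from PDF::API2. As far as standard Perl goes, unpack("N", ...) reads only the first four bytes of its argument, so surplus trailing nulls should not change the value; the sketch pads exactly anyway to keep the intent obvious.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not the PDF::API2 implementation.
sub ascii85_encode {
    my ($str) = @_;
    my $out = '';

    for (my $i = 0; $i < length($str); $i += 4) {
        my $chunk = substr($str, $i, 4);
        my $n     = length($chunk);                 # 1..4 bytes in this group

        # Pad a short final group with nulls up to exactly four bytes.
        my $value = unpack('N', $chunk . ("\0" x (4 - $n)));

        # A full group of four zero bytes has the one-character form "z".
        if ($n == 4 && $value == 0) {
            $out .= 'z';
            next;
        }

        # Build the five digits, most significant digit first.
        my @digits;
        for (1 .. 5) {
            unshift @digits, chr(33 + $value % 85);
            $value = int($value / 85);
        }

        # An n-byte group is written as n+1 characters.
        $out .= join('', @digits[0 .. $n]);
    }
    return $out . '~>';                             # end-of-data marker
}

print ascii85_encode('Man '), "\n";    # prints "9jqo^~>"
print ascii85_encode('sure.'), "\n";   # prints "F*2M7/c~>"
```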
Update on this issue:

I rewrote the RunLengthDecode filter last night (complete with tests, see changeset 4853928), and it should be working properly now. It will be included in release 2.021.

ASCII85Decode and LZWDecode haven't been touched yet.
Subject: Re: [rt.cpan.org #66341] BUGs in PDF-API2/Filter.pm
Date: Mon, 21 Jan 2013 21:06:46 -0500 (EST)
To: Steve Simms via RT <bug-PDF-API2 [...] rt.cpan.org>
From: John McGowan <jmcgowan [...] inch.com>
On Mon, 21 Jan 2013, Steve Simms via RT wrote:

> <URL: https://rt.cpan.org/Ticket/Display.html?id=66341 >
>
> Update on this issue:
>
> I rewrote the RunLengthDecode filter last night (complete with tests, see
> changeset 4853928), and it should be working properly now. It will be
> included in release 2.021.
>
> ASCII85Decode and LZWDecode haven't been touched yet.

The base85 encoder puts the data in the wrong order! Apparently no one tried to use the base85 encoder and then the decoder to see if they worked.

The RunLengthEncode doesn't find any runs of data to compress.

The one that caught me (in examining a malicious PDF file that used LZW compression) was the LZW decompressor with early change. That's what got me to look at the file. (At one time it seems that malware authors would obfuscate malicious JavaScript in PDFs with chains of old filters: first dehex, then remove LZW compression (with "early change"), then base85-decode that, and finally use the run-length decoder to see the malicious code. I haven't seen that done in a while, but I don't see all the malicious PDFs out there.)

Have fun with it!

Regards from:
John McGowan | jmcgowan@inch.com [Internet Channel]
--------------+-----------------------------------------------------
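One way a run-finding encoder might look, using a regex back reference to detect repeated bytes. This is a sketch under the usual RunLengthDecode conventions (runs of 2 to 128 identical bytes become a counter of 257-length followed by the byte; other data is sent as literal blocks of up to 128 bytes with a counter of length-1; 128 ends the data). The name run_length_encode is hypothetical and this is not the PDF::API2 code.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not the PDF::API2 implementation.
sub run_length_encode {
    my ($data) = @_;
    my $out = '';

    while (length($data)) {
        if ($data =~ s/\A((.)\2{1,127})//s) {
            # Run of 2..128 identical bytes: counter 257-length, then the byte.
            $out .= chr(257 - length($1)) . $2;
        }
        else {
            # Literal block: up to 128 bytes, stopping before the next run
            # (each byte taken must not be followed by an identical byte).
            $data =~ s/\A((?:(.)(?!\2)){1,128})//s;
            $out .= chr(length($1) - 1) . $1;
        }
    }
    return $out . chr(128);    # end-of-data marker
}

# "abc" is sent literally, "xxx" as a run of three.
my $encoded = run_length_encode('abcxxx');
print unpack('H*', $encoded), "\n";    # prints "02616263fe7880"
```

Feeding the result back through a decoder and comparing with the input is the round-trip check described above.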
The ASCII85Decode filter should now encode and decode properly. If you find any exceptions, please let me know.

The LZWDecode now checks for the EarlyChange parameter (default on, per the PDF spec) and has a bunch of fixes that should result in it working properly, as long as a predictor algorithm isn't being used.

Both of these fixes can be found at GitHub (https://github.com/ssimms/pdfapi2) now, and will be in the upcoming 2.026 release.
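To illustrate where EarlyChange enters an LZW decoder, here is a minimal sketch. It is not the code in the 2.026 release. The name lzw_decode and its arguments are hypothetical, predictors are not handled, and the code-width rule (widen one code early when EarlyChange is 1) follows the common reading of the PDF spec. The sample bytes are the short worked example from the PDF reference, which stays at 9-bit codes and so does not exercise the width change itself.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not the PDF::API2 implementation.
sub lzw_decode {
    my ($data, $early_change) = @_;
    $early_change = 1 unless defined $early_change;   # PDF default is 1

    my $bits  = unpack('B*', $data);                  # bit string, MSB first
    my @table = map { chr } 0 .. 255;                 # codes 0..255 are literals
    my ($width, $pos, $next_code, $out) = (9, 0, 258, '');
    my $prev;                                         # previously decoded string

    while ($pos + $width <= length($bits)) {
        my $code = oct('0b' . substr($bits, $pos, $width));
        $pos += $width;

        if ($code == 256) {                           # clear-table code
            $#table = 255;
            ($width, $next_code, $prev) = (9, 258, undef);
            next;
        }
        last if $code == 257;                         # end-of-data code

        my $entry;
        if ($code < 256 || ($code >= 258 && $code < $next_code)) {
            $entry = $table[$code];
        }
        elsif ($code == $next_code && defined $prev) {
            $entry = $prev . substr($prev, 0, 1);     # code not yet in table
        }
        else {
            die "Invalid LZW code $code\n";
        }

        if (defined $prev && $next_code < 4096) {
            $table[$next_code++] = $prev . substr($entry, 0, 1);
            # With EarlyChange the width grows one code sooner (511, 1023, 2047).
            $width++ if $width < 12
                     && $next_code + $early_change >= (1 << $width);
        }

        $out .= $entry;
        $prev = $entry;
    }
    return $out;
}

print lzw_decode(pack('H*', '800B6050220C0C8501')), "\n";   # prints "-----A---B"
```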