
This queue is for tickets about the CAM-PDF CPAN distribution.

Report information
The Basics
Id: 49766
Status: open
Priority: 0/
Queue: CAM-PDF

People
Owner: Nobody in particular
Requestors: ashirokov [...] ingdirect.ca
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: "Use of uninitialized value" line 2347 when processing some PDFs
Date: Wed, 16 Sep 2009 13:13:16 -0400
To: <bug-CAM-PDF [...] rt.cpan.org>
From: <ashirokov [...] ingdirect.ca>
Hello,

I'm getting the following warning when processing certain PDFs with CAM::PDF v1.52 on Win32, ActiveState Perl v5.8.7.815:

    Use of uninitialized value in string eq at c:/Perl/site/lib/CAM/PDF.pm line 2347.

The warning happens when calling the getPageText() function.

The problem is that I can't provide a sample PDF file (they are internal and confidential). When I look at the file Properties in Adobe Acrobat, it does not show anything that would help identify the software used to produce the files (it only shows that the PDF version is 1.4). The vendor that produces the PDFs can't tell us what library their system uses.

When I open the file in Acrobat and immediately save it, the resulting file becomes ~2 times smaller and is processed by CAM::PDF just fine.

What else can I say about the PDF:
- All security actions are allowed
- There are only 3 fonts used
- There are no images in it, only text and lines (there are tables)

Is there a tool/debug I can run on the file to gather more info to find the bug?

Thanks,
Arsen

-----------------------------------------------------------------
ATTENTION:
The information in this electronic mail message is private and confidential, and only intended for the addressee. Should you receive this message by mistake, you are hereby notified that any disclosure, reproduction, distribution or use of this message is strictly prohibited. Please inform the sender by reply transmission and delete the message without copying or opening it.

Messages and attachments are scanned for all viruses known. If this message contains password-protected attachments, the files have NOT been scanned for viruses by the ING mail domain. Always scan attachments before opening them.
-----------------------------------------------------------------

Subject: Re: [rt.cpan.org #49766] "Use of uninitialized value" line 2347 when processing some PDFs
Date: Wed, 16 Sep 2009 21:38:36 -0500
To: bug-CAM-PDF [...] rt.cpan.org
From: Chris Dolan <chris [...] chrisdolan.net>
All I can tell from that information is that you have a page in your PDF that lacks a Type field, which is a required field according to the PDF specification. Try this command:

    getpdfpageobject.pl -v file.pdf 1

replacing "1" with the page number where you are encountering problems. That command will spit out metadata about the page (size, fonts used, etc.) without printing any of the PDF content, so it should be safe to send without revealing anything confidential. Send that output along and I'll see if it enlightens.

Chris

Subject: RE: [rt.cpan.org #49766] "Use of uninitialized value" line 2347 when processing some PDFs
Date: Fri, 18 Sep 2009 09:43:12 -0400
To: <bug-CAM-PDF [...] rt.cpan.org>
From: <ashirokov [...] ingdirect.ca>
Hi Chris,

Ran the script, below is the output. I've used the simplest .pdf I have to minimize the amount of debug. Let me know if I can run any more diagnostics on the files.

A couple of notes:
- The warning appears when running this script too
- The warning appears on every single page in all the .pdfs I have

If the files are indeed not up to PDF specs, is there a way to ignore the incompatibility? I understand this should not prevent the content extraction as apparently Acrobat opens the files just fine.

Thanks,
Arsen

Use of uninitialized value in string eq at c:/Perl/site/lib/CAM/PDF.pm line 2347.
$page = {
  'Kids' => bless( {
    'gennum' => '0',
    'value' => [
      bless( { 'gennum' => '0', 'value' => '8', 'type' => 'reference', 'objnum' => '5' }, 'CAM::PDF::Node' ),
      bless( { 'gennum' => '0', 'value' => '11', 'type' => 'reference', 'objnum' => '5' }, 'CAM::PDF::Node' ),
      bless( { 'gennum' => '0', 'value' => '14', 'type' => 'reference', 'objnum' => '5' }, 'CAM::PDF::Node' ),
      bless( { 'gennum' => '0', 'value' => '17', 'type' => 'reference', 'objnum' => '5' }, 'CAM::PDF::Node' ),
      bless( { 'gennum' => '0', 'value' => '20', 'type' => 'reference', 'objnum' => '5' }, 'CAM::PDF::Node' ),
      bless( { 'gennum' => '0', 'value' => '23', 'type' => 'reference', 'objnum' => '5' }, 'CAM::PDF::Node' ),
      bless( { 'gennum' => '0', 'value' => '26', 'type' => 'reference', 'objnum' => '5' }, 'CAM::PDF::Node' ),
      bless( { 'gennum' => '0', 'value' => '29', 'type' => 'reference', 'objnum' => '5' }, 'CAM::PDF::Node' ),
      bless( { 'gennum' => '0', 'value' => '32', 'type' => 'reference', 'objnum' => '5' }, 'CAM::PDF::Node' ),
      bless( { 'gennum' => '0', 'value' => '35', 'type' => 'reference', 'objnum' => '5' }, 'CAM::PDF::Node' ),
      bless( { 'gennum' => '0', 'value' => '38', 'type' => 'reference', 'objnum' => '5' }, 'CAM::PDF::Node' ),
      bless( { 'gennum' => '0', 'value' => '41', 'type' => 'reference', 'objnum' => '5' }, 'CAM::PDF::Node' )
    ],
    'type' => 'array',
    'objnum' => '5'
  }, 'CAM::PDF::Node' ),
  'Count' => bless( { 'gennum' => '0', 'value' => '12', 'type' => 'number', 'objnum' => '5' }, 'CAM::PDF::Node' ),
  'MediaBox' => bless( {
    'gennum' => '0',
    'value' => [
      bless( { 'gennum' => '0', 'value' => '0', 'type' => 'number', 'objnum' => '5' }, 'CAM::PDF::Node' ),
      bless( { 'gennum' => '0', 'value' => '0', 'type' => 'number', 'objnum' => '5' }, 'CAM::PDF::Node' ),
      bless( { 'gennum' => '0', 'value' => '612', 'type' => 'number', 'objnum' => '5' }, 'CAM::PDF::Node' ),
      bless( { 'gennum' => '0', 'value' => '792', 'type' => 'number', 'objnum' => '5' }, 'CAM::PDF::Node' )
    ],
    'type' => 'array',
    'objnum' => '5'
  }, 'CAM::PDF::Node' ),
  'Types' => bless( { 'gennum' => '0', 'value' => 'Pages', 'type' => 'label', 'objnum' => '5' }, 'CAM::PDF::Node' )
};

Subject: Re: [rt.cpan.org #49766] "Use of uninitialized value" line 2347 when processing some PDFs
Date: Fri, 18 Sep 2009 18:49:10 -0500
To: bug-CAM-PDF [...] rt.cpan.org
From: Chris Dolan <chris [...] chrisdolan.net>
Arsen,

I'm just one guy working on this project in my spare time. I can't support PDFs that do not meet the PDF specification. Adobe has a ton of programmers and QA staff to test compatibility.

Your PDF has a "Types" value instead of the "Type" value required by the spec. If you run the following regular expression over your PDFs as binary content, it might fix them:

    s{/Types}{/Type }gs

But it might break them in unexpected ways too...

So, I apologize, but I'm going to reject this bug report.

Chris

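The substitution suggested above replaces the six bytes "/Types" with the six bytes "/Type " (note the trailing space), so the patched file keeps its length and the byte offsets stored in the PDF cross-reference table stay valid. Here is a minimal sketch of applying it to a copy of a file in binary mode. The helper name `fix_types_key` and the lookahead, which skips longer names such as a hypothetical "/TypesFoo", are my own additions and not part of the thread; it still cannot tell a dictionary key from the same bytes appearing elsewhere in the file, so the "might break them" caveat stands.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bytes that can continue a PDF name token, i.e. anything that is not
# whitespace or a delimiter. (Assumption: this guard is an addition to
# the plain s{/Types}{/Type }gs from the thread.)
my $NAME_CHAR = qr{[^\s\0()<>\[\]\{\}/%]};

# Return a patched copy of $bytes and the number of substitutions made.
# "/Types" and "/Type " are both six bytes, so the length never changes.
sub fix_types_key {
    my ($bytes) = @_;
    my $count = ( $bytes =~ s{/Types(?!$NAME_CHAR)}{/Type }g ) || 0;
    return ( $bytes, $count );
}

# Rewrite a file only when both paths are given on the command line,
# so the helper can also be loaded on its own.
if ( @ARGV == 2 ) {
    my ( $in, $out ) = @ARGV;

    open my $ifh, '<:raw', $in or die "Cannot read $in: $!\n";
    my $data = do { local $/; <$ifh> };    # slurp the whole file as bytes
    close $ifh;

    my ( $fixed, $count ) = fix_types_key($data);
    length($fixed) == length($data)
        or die "Refusing to write: byte length changed\n";

    open my $ofh, '>:raw', $out or die "Cannot write $out: $!\n";
    print {$ofh} $fixed;
    close $ofh or die "Cannot close $out: $!\n";

    print "$count substitution(s) written to $out\n";
}
```

Saved under a hypothetical name such as fix_types.pl, it would be run as `perl fix_types.pl original.pdf patched.pdf`; writing to a separate output file keeps the vendor's original intact in case the patch has side effects.
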
Subject: RE: [rt.cpan.org #49766] "Use of uninitialized value" line 2347 when processing some PDFs
Date: Mon, 21 Sep 2009 10:33:43 -0400
To: <bug-CAM-PDF [...] rt.cpan.org>
From: <ashirokov [...] ingdirect.ca>
Thanks Chris, I understand.

Substituting "Types" with "Type " in the original binary file did work. I wasn't optimistic at first, as both the original and the file saved by Acrobat had the "/Types/Pages" string, so I thought it wouldn't help. The only difference was that the original had a space between "/Types" and "/Pages" ("/Types /Pages").

One last question: is there a single line in the CAM::PDF library somewhere where I can change "Type" to "Types"? I may end up creating a hacked version of it just for these files. Might be a better band-aid than doing a global substitute in the binary.

Thanks again,
Arsen

Subject: Re: [rt.cpan.org #49766] "Use of uninitialized value" line 2347 when processing some PDFs
Date: Mon, 21 Sep 2009 18:34:03 -0500
To: bug-CAM-PDF [...] rt.cpan.org
From: Chris Dolan <chris [...] chrisdolan.net>
Sure, just search for "Type" in CAM/PDF.pm. I can't make any promises that your proposed change will work though...

Chris

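A possible middle ground between patching the files on disk and maintaining a hacked PDF.pm is to apply the substitution in memory and hand the patched bytes to the library. The sketch below uses the regular expression from the thread verbatim. It assumes that CAM::PDF->new() accepts the document content as a string as well as a file name, which is how I read the module documentation; check that against your installed version. The helper names `load_patched_bytes` and `page_text_from_patched` are hypothetical, and nothing here has been run against the vendor's files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a PDF as raw bytes and patch the misspelled key in memory.
# The file on disk is never modified.
sub load_patched_bytes {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!\n";
    my $bytes = do { local $/; <$fh> };    # slurp the whole file as bytes
    close $fh;
    $bytes =~ s{/Types}{/Type }gs;         # same-length fix from the thread
    return $bytes;
}

# Extract the text of one page from the patched content.
# Assumption: CAM::PDF->new() takes the PDF content as a string.
sub page_text_from_patched {
    my ( $path, $pagenum ) = @_;
    require CAM::PDF;                      # loaded only when actually needed
    my $pdf = CAM::PDF->new( load_patched_bytes($path) )
        or die "CAM::PDF could not parse $path\n";
    return $pdf->getPageText($pagenum);
}

# Example invocation: perl script.pl file.pdf 1
if ( @ARGV == 2 ) {
    my $text = page_text_from_patched(@ARGV);
    print defined $text ? $text : '', "\n";
}
```

This keeps the workaround in one place in your own script, so a later CAM::PDF upgrade does not overwrite it.
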