
This queue is for tickets about the CAM-PDF CPAN distribution.

Report information
The Basics
Id: 101648
Status: rejected
Priority: 0/
Queue: CAM-PDF

People
Owner: Nobody in particular
Requestors: John.Pirog [...] motorolasolutions.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Occasional spaces inserted in PDF
Date: Wed, 21 Jan 2015 04:18:43 +0000
To: "bug-CAM-PDF [...] rt.cpan.org" <bug-CAM-PDF [...] rt.cpan.org>
From: John Pirog <John.Pirog [...] motorolasolutions.com>
I am using CAM::PDF to read a PDF file. I'm finding that occasionally (not randomly, since it happens in the same places every time) a space is inserted into the retrieved line. I condensed the code down to a simple program that just reads each PDF page and writes it to a file:

    my $pdf = CAM::PDF->new($in_file);
    my $page1 = $pdf->getPageText(2);   # read the PDF page into a variable
    print PAGEFILE $page1;

The PDF shows:

    Compatibility GROUP 7994, VERSION 12 UPDATED MAY 9, 2014 12:36:17

When read through CAM::PDF, it has:

    Compatibility GROUP 7994, VERSION 12 UP DATED MAY 9, 2014 12:36:17

A space is inserted in the word UPDATED, and the same thing happens in several other places. I switched to extracting line by line from the single page variable, but it does the same thing.

Perl 5, version 20, subversion 1, built for MSWin32-x86-multi-thread-64int
CAM::PDF 1.60

Any help is appreciated.

Thanks.
John Pirog
The text extraction is just a heuristic. The extra space probably comes from some kerning in the source PDF, which my heuristics don't detect.
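A minimal sketch of how a gap-based heuristic can mistake kerning for a word break. This is not CAM::PDF's actual code: the `join_tj` helper, the simplified element format, and the threshold values are all hypothetical. In a PDF content stream, a TJ array mixes glyph strings with numeric position adjustments (in thousandths of a text-space unit), and a negative number moves the next glyph to the right, widening the gap an extractor sees.

```perl
use strict;
use warnings;

# Hypothetical threshold, in thousandths of a text-space unit. Any TJ
# adjustment more negative than this is treated as a word break.
my $SPACE_THRESHOLD = -200;

# Join a simplified TJ array into plain text (hypothetical helper, not part
# of CAM::PDF). Each element is either { text => '...' } for a glyph run or
# { kern => N } for a position adjustment; negative N widens the gap.
sub join_tj {
    my ($threshold, @elements) = @_;
    my $out = '';
    for my $el (@elements) {
        if (exists $el->{text}) {
            $out .= $el->{text};              # glyph run: copy it through
        }
        elsif ($el->{kern} < $threshold) {
            $out .= ' ';                      # wide gap: guess a word break
        }
        # Smaller adjustments are assumed to be kerning and are dropped.
    }
    return $out;
}

# "UPDATED" drawn as two runs, roughly as the content stream operator
# [(UP) -250 (DATED)] TJ would draw it (hypothetical values).
my @updated = ({ text => 'UP' }, { kern => -250 }, { text => 'DATED' });

print join_tj($SPACE_THRESHOLD, @updated), "\n";   # prints "UP DATED"
print join_tj(-300, @updated), "\n";               # prints "UPDATED"
```

With the hypothetical -200 threshold, the adjustment between the two runs is read as a word break; with -300 it is treated as kerning. Any fixed threshold will misclassify some PDF producer's spacing, which is why this kind of extraction stays a heuristic.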