
This queue is for tickets about the CAM-PDF CPAN distribution.

Report information
The Basics
Id: 14627
Status: resolved
Worked: 2 hours (120 min)
Priority: 0/
Queue: CAM-PDF

People
Owner: cpan [...] clotho.com
Requestors: rydell [...] mymail.ch
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.99
Fixed in: 1.01



Subject: infinite memory consumption?
Hi,

Maybe it is my fault, since I'm pretty inexperienced in Perl, but I can't find any "close"-like method for CAM-PDF. Using the attached script, memory consumption grows without bound after reading (or just opening without reading) several hundred files. It seems to me that memory is never deallocated once a file has been opened with _new CAM::PDF("$file")_?

Using:
- Perl 5.8.6 on Win32 (ActiveState or IndigoStar distribution)
- CAM-PDF 0.99
- Win2k

Thanks in advance.
Berry
#!/bin/perl -w
use strict;
use Cwd;
use CAM::PDF;

my $filetype = "*.pdf";
my $dir = "d:/perlscripts/md5/test";
chdir("$dir");
my @files = getfiles();

# Print the first ISBN found in each PDF, or ":(" if none is found.
foreach my $file (sort @files) {
    my $isbn_string = "";
    my $isbn = &getISBNfromPDF("$dir\/$file");
    if ($isbn) {
        print "$file: $isbn\n";
    } else {
        print "$file: :(\n";
    }
}

# Return all files in the current directory that match $filetype.
sub getfiles {
    return glob("$filetype");
} # endsub

# Open one PDF and scan its pages until an ISBN-like string is found.
# $pdf is a lexical, so it goes out of scope when this sub returns.
sub getISBNfromPDF {
    my $file = $_[0];
    my $pdf;
    my $match;
    if ($pdf = new CAM::PDF("$file")) {
        my $nbrPages = $pdf->numPages();
        for (my $nbr = 1; $nbr <= $nbrPages; $nbr++) {
            my $page = $pdf->getPageText($nbr);
            if (($page) && ($page =~ /(I\s?S\s?B\s?N\:?\s?[0-9X\.\-\s]+)/i)) {
                $match = $1;
                last;    # stop at the first page containing an ISBN
            }
        }
    }
    return $match;
} # endsub
[guest - Sat Sep 17 08:03:06 2005]:
> Hi,
>
> Maybe it is my fault, since I'm pretty inexperienced in Perl, but I
> can't find any "close"-like method for CAM-PDF.
> Using the attached script, memory consumption grows without bound after
> reading (or just opening without reading) several hundred
> files.
> It seems to me that memory is never deallocated once a
> file has been opened with _new CAM::PDF("$file")_?
Hi Berry,

You're right, there is no close method. That's intentional: the data structure is meant to clean itself up when it goes out of scope (for example, when the subroutine ends).

The attached script looks fine to me; I can't see any flaws. CAM::PDF is optimized for speed and flexibility at the expense of memory usage, so I'm not too surprised that it consumes a lot of RAM for big PDFs (e.g. 84 MB of RAM for the 14 MB Adobe PDF Reference document on my machine). However, with the script you attached, memory usage should be independent of the number of PDF files, because "my $pdf" goes out of scope on every loop iteration.

Is there one PDF in particular that is causing the script to go haywire? If so, would it be feasible to send it to me?

-- Chris
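Below is a minimal sketch of the scope-based cleanup described above. It uses a hypothetical Demo::Doc class as a stand-in for CAM::PDF; this is not CAM::PDF's real code. Perl calls DESTROY as soon as the last reference to an object disappears. An object held only in a `my` variable inside a subroutine is therefore freed when that subroutine returns, and the number of live objects stays flat however many files are processed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed document; it is NOT CAM::PDF.
package Demo::Doc;

our $live = 0;    # number of Demo::Doc objects currently alive

sub new {
    my ($class, $name) = @_;
    $live++;
    return bless { name => $name, data => 'x' x 1024 }, $class;
}

# Perl calls DESTROY when the last reference to the object goes away.
sub DESTROY { $live-- }

package main;

# Mirrors the shape of getISBNfromPDF: the object lives in a lexical.
sub process {
    my ($name) = @_;
    my $doc = Demo::Doc->new($name);   # freed when process() returns
    return length $doc->{data};
}

process("file$_.pdf") for 1 .. 500;
print "objects still alive: $Demo::Doc::live\n";   # prints "objects still alive: 0"
```

This holds only while nothing else keeps a reference to the object. A reference cycle inside the object's own data defeats it.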
Berry,

I tried your test and reproduced the problem you described on Perl 5.6.0 and 5.8.6. I believe the library is creating some internal references that are not being garbage collected. I'll try to ferret those out.

Thanks for the assistance!
Chris
Berry,

I just uploaded CAM::PDF v1.01, which solves your problem. There were three circular references in the data structure that prevented $pdf instances from being garbage collected, driving up memory consumption. Those cycles are now broken, and your simpletest1.pl takes only 8 MB of RAM to run instead of 110 MB.

Thanks very much for identifying the problem!
Chris
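The ticket does not say how the three cycles were broken in 1.01. The sketch below illustrates only the general mechanism. Perl frees data by reference counting, so two objects that refer to each other never reach a count of zero, even after the variables holding them go out of scope. The Demo::Node objects and the make_pair helper are hypothetical and are not CAM::PDF internals. The cycle is broken here with weaken from the core Scalar::Util module, which is one common technique. Deleting the back-reference explicitly before discarding the object is another.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical node class for illustration; not CAM::PDF's internals.
package Demo::Node;

our $live = 0;    # number of Demo::Node objects currently alive

sub new {
    my ($class) = @_;
    $live++;
    return bless {}, $class;
}

sub DESTROY { $live-- }

package main;

# Build a "document" and a "page" that point at each other.
# If $weak is true, the page's back-reference is weakened so it no
# longer counts toward the document's reference count.
sub make_pair {
    my ($weak) = @_;
    my $doc  = Demo::Node->new;
    my $page = Demo::Node->new;
    $doc->{page} = $page;
    $page->{doc} = $doc;             # back-reference creates a cycle
    weaken($page->{doc}) if $weak;   # break the cycle for refcounting
    return;                          # both lexicals go out of scope here
}

my $before = $Demo::Node::live;
make_pair(0) for 1 .. 100;
my $leaked = $Demo::Node::live - $before;     # cycles keep every pair alive

$before = $Demo::Node::live;
make_pair(1) for 1 .. 100;
my $freed_ok = $Demo::Node::live - $before;   # weakened pairs are freed

print "strong back-references: $leaked objects leaked\n";     # 200
print "weakened back-references: $freed_ok objects leaked\n"; # 0
```

Weakening the child-to-parent link, not the parent-to-child link, keeps the page alive for as long as the document that owns it.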