Subject: | save() very slow for large files with many changes |
CAM::PDF version 1.54
Perl 5.10.1
Ubuntu 10.04 LTS
I have a script which makes multiple additions to roughly every third
page of a document with nearly 700 pages, totalling 17MB on disk. Sorry,
I can't provide a test file as it contains confidential customer information.
Making the changes was quick (as I've come to expect from CAM::PDF) but
saving the document using preserveOrder() and cleanoutput() puzzlingly
took over two minutes.
Analysis with NYTProf revealed that most of the time was spent on line
5003 in PDF.pm.
$newxref{$key} = length $self->{content};
In my case, that was 139 seconds.
This line is executed each time a changed object is written out. I don't
know Perl's internals, but it seems to me that something odd is lurking
behind length(): either it incurs a largish fixed cost per call or, even
more oddly, it takes longer to find the length of longer strings.
I patched the loop to keep a separate offset counter and to call
length() only on each newly written object. That works fine here:
length() now consumes only 3.51ms and the complete save() takes 1.83s,
which is good enough, so I didn't attempt any further optimisation.
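For illustration, the running-offset idea can be sketched outside CAM::PDF as follows. This is a minimal standalone sketch, not the module's code: the helper name append_with_offsets and the sample strings are hypothetical, and it assumes the buffer holds plain bytes, so that length() counts bytes.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of CAM::PDF: appends each chunk to the
# buffer and returns the offset at which each chunk starts.
# Assumes the buffer and the chunks are byte strings.
sub append_with_offsets {
    my ($buffer_ref, @chunks) = @_;

    # Measure the large buffer once, before the loop.
    my $offset = length ${$buffer_ref};
    my @starts;

    for my $chunk (@chunks) {
        push @starts, $offset;       # this chunk begins at the current offset
        ${$buffer_ref} .= $chunk;
        $offset += length $chunk;    # only the small chunk is measured
    }
    return @starts;
}

my $content = "%PDF-1.4\n";
my @starts  = append_with_offsets(\$content,
    "1 0 obj\nendobj\n", "2 0 obj\nendobj\n");
print join(',', @starts), "\n";      # prints 9,24
```

The counter stays correct only as long as every append to the buffer goes through the same loop; anything else that modifies the buffer would need to update the offset as well.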
I've attached my patch; hope this helps.
BTW: Thanks for CAM::PDF. I've tried Text::PDF, PDF::API2, PDF::Reuse
and even a demo version of PDFlib, but I've settled on this module
because it represents a good combination of speed, ability to get the
job done, and openness. It hasn't let me down yet and it looks like it's
being actively supported, too. Well done.
-- David
Subject: | CAM-PDF_save_patch.diff |
4999a5000
> my $offset = length $self->{content};
5003c5004
< $newxref{$key} = length $self->{content};
---
> $newxref{$key} = $offset;
5006c5007,5009
< $self->{content} .= $self->writeObject($key);
---
> my $obj = $self->writeObject($key);
> $self->{content} .= $obj;
> $offset += length $obj;