Thank you for your attention to the problem. I hope I'm finally done worrying about 32-bit OSes, so slurping is now less of an issue for me, and maybe for others, too. Just a note for your list of priorities :).
Please note, however, that part of what I suggested above (passing a reference instead of a scalar) was a safe and cheap way (no new tests required) to cut the memory footprint to 50% while reading _any_ file.
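To illustrate the reference-passing idea, here is a minimal sketch. The sub names (parse_by_value, parse_by_ref, slurp_ref) are hypothetical stand-ins and not the actual PDF::API2 code; the point is only that unpacking a scalar argument with `my (...) = @_` duplicates the buffer, while unpacking a reference does not.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for open()/open_scalar(); NOT the real PDF::API2 subs.

# By-value variant: 'my ($content) = @_' copies the whole buffer,
# so caller and callee each hold the full file contents.
sub parse_by_value {
    my ($content) = @_;
    return length $content;
}

# By-reference variant: only the reference is copied,
# so the buffer exists once in memory.
sub parse_by_ref {
    my ($content_ref) = @_;
    return length $$content_ref;
}

# Slurp a file and hand back a reference to the buffer.
sub slurp_ref {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                 # undefined record separator: read everything at once
    my $content = <$fh>;
    close $fh;
    return \$content;
}

# Small demo on an in-memory buffer (no file needed).
my $buffer = 'x' x 1_000_000;
print parse_by_ref(\$buffer), "\n";
```

With something like this, a caller would do `parse_by_ref(slurp_ref($file))` and the file contents would never be duplicated on the way in.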
<aside>
The memory and time required to read/slurp a file (be it 50% or the original 100%) may be tiny compared to the effort of parsing it and building Perl structures for complex files. E.g., on a brand-new i5-7500 machine:
perl -MPDF::API2 -E "PDF::API2-> open('PDF Reference 1.7.pdf'); say time - $^T"
366
Then why care about slurping? OK, maybe a new ticket should be opened to address that performance.
</aside>
What you are suggesting, on the other hand, may require a bit more planning and effort. E.g., I now think the benefits of not slurping will be void unless a proper incremental update is implemented. The current "update" should be rewritten to append to the original file. It should also work with multiple updates while the object persists, and the issue with updating a file that uses an XRefStream should be addressed, too.
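As a rough sketch of the "append to the original file" part only: the helper below (append_update is a hypothetical name, not an existing PDF::API2 method) appends already-serialized bytes and leaves the original contents untouched. Building the new objects, the xref section and the trailer with its /Prev entry is assumed to happen elsewhere.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of PDF::API2: append pre-serialized update
# bytes to an existing file without rewriting what is already there.
# Returns the byte offset at which the appended section starts.
sub append_update {
    my ($path, $update_bytes) = @_;

    # '>>' opens for append, so every write lands at the end of the file.
    open my $fh, '>>:raw', $path
        or die "Cannot open $path for append: $!";

    # File size before writing = offset of the new section.
    my $offset = (-s $path) || 0;

    print {$fh} $update_bytes or die "Write to $path failed: $!";
    close $fh or die "Close of $path failed: $!";

    return $offset;
}
```

Calling it repeatedly on the same file adds one section per call, which is the behaviour multiple updates on a persistent object would need.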
On Fri Aug 18 13:17:16 2017, SSIMMS wrote:
> On Sat Apr 02 08:00:06 2016, vadimr wrote:
> > The PDF::API2->open() reads the entire file, and then passes this
> > scalar to 'open_scalar' i.e. a copy is made.
>
> I've just spent some time looking into this. The core PDF code in
> PDF::API2::Basic should work fine if it's passed a filehandle -- I
> don't think there's any code in there that needs a copy or an in-
> memory version of the PDF.
>
> It should be possible to modify PDF::API2->open() to pass the
> filehandle to PDF::API2::Basic::PDF::File rather than slurping it into
> memory and calling open_scalar. The only other subs that may need to
> be changed are finishobjects, save, saveas, and stringify, which are
> all in PDF/API2.pm as well. I don't think any other file would need
> to be touched, so it should be a fairly straightforward patch, and
> I've just renamed a couple of variables and cleaned up a bit of code
> to make it easier to implement.
>
> My inclination is that the default/expected behavior should be to read
> the file as needed, with an option to slurp it into memory for
> performance reasons, rather than having the current behavior be the
> default.