
This queue is for tickets about the PDF-API2 CPAN distribution.

Report information
The Basics
Id: 114976
Status: rejected
Priority: 0/
Queue: PDF-API2

People
Owner: Nobody in particular
Requestors: dosio [...] land.it
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Huge memory consumption in page splitting
Date: Thu, 2 Jun 2016 01:14:16 +0200
To: bug-PDF-API2 [...] rt.cpan.org
From: Claudio Dosio <dosio [...] land.it>
In some cases, while splitting a PDF into single pages, some of the image objects in it "explode" in size. An embedded image of about 1 MB in the PDF can drive the whole system to use more than 8 GB of RAM.

If needed, I can provide some PDFs that show this behavior.

Best regards,
Claudio

--
Claudio Dosio
Head of Software R&D
LAND S.r.l.
Via di Affogalasino, 40
00148 ROMA RM
Phone +39 06 657481.1
Fax +39 06 657481.264
Web http://www.land.it http://www.securepaper.it

The content of this e-mail is intended solely for the persons to whom it is addressed and may contain information whose confidentiality is protected. As prescribed by the Garante per la Privacy in Resolution no. 13 of 1 March 2007, please note that this message is not of a personal nature and that replies may be seen by others within the sender's organization. Reproduction and use of this e-mail without the recipient's authorization are prohibited. If you have received this e-mail in error, please contact us immediately by telephone at +39 06 6574811, by fax at +39 06 657481264, or by e-mail at info@land.it
Yes, I'll need some sample PDFs and example code to troubleshoot this.
Subject: Re: [rt.cpan.org #114976] Huge memory consumption in page splitting
Date: Sat, 22 Oct 2016 13:46:25 +0200
To: bug-PDF-API2 [...] rt.cpan.org
From: Claudio Dosio <dosio [...] land.it>
Hello,

attached is one of the PDFs that cause the problem. I cannot send the others because they contain private customer data. The problem mainly occurs when at least one of the images in the PDF comes from a scanner or MFC device. The PDFs in that case probably have something unusual in their structure, but they open without problems in Acrobat Reader and similar tools, which makes it hard to explain to anyone that their PDF has a problem.

One of the workarounds I found is to convert the incoming PDF with pdfopt, pdftk or Ghostscript, but that takes time and does not always guarantee a good result.

If the bug cannot be fixed easily, what I need is an error from the PDF::API2 library that I can use to return an error condition upstream. What seems to happen now is that the Perl process keeps running even though the page has not been fully split, while any exec/system call that tries to run an external command fails with an out-of-memory condition reported by that external command.

Please let me know if I can help you solve the problem in any way.

Best regards,
Claudio
Attachment: prova20150224.pdf (application/pdf, 1.9 MB)
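One way to get the upstream error condition requested above, without any change to PDF::API2, is to run the split in a separate process whose memory is capped, so that an out-of-memory condition kills only that process and surfaces as a non-zero exit code in the caller. This is a minimal sketch, assuming a Unix-like system whose /bin/sh supports `ulimit -v` (virtual memory, in kilobytes); the helper name `run_with_memory_limit` and the script `split_pages.pl` are hypothetical, and this is a workaround rather than a PDF::API2 feature.

```perl
use strict;
use warnings;

# Hypothetical helper: run @cmd in a child /bin/sh whose virtual memory is
# capped with `ulimit -v` (kilobytes). Returns the child's exit code
# (0 = success), 128 + signal number if it was killed by a signal,
# or -1 if /bin/sh could not be started.
sub run_with_memory_limit {
    my ( $limit_kb, @cmd ) = @_;
    die "memory limit must be a positive integer (KB)\n"
        unless defined $limit_kb && $limit_kb =~ /\A[0-9]+\z/ && $limit_kb > 0;

    # Inside the single-quoted sh script, $1 is the limit and, after `shift`,
    # "$@" is the command; 'limited' only fills $0. Exit 125 if ulimit fails.
    my $rc = system( '/bin/sh', '-c',
        'ulimit -v "$1" || exit 125; shift; exec "$@"',
        'limited', $limit_kb, @cmd );

    return -1 if $rc == -1;                  # /bin/sh could not be started
    return 128 + ( $? & 127 ) if $? & 127;   # child killed by a signal
    return $? >> 8;                          # child's normal exit code
}

# Hypothetical usage: split the pages in a child capped at about 2 GB and
# turn a failure into an error the caller can report upstream.
# my $rc = run_with_memory_limit( 2_000_000, $^X, 'split_pages.pl', $input_pdf );
# die "splitting $input_pdf failed (exit code $rc)\n" if $rc != 0;
```

Capping the child rather than the main process keeps the caller alive and able to report the failure, which matches the symptom described above where the main Perl process survives but later external commands fail.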


Can you give me some example code that demonstrates the problem, please? I just tried a simple script that imports the pages individually into another PDF, and didn't run into any problems.
I came across a file that was showing similar symptoms to what you reported, and fixed a few bugs. Please try out either the latest code on GitHub or the developer release 2.030_001 to see if it's fixed for you as well.
Subject: Re: [rt.cpan.org #114976] Huge memory consumption in page splitting
Date: Thu, 10 Nov 2016 20:17:49 +0100
To: bug-PDF-API2 [...] rt.cpan.org
From: Claudio Dosio <dosio [...] land.it>
The fix solves the problem for some PDFs but not for all of them. I tried a 1.2 MB PDF on a VM with 4 GB of RAM, and it cannot split the document: it runs out of memory. Unfortunately, I cannot give you the document that produces this behavior because it contains private data.

Below is the code I use to split and process the pages:

    use strict;
    use warnings;
    use PDF::API2;

    # $tempDirname, $inPdfFilename, $configParams and isToGlyph() are
    # declared elsewhere in the application.
    my $inPdf      = PDF::API2->open("$tempDirname/$inPdfFilename");
    my $numOfPages = $inPdf->pages;
    my $pageNum    = 1;    # PDF::API2 page numbers start at 1

    while ( $pageNum <= $numOfPages ) {
        # One output PDF per source page
        my $outPdf = PDF::API2->new( -file => "$tempDirname/${inPdfFilename}_${pageNum}" );

        # Import the source page as a form XObject and draw it on a new page
        my $xo   = $outPdf->importPageIntoForm( $inPdf, $pageNum );
        my $page = $outPdf->page;
        my $p7m  = '';
        $page->mediabox( $configParams->{'pdfPageW'}, $configParams->{'pdfPageH'} );
        my $gfx = $page->gfx;
        $gfx->formimage( $xo, 0, 0, 1 );
        $outPdf->save();

        # isToGlyph() is actually a configuration parameter, which is true
        if ( isToGlyph( $pageNum, $configParams ) ) {
            # ... does some actions on the split pages (copy, compress, etc.) ...
        }

        $pageNum++;
    }

This code actually runs out of memory before it reaches the if statement.

Many thanks for your help,
Claudio
PDF::API2 2.032 (just released) includes memory-related improvements that may help in this case. Since I'm not able to reproduce this issue, I'm going to close this ticket. Feel free to create a new one if you have a file you can share (you can also send me a file privately, which I'll use solely to track down this issue).
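Because the fixes discussed in this ticket landed in specific releases (the developer release 2.030_001, then 2.032), it can help to confirm which PDF::API2 version a script actually loads before retesting. This is a minimal sketch using Perl's standard `UNIVERSAL::VERSION` check; the helper name `pdf_api2_at_least` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if PDF::API2 can be loaded and its version
# is at least $min_version, 0 otherwise. `require` dies if the module is
# missing and ->VERSION($min) dies if the installed version is older;
# eval turns both cases into a false result.
sub pdf_api2_at_least {
    my ($min_version) = @_;
    return eval { require PDF::API2; PDF::API2->VERSION($min_version); 1 } ? 1 : 0;
}

print 'PDF::API2 >= 2.032: ', ( pdf_api2_at_least('2.032') ? 'yes' : 'no' ), "\n";
```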