> Thank you so much.
>
> I've been able to get it "working" by wrapping the call to ->text() in an
> eval block.
> It still spits out errors to stderr, but it is generating text.
>
> Just as an FYI, I am getting the following message logged to the shell:
> glibc detected free(): invalid next size (normal) 0xsomethingsomething
> I wouldn't even mention it, but I am redirecting stderr to a file, and so I
> am surprised to see it.
>
> Thanks for all your help.
>
> -Robert Waters
>
> On Thu, Jun 18, 2009 at 7:08 PM, leo charre via RT
> <bug-PDF-OCR2@rt.cpan.org> wrote:
>
> > Yes. The way to check is to open the pdf with PDF::API2 first; the call
> > may have to be wrapped in an eval...
> >
> >
> > http://search.cpan.org/~leocharre/PDF-OCR2-1.13/lib/PDF/OCR2.pm#ERRORS
> > 1.13/bin/pdfocrtest
> >
> >
> >
> > The code is basically:
> >
> > my $instance;
> >
> > my $status = eval { $instance = PDF::API2->open($abs_pdf_path) } ? 'ok' : 'bad';
> >
> >
> > If you have a script, you can test the pdf before you try to do
> > something else to it.
> >
> > I need to write a little more on this matter.
> > As it turns out, PDF::API2 may be finicky about reading some pdfs it
> > deems to have bad xref tables.
> > And pdftk can "fix" tables, but then you are altering the pdf, which I
> > want to stay away from.
> > I hope this helps for now.
> >
> >
> >
> >
> >
> > On 6/18/09, R M Waters via RT <bug-PDF-OCR2@rt.cpan.org> wrote:
> >>
> >> Thu Jun 18 18:31:17 2009: Request 47129 was acted upon.
> >> Transaction: Ticket created by robert.waters@gmail.com
> >> Queue: PDF-OCR2
> >> Subject: ->text() fails in xpdf, entire script dies
> >> Broken in: (no value)
> >> Severity: (no value)
> >> Owner: Nobody
> >> Requestors: robert.waters@gmail.com
> >> Status: new
> >> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=47129 >
> >>
> >>
> >> Is there a way to check for xpdf errors, rather than have the program
> >> die because of them?
> >> I am currently iterating through a directory of pdfs, OCRing each.
> >> I receive the following errors on a call to ->text for a certain pdf:
> >> "CCIT somethingsomething" x 2-10 (this only happens sometimes)
> >> "invalid stream" x 2
> >> "bad args for pdf images" x 1
> >>
> >> and then my loop stops and my script dies.
> >>
> >> I have enabled debugging in PDF::OCR2::Page and PDF::GetImages, but the
> >> problem is occurring in the xpdf library (so it seems to me).
> >> I have also set the PDF::OCR2::CHECK_PDF symbol; it doesn't help.
> >>
> >> I would love to be able to use a construct like this:
> >> foreach my $file (@files) {
> >>     my $ocr = $pdf->text($file) or next;
> >>     # OR #
> >>     if (!defined $ocr) { next; }
> >>     # OR #
> >>     if (xpdf_test($file)) { my $ocr ... }
> >> }
> >>
> >> I am currently implementing a blacklist array (a jail for known
> >> offenders), but have several thousands of pdfs to run (hopefully there
> >> are only a few problem documents).
> >>
> >> Thank you for the awesome libraries.
> >> Robert Waters
> >>
> >
> >
> > --
> > Leo Charre
> >
Alright, after much thought and deliberation, I released PDF::OCR2. This
version checks a pdf for this problem *by default*.
Please see
There is also a class flag/parameter to allow PDF::OCR2 to make a copy
of the file and repair the xref (in that copy, not the original).
Basically, calling text() will no longer crash your program; it should
just return undef, and you'll get warnings on STDERR explaining why.
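
For a batch run like the one in the original report, a loop along these lines might do. This is only a rough sketch, assuming text() returns undef for a bad pdf as described above. The names ocr_all and $fake are made up for illustration, and the extractor is passed in as a callback so the snippet runs without any pdfs; in a real script the callback would presumably be something like sub { PDF::OCR2->new($_[0])->text }, which is an untested assumption here. The eval stays in as a guard in case something underneath still dies.

```perl
use strict;
use warnings;

# Run $extract on each file; skip files where it dies or returns undef.
# $extract is a hypothetical callback standing in for the real OCR call.
sub ocr_all {
    my ($files, $extract) = @_;
    my (%text, @skipped);
    for my $file (@$files) {
        # eval guards against a die inside the extractor
        my $ocr = eval { $extract->($file) };
        if (!defined $ocr) {
            warn "skipping $file: " . ($@ || "no text returned\n");
            push @skipped, $file;
            next;
        }
        $text{$file} = $ocr;
    }
    return (\%text, \@skipped);
}

# Stand-in extractor for demonstration only (not part of PDF::OCR2)
my $fake = sub {
    my ($file) = @_;
    die "invalid stream\n" if $file eq 'dies.pdf';
    return undef           if $file eq 'bad.pdf';
    return "text of $file";
};

my ($text, $skipped) = ocr_all([qw(good.pdf bad.pdf dies.pdf)], $fake);
print "ok: $_\n"      for sort keys %$text;
print "skipped: $_\n" for @$skipped;
```

The skipped list takes the place of a hand-maintained blacklist: problem documents are collected as they turn up, and the loop carries on with the rest.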