This queue is for tickets about the PDF-Extract CPAN distribution.

Report information
The Basics
Id: 33707
Status: open
Priority: 0/
Queue: PDF-Extract

People
Owner: nsharrok [...] lgmedia.com.au
Requestors: dario.santini [...] pcj.it
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in:
  • 1.01
  • 1.02
  • 2.01
  • 2.02
  • 2.04
  • 2.05
  • 2.06
  • 3.01
Fixed in: (no value)



Subject: Bug Roport
Date: Fri, 29 Feb 2008 13:33:48 +0100
To: bug-PDF-Extract [...] rt.cpan.org
From: Dario Santini <dario.santini [...] pcj.it>
Hello all,

I think this is a bug. Here are the details.

Cmd output:
------------
D:\PROJECTS\bpsrtk>perl split.pl F24_071116_ottobre.pdf
Errore : Can't find object 0 0 obj at PDF/Extract.pm line 655

split.pl code
------------
use PDF::Extract;
$pdf=new PDF::Extract( PDFDoc=> "$ARGV[0]" );
$i=1;
$pdf->savePDFExtract( PDFPages=> '1' ) ;
print "Errore : ",$pdf->getVars("PDFError");

I can't send the whole PDF file. Below you can find some pieces of it; feel free to ask me for more objects.

Best regards
Dario

------
%PDF-1.4
%âãÏÓ
[....]
1 0 obj
<</Kids [3 0 R 17 0 R 31 0 R 45 0 R 59 0 R 73 0 R 87 0 R 101 0 R 115 0 R 129 0 R 143 0 R 157 0 R 171 0 R] /Count 13 /Type /Pages
>>
endobj
184 0 obj
<</Pages 1 0 R /Type /Catalog
>>
endobj
185 0 obj
<</Creator (pdftk 0.91) /Producer (itext-paulo \(lowagie.com\)[JDK1.1] - build 126) /ModDate (D:20071114121856-01'00') /CreationDate (D:20071114121856-01'00')
>>
endobj xref 0 186 0000000000 65535 f 0000890696 00000 n 0000000000 00000 n 0000068134 00000 n 0000000015 00000 n 0000031947 00000 n 0000037019 00000 n 0000037118 00000 n 0000037465 00000 n 0000037212 00000 n 0000038775 00000 n 0000038526 00000 n 0000039833 00000 n 0000046855 00000 n 0000046973 00000 n 0000047306 00000 n 0000000000 00000 n 0000136630 00000 n 0000068504 00000 n 0000100437 00000 n 0000105510 00000 n 0000105610 00000 n 0000105959 00000 n 0000105705 00000 n 0000107271 00000 n 0000107022 00000 n 0000108329 00000 n 0000115351 00000 n 0000115469 00000 n 0000115802 00000 n 0000000000 00000 n 0000205133 00000 n 0000137007 00000 n 0000168940 00000 n 0000174013 00000 n 0000174113 00000 n 0000174462 00000 n 0000174208 00000 n 0000175774 00000 n 0000175525 00000 n 0000176832 00000 n 0000183854 00000 n 0000183972 00000 n 0000184305 00000 n 0000000000 00000 n 0000273636 00000 n 0000205510 00000 n 0000237443 00000 n 0000242516 00000 n 0000242616 00000 n 0000242965 00000 n 0000242711 00000 n 0000244277 00000 n 0000244028 00000 n 0000245335 00000 n 0000252357 00000 n 0000252475 00000 n 0000252808 00000 n 0000000000 00000 n 0000342139 00000 n 0000274013 00000 n 0000305946 00000 n 0000311019 00000 n 0000311119 00000 n 0000311468 00000 n 0000311214 00000 n 0000312780 00000 n 0000312531 00000 n 0000313838 00000 n 0000320860 00000 n 0000320978 00000 n 0000321311 00000 n 0000000000 00000 n 0000410642 00000 n 0000342516 00000 n 0000374449 00000 n 0000379522 00000 n 0000379622 00000 n 0000379971 00000 n 0000379717 00000 n 0000381283 00000 n 0000381034 00000 n 0000382341 00000 n 0000389363 00000 n 0000389481 00000 n 0000389814 00000 n 0000000000 00000 n 0000479145 00000 n 0000411019 00000 n 0000442952 00000 n 0000448025 00000 n 0000448125 00000 n 0000448474 00000 n 0000448220 00000 n 0000449786 00000 n 0000449537 00000 n 0000450844 00000 n 0000457866 00000 n 0000457984 00000 n 0000458317 00000 n 0000000000 00000 n 0000547662 00000 n 0000479522 00000 n 0000511456 00000 n 
0000516530 00000 n 0000516631 00000 n 0000516982 00000 n 0000516727 00000 n 0000518297 00000 n 0000518047 00000 n 0000519357 00000 n 0000526380 00000 n 0000526499 00000 n 0000526833 00000 n 0000000000 00000 n 0000616191 00000 n 0000548051 00000 n 0000579985 00000 n 0000585059 00000 n 0000585160 00000 n 0000585511 00000 n 0000585256 00000 n 0000586826 00000 n 0000586576 00000 n 0000587886 00000 n 0000594909 00000 n 0000595028 00000 n 0000595362 00000 n 0000000000 00000 n 0000684720 00000 n 0000616580 00000 n 0000648514 00000 n 0000653588 00000 n 0000653689 00000 n 0000654040 00000 n 0000653785 00000 n 0000655355 00000 n 0000655105 00000 n 0000656415 00000 n 0000663438 00000 n 0000663557 00000 n 0000663891 00000 n 0000000000 00000 n 0000753249 00000 n 0000685109 00000 n 0000717043 00000 n 0000722117 00000 n 0000722218 00000 n 0000722569 00000 n 0000722314 00000 n 0000723884 00000 n 0000723634 00000 n 0000724944 00000 n 0000731967 00000 n 0000732086 00000 n 0000732420 00000 n 0000000000 00000 n 0000821778 00000 n 0000753638 00000 n 0000785572 00000 n 0000790646 00000 n 0000790747 00000 n 0000791098 00000 n 0000790843 00000 n 0000792413 00000 n 0000792163 00000 n 0000793473 00000 n 0000800496 00000 n 0000800615 00000 n 0000800949 00000 n 0000000000 00000 n 0000890307 00000 n 0000822167 00000 n 0000854101 00000 n 0000859175 00000 n 0000859276 00000 n 0000859627 00000 n 0000859372 00000 n 0000860942 00000 n 0000860692 00000 n 0000862002 00000 n 0000869025 00000 n 0000869144 00000 n 0000869478 00000 n 0000890843 00000 n 0000890893 00000 n trailer <</Info 185 0 R /Root 184 0 R /Size 186 /ID [<92916611d95e15e82f55f23efda3a411><92916611d95e15e82f55f23efda3a411>] Show quoted text
>>
startxref 891072 %%EOF
Subject: Re: [rt.cpan.org #33707] AutoReply: Bug Roport
Date: Fri, 29 Feb 2008 17:20:20 +0100
To: bug-PDF-Extract [...] rt.cpan.org
From: Dario Santini <dario.santini [...] pcj.it>
More detail.

In my PDF there is some PS source that is not compressed, like the following code:

[...]
/CourSelez 12.00 Tf
0.000 Tc
0 Tw
2 Tr
0.1 w
[] 0 d
0 0 0 rg
0 0 0 RG   <<<<<<< ----- THE PROBLEM
^^^^
1 0 0 1 19.74 812.14 Tm
[...]

Because of this, PDF::Extract gets confused. I tried the following solution, but it is no longer compatible with other PDFs.

old line 653: $object[$obj]=~s/(\d+) (\d+) R/&getObj($1, $2)/ges;
new line 653: $object[$obj]=~s/(\d+) (\d+) R(\W)/&getObj($1, $2).$3/ges;
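The false match described above can be reproduced without the private PDF. The following is a minimal sketch, not code from PDF::Extract: the helper `refs` and the two sample strings are assumptions made for illustration, while the two patterns are the ones quoted for line 653. It lists which "object generation" pairs each pattern would treat as indirect references.

```perl
use strict;
use warnings;

# Content-stream operators as in the report; "0 0 0 RG" is a colour operator.
my $stream = "0 0 0 rg 0 0 0 RG 1 0 0 1 19.74 812.14 Tm";
# Genuine indirect references, as found in a /Pages dictionary.
my $dict = "<</Kids [3 0 R 17 0 R] /Count 2>>";

my $old = qr/(\d+) (\d+) R/;        # pattern on the old line 653
my $new = qr/(\d+) (\d+) R(\W)/;    # pattern proposed in the report

# Hypothetical helper: collect the "obj gen" pairs a pattern would follow.
sub refs {
    my ($re, $text) = @_;
    my @found;
    while ($text =~ /$re/g) {
        push @found, "$1 $2";
    }
    return @found;
}

my @old_stream = refs($old, $stream);   # picks up "0 0" out of "0 0 0 RG"
my @new_stream = refs($new, $stream);   # "R" is followed by "G", so no match
my @old_dict   = refs($old, $dict);
my @new_dict   = refs($new, $dict);

print "old pattern, stream: @old_stream\n";
print "new pattern, stream: @new_stream\n";
print "old pattern, dict:   @old_dict\n";
print "new pattern, dict:   @new_dict\n";
```

With the old pattern the stream yields the pair "0 0", which is consistent with the reported error "Can't find object 0 0 obj". Note that the `(\W)` variant needs one more character after the `R`, so a reference at the very end of the searched text would not match.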
From: nsharrok [...] lgmedia.com.au
Hi Dario,

Can you put up the offending PDF on a website so I can download it and run some tests?
Also, can you tell me what application created the PDF?

Thanks
Noel Sharrock
Subject: Re: [rt.cpan.org #33707] Bug Roport
Date: Thu, 06 Mar 2008 12:54:06 +0100
To: bug-PDF-Extract [...] rt.cpan.org
From: Dario Santini <dario.santini [...] pcj.it>
<URL: http://rt.cpan.org/Ticket/Display.html?id=33707 >

Hi Noel,

sorry, I can't publish the PDF file, for privacy reasons.
I can give you the "/Info" object:

[...]
XXX 0 obj
<</Creator (pdftk 0.91) /Producer (itext-paulo \(lowagie.com\)[JDK1.1] - build 126) /ModDate (D:20071114121856-01'00') /CreationDate (D:20071114121856-01'00')
>>
endobj
[...]

For your test purposes, you can add the object below to the /Contents array field of a /Page object:

XXX 0 obj
<</Length 217
>>
stream
0.00 0.00 595.00 842.00 re W n
%*#@LINE1@#*
BT
/CourSelez 12.00 Tf
0.000 Tc
0 Tw
2 Tr
0.1 w
[] 0 d
0 0 0 rg
0 0 0 RG
1 0 0 1 19.74 812.14 Tm
[0 ( THIS IS FOR TEST PURPOSE )] TJ
ET
endstream
endobj
OK, Here's what you can do. Open the PDF in Adobe Acrobat and save it to another file. Point your split.pl script to the new file and see what happens. Also I'd like to know what software produced the original PDF.
Subject: Re: [rt.cpan.org #33707] Bug Roport
Date: Thu, 06 Mar 2008 22:45:52 +0100
To: bug-PDF-Extract [...] rt.cpan.org
From: Dario Santini <dario.santini [...] pcj.it>
Noel Sharrock via RT wrote:
> <URL: http://rt.cpan.org/Ticket/Display.html?id=33707 >
>
> OK,
> Here's what you can do.
> Open the PDF in Adobe Acrobat and save it to another file.
> Point your split.pl script to the new file and see what happens.
>
No difference with the saved PDF.
> Also I'd like to know what software produced the original PDF.
>
I'm not sure about the software name, perhaps BUFFETTI. It is an Italian fiscal software package.
Give me your private email address and I'll send you the PDF.

d.
From: nsharrok [...] lgmedia.com.au
Since the line

=~s/(\d+) (\d+) R/&getObj($1, $2)/ges;

will find "0 0 R" in "0 0 0 RG" in the snippet below

--------8<-----------------------
/CourSelez 12.00 Tf
0.000 Tc
0 Tw
2 Tr
0.1 w
[] 0 d
0 0 0 rg
0 0 0 RG <<<<<<<<<<<<<< Adobe Acrobat writes this as "0.0 0.0 0.0 RG" which won't be included in getObj
1 0 0 1 19.74 9.75 Tm
--------8<----------------------

I think it's best to just filter on the command names RG (stroking colour) and RD (annotation Rect). These are the only commands to have at least 2 digits. The RI, R2L, RB, RT, RP, RC, RF, RV commands have other types of arguments.

Try this:

sub getObj {
    return "" if $vars{"PDFError"};
    my($obj,$instnum)=@_;
    unless ($obj[$obj] ) {
        if ($pdf=~/\D($obj $instnum obj.*?endobj\s*)/s ) {
            $object = $1;
#           return "" if $object=~/\/GoToR/;  # Don't want these link objects
            $obj[$obj]++;
            $object[$obj]=$object;
            $instnum[$obj]=$instnum;
            $object[$obj]=~s/(\/Dest \[ )(\d+)( \d.*?)/&uri($1,$2,$3)/es;  # Convert page dest to uri if not present
            # Follow "N G R" references, but skip the RG and RD commands.
            # The captured character after R is appended back with ".$3".
            $object[$obj]=~s/(\d+) (\d+) R([^GD])/&getObj($1, $2).$3/ges;
#           $object[$obj]=~s/(\/Dest \[ \d+)==/$1 0/s;  # Don't follow this path
            $object[$obj]=~s/\/Annots \[\s+\]\s+//s;  # Delete empty Annots array
        } else {
            &error("Can't find object $obj $instnum obj ",__FILE__,__LINE__);
        }
    }
    "$obj 0 R";
}
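The effect of the `[^GD]` filter can be checked in isolation. This is only a sketch under stated assumptions: `fake_get_obj` is a hypothetical stand-in for getObj that records what it is asked for instead of reading a PDF, and the sample text is made up from fragments quoted in this ticket.

```perl
use strict;
use warnings;

my @seen;   # references the substitution tried to follow

# Hypothetical stand-in for getObj: records the request and returns the
# same kind of reference string, without touching any PDF data.
sub fake_get_obj {
    my ($obj, $gen) = @_;
    push @seen, "$obj $gen";
    return "$obj 0 R";
}

# Two genuine references followed by the colour operator from the report.
my $text = "/Contents 5 0 R /Parent 1 0 R>> 0 0 0 RG 1 0 0 1 19.74 9.75 Tm";

# Same shape as the suggested line: the character after R is captured
# and put back with ".$3", so the surrounding text is preserved.
(my $out = $text) =~ s/(\d+) (\d+) R([^GD])/fake_get_obj($1, $2).$3/ges;

print "followed: @seen\n";
print "text unchanged: ", ($out eq $text ? "yes" : "no"), "\n";
```

In this sketch only "5 0" and "1 0" are followed and "0 0 0 RG" is left alone. Because `[^GD]` must consume one character, the pattern relies on every reference being followed by something; in getObj the captured object text ends with "endobj", so that appears to hold there.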
Subject: Re: [rt.cpan.org #33707] Resolved: Bug Roport
Date: Sat, 08 Mar 2008 22:23:34 +0100
To: bug-PDF-Extract [...] rt.cpan.org
From: Dario Santini <dario.santini [...] pcj.it>
Thanks very much. D..