Subject: Can’t handle newlines in references
In PDF syntax, an indirect reference consists of three distinct tokens that can be separated by any PDF whitespace, and even comments. For example, this is a syntactically valid indirect reference:
1 %eieio
0
R
Text::PDF does not allow comments at all (based on reading the code; that is not a problem for my PDFs). But it does choke on newlines if the object is long enough that it has not all been read from the file into the buffer yet.
This happens with:
1895 0
obj<</Count
253/Kids[1896
0
R
1
0
R
7
0
R
13
0
R
...
etc., with 253 entries.
Text::PDF::File::readval needs to read more data if it finds what could be a partial reference.
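A minimal sketch of that idea, not a patch against Text::PDF: before parsing a value, check whether the buffer holds only something that could be the start of "N G R", and if so pull in more data first. The names could_be_partial_ref and fill_partial_ref are hypothetical and do not exist in Text::PDF, the chunk size is an arbitrary assumption, and readval's real buffering is not reproduced here. Like the module, the sketch ignores comments.

```perl
use strict;
use warnings;

# PDF whitespace characters: NUL, tab, LF, FF, CR, space.
my $ws = qr/[\x00\t\n\f\r ]/;

# Hypothetical helper (not part of Text::PDF).
# True if the whole buffer is only what could be the beginning of "N G R":
# an object number, optionally whitespace and a generation number,
# optionally trailing whitespace, and nothing after it yet.
sub could_be_partial_ref {
    my ($buf) = @_;
    return $buf =~ /\A\d+(?:$ws+\d*)?$ws*\z/ ? 1 : 0;
}

# Hypothetical helper (not part of Text::PDF).
# Append chunks from $fh while the buffer still looks like an unfinished
# reference, then return the (possibly extended) buffer.
sub fill_partial_ref {
    my ($fh, $buf, $chunk_size) = @_;
    $chunk_size ||= 255;    # assumed default, chosen only for illustration
    while (could_be_partial_ref($buf)) {
        my $more = '';
        my $n = read($fh, $more, $chunk_size);
        last unless $n;     # EOF or read error: nothing more to add
        $buf .= $more;
    }
    return $buf;
}

# Usage: the buffer ends after "1896\n" and the rest is still in the file.
my $rest = "0\nR\n1\n0\nR";
open(my $fh, '<', \$rest) or die "cannot open in-memory file: $!";
my $buf = fill_partial_ref($fh, "1896\n", 2);
close($fh);
```

A bare number at the end of the buffer also counts as "could be partial", since it cannot be told apart from the first token of a reference until more data arrives.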
Subject: open_7avjz48f.txt
--- /Users/sprout/.cpan/build/Text-PDF-0.31-rH_fyS/lib/Text/PDF/File.pm 2016-08-16 08:01:48.000000000 -0700
+++ lib/Text/PDF/File.pm 2017-02-26 14:54:42.000000000 -0800
@@ -1080,10 +1080,10 @@
{ $xlist->{$xmin++} = [$1, $2, $3]; }
}
- if ($buf !~ /^trailer$cr/oi)
+ if ($buf !~ /^trailer$ws_char*/oi)
{ die "Malformed trailer in PDF file $self->{' fname'} at " . ($fh->tell - length($buf)); }
- $buf =~ s/^trailer$cr//oi;
+ $buf =~ s/^trailer$ws_char*//oi;
($tdict, $buf) = $self->readval($buf);
$tdict->{' loc'} = $xpos;