Subject: TTFont0 generates broken CMap (patch incl.)
When the ToUnicode option is supplied, Text::PDF::TTFont0::new()
generates a /ToUnicode CMap object that is broken in three ways (see the notes below).
The resulting PDF files contain text that can't be searched, copied or extracted.
Here's the broken output:
1 stream
2 /CIDInit /ProcSet findresource being 12 dict begin begincmap
3 /CIDSystemInfo << /Registry (BXCJIM+ArialRegular+0) /Ordering (XYZ)
4 /Supplement 0 >> def
5 /CMapName /BXCJIM+ArialRegular+0 def
6 1 begincodespacerange <0001> <0d57> endcodespacerange
7 <0001> <0001> <0000>
8
9 ...
10
11 <0d57> <0d57> <20b8>
12 endbfrange
13 endcmap CMapName currendict /CMap defineresource pop end endendstream
Notes:
Line 5: " /CMapType 2 def" is missing after the /CMapName definition
Line 7: the "NNN beginbfrange" line that should precede the first range entry is missing
Line 13: "endendstream": the newline between "end" and "endstream" is missing
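For reference, here is a minimal standalone sketch of the output I would expect, with all three points addressed. It is not the module's code: build_tounicode_cmap is a hypothetical helper, the 150 glyph mappings are made up, and the operators are spelled "begin" and "currentdict" as in the ToUnicode example in the PDF Reference, which differs from the strings the module emits. It assumes a bfrange section holds at most 100 entries, which is what the "% 100" test in the module appears to be there for.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::PDF: builds a ToUnicode CMap body
# for glyph IDs 1 .. $#rev, where $rev[$gid] is the Unicode code point.
sub build_tounicode_cmap {
    my ($name, @rev) = @_;
    my $num = scalar @rev;

    my $out = "/CIDInit /ProcSet findresource begin 12 dict begin begincmap\n"
            . "/CIDSystemInfo << /Registry ($name+0) /Ordering (XYZ)\n"
            . "/Supplement 0 >> def\n"
            . "/CMapName /$name+0 def /CMapType 2 def\n"
            . sprintf("1 begincodespacerange <%04x> <%04x> endcodespacerange\n",
                      1, $num - 1);

    # Emit the mappings in sections of at most 100 entries, each with its
    # own "N beginbfrange" header and "endbfrange" footer.
    for (my $start = 1; $start < $num; $start += 100) {
        my $end = $start + 99;
        $end = $num - 1 if $end > $num - 1;
        $out .= sprintf("%d beginbfrange\n", $end - $start + 1);
        $out .= sprintf("<%04x> <%04x> <%04x>\n", $_, $_, $rev[$_])
            for $start .. $end;
        $out .= "endbfrange\n";
    }

    # The trailing newline keeps "end" and "endstream" on separate lines.
    $out .= "endcmap CMapName currentdict /CMap defineresource pop end end\n";
    return $out;
}

# Made-up data: 150 glyphs mapped to consecutive code points from U+0041.
my @rev = (0, map { 0x40 + $_ } 1 .. 150);
print build_tounicode_cmap('BXCJIM+ArialRegular', @rev);
```

With 150 glyphs this prints two sections, one headed "100 beginbfrange" and one headed "50 beginbfrange".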
Best regards,
Thomas
--- c:/temp/Text-PDF-0.31/lib/Text/PDF/TTFont0.pm 2016-08-04 18:49:53.000000000 +0200
+++ c:/usr/local/perl5/site/lib/Text/PDF/TTFont0.pm 2017-11-08 13:11:45.729893000 +0100
@@ -109,9 +109,11 @@
$unistr = '/CIDInit /ProcSet findresource being 12 dict begin begincmap
/CIDSystemInfo << /Registry (' . $self->{'BaseFont'}->val . '+0) /Ordering (XYZ)
/Supplement 0 >> def
-/CMapName /' . $self->{'BaseFont'}->val . '+0 def
+/CMapName /' . $self->{'BaseFont'}->val . '+0 def /CMapType 2 def
1 begincodespacerange <';
$unistr .= sprintf("%04x> <%04x> endcodespacerange\n", 1, $num - 1);
+ $unistr .= $num - $i > 100 ? 100 : $num - $i;
+ $unistr .= " beginbfrange\n";
for ($i = 1; $i < $num; $i++)
{
if ($i % 100 == 0)
@@ -122,7 +124,7 @@
}
$unistr .= sprintf("<%04x> <%04x> <%04x>\n", $i, $i, $rev[$i]);
}
- $unistr .= "endbfrange\nendcmap CMapName currendict /CMap defineresource pop end end";
+ $unistr .= "endbfrange\nendcmap CMapName currendict /CMap defineresource pop end end\n";
$touni = PDFDict();
$parent->new_obj($touni);
$touni->{' stream'} = $unistr;