
This queue is for tickets about the PDF-API2 CPAN distribution.

Report information
The Basics
Id: 66456
Status: resolved
Priority: 0
Queue: PDF-API2

People
Owner: Nobody in particular
Requestors: michiel.beijen [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.61
Fixed in: 2.020



Subject: PDFs are not searchable when text is created using ttfont
If the text in the PDF is created using ttfont, the text in the resulting PDF is not searchable. Also, pdftotext on my Linux laptop does not produce any output for it. If you create the PDF using corefont instead, the resulting PDF is searchable and pdftotext does produce output.
On Mon Mar 07 15:39:19 2011, https://launchpad.net/~michiel-beijen wrote:
> If the text in the PDF is created using ttfont; the text in the
> resulting PDF is not searchable. Also pdftotext on my Linux laptop does
> not produce output.
>
> If you create the PDF using corefont the resulting PDF is searchable,
> and pdftotext does create output.
This appears to be the result of a change in version 0.61 (May 2007). Since then, the text is only searchable if the undocumented "-unicodemap" flag is set when calling ttfont. The same seems to be true of all fonts, but corefonts still work, presumably because no character map is required in those cases.

A search through the mailing list didn't give any matches on that flag before 2010, but there was a discussion around then about optimizing TTF performance, so this is probably related.

One way to solve this problem would be to document that the -unicodemap option needs to be passed in order for the text to be searchable. That seems like it should be the default, though, with an option to leave it out for performance or file-size reasons instead.

How significant is the performance/size impact, I wonder? If we had support for embedding partial fonts, I bet it would be much less significant, especially for larger fonts (e.g. Unicode fonts).
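For anyone on an affected release, the workaround of passing -unicodemap explicitly might look roughly like the sketch below. The font path, output filename, and sample string are placeholders rather than anything from this ticket, and the flag is undocumented, so treat this as an illustration and not a guaranteed interface.

```perl
use strict;
use warnings;
use PDF::API2;

# Placeholder paths (assumptions): pass your own TrueType font and output
# file on the command line, or edit the defaults below.
my $font_path = shift // '/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf';
my $out_path  = shift // 'searchable.pdf';

my $pdf  = PDF::API2->new();
my $page = $pdf->page();

# Explicitly request the Unicode map when loading the font. Per this
# ticket, text set in a TrueType font is only searchable on affected
# releases when this flag is passed.
my $font = $pdf->ttfont($font_path, -unicodemap => 1);

# Place one line of text near the top of the page.
my $text = $page->text();
$text->font($font, 12);
$text->translate(72, 720);
$text->text('This text should be searchable.');

$pdf->saveas($out_path);
```

Running pdftotext on the resulting file is a quick way to check whether the text can be extracted.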
I've updated ttfont to turn on -unicodemap if it isn't specifically set, so text will now be searchable by default (the expected behavior) as of the next release.
Long, long overdue, but "the next release" just happened.