On Mon Mar 07 15:39:19 2011,
https://launchpad.net/~michiel-beijen wrote:
> If the text in the PDF is created using ttfont, the text in the
> resulting PDF is not searchable. Also pdftotext on my Linux laptop does
> not produce output.
>
> If you create the PDF using corefont the resulting PDF is searchable,
> and pdftotext does create output.

This appears to be the result of a change in version 0.61 (May 2007).
Since then, text is only searchable if the undocumented "-unicodemap"
flag is set when calling ttfont. The same seems to apply to all font
types, but corefonts still work, presumably because they don't need a
character map.
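
Some background on why a character map matters, assuming ttfont embeds the font as a CID font with Identity-H encoding: the content stream then holds glyph IDs rather than characters, and extractors such as pdftotext rely on a /ToUnicode CMap to turn those IDs back into text. I'm also assuming that -unicodemap is what writes that CMap; I haven't confirmed it in the source. Below is a minimal Python sketch of such a CMap. build_tounicode_cmap is a hypothetical helper for illustration, not part of PDF::API2.

```python
def build_tounicode_cmap(gid_to_cp: dict[int, int]) -> str:
    """Build a minimal ToUnicode CMap mapping 2-byte glyph IDs to Unicode.

    Hypothetical helper: gid_to_cp maps glyph ID -> Unicode code point.
    """
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",  # Identity-H uses 2-byte codes
        "endcodespacerange",
    ]
    items = sorted(gid_to_cp.items())
    # Each beginbfchar block may hold at most 100 entries, so emit in chunks.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for gid, cp in chunk:
            # Destination is UTF-16BE, so code points above U+FFFF
            # become surrogate pairs automatically.
            dst = chr(cp).encode("utf-16-be").hex().upper()
            lines.append(f"<{gid:04X}> <{dst}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up example: glyph 36 draws "A", glyph 37 draws "B".
    print(build_tounicode_cmap({36: 0x41, 37: 0x42}))
```

Without entries like `<0024> <0041>`, an extractor only sees glyph numbers and has no way to know they spell "AB". The 100-entry chunking follows the limit Adobe's CMap documentation places on each beginbfchar block.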

A search of the mailing list turned up no mention of that flag before
2010, but there was a discussion around the time of that change about
optimizing TTF performance, so the two are probably related.

One fix would be to document that the -unicodemap option must be passed
for the text to be searchable. But that seems like it should be the
default, with an option to turn it off for performance or file-size
reasons instead.

I wonder how significant the performance and size impact is. If we
supported embedding partial (subsetted) fonts, I bet the impact would be
much smaller, especially for large fonts such as Unicode fonts.
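
One rough way to gauge the size side of that question, at least for the character map, is to compare the mapping entries for a whole font against only the glyphs a document actually uses. This is a hypothetical Python sketch: the glyph counts and code points are made up for illustration, it ignores the CMap header and block wrappers, and it says nothing about the size of the embedded font program itself or about runtime cost.

```python
import zlib


def bfchar_stream_size(gid_to_cp: dict[int, int]) -> tuple[int, int]:
    """Return (raw, Flate-compressed) byte sizes of the bfchar entries only.

    Ignores the fixed CMap header and the beginbfchar/endbfchar wrappers.
    """
    body = "".join(
        f"<{gid:04X}> <{chr(cp).encode('utf-16-be').hex().upper()}>\n"
        for gid, cp in sorted(gid_to_cp.items())
    ).encode("ascii")
    return len(body), len(zlib.compress(body))


# Hypothetical large font: 20,000 glyphs mapped into the CJK block.
full = {gid: 0x4E00 + gid for gid in range(1, 20001)}
# Hypothetical document that only uses 50 of those glyphs.
used = {gid: full[gid] for gid in range(1, 51)}

raw_full, z_full = bfchar_stream_size(full)
raw_used, z_used = bfchar_stream_size(used)

# Each entry is 14 bytes raw, so raw sizes are 280000 vs 700 bytes.
print(f"whole font: {raw_full} bytes raw, {z_full} compressed")
print(f"subset:     {raw_used} bytes raw, {z_used} compressed")
```

The raw map grows linearly with the number of mapped glyphs, so an embedder that only maps the glyphs in use would shrink it accordingly. Compression helps a lot for a regular mapping like this one, but the map still scales with glyph count.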