Bug #40648 for PDF-API2: Unicode text prints text on top of text before it

Tue Nov 04 03:37:25 2008 BARTL [...] cpan.org - Ticket created

Subject:

Unicode text prints text on top of text before it

Here's a demonstration of the bug. The test script (bug-demo.pl) produces a PDF file with the Polish text (part of an address) "Centrum Uslug Ksiegowych", with diacritics on the "l" and on the "e" in the third word. Look at bug-demo.release.pdf for the PDF made with the release version of PDF::API2 (0.71.001); look at bug-demo.patched.pdf for the PDF made after my patch is applied.

    #!/usr/bin/perl -w
    use PDF::API2;
    use strict;

    gen_pdf("$0.pdf");

    sub gen_pdf {
        my($save_as) = @_;
        my $api = PDF::API2->new();
        my $uf = unifont($api, 'Times', 1);
        $api->mediabox(595,842);
        my $page = $api->page;
        my $text = $page->text;
        $text->font( $uf, 18 );
        $text->translate( 190, 400 );
        $text->paragraph("Centrum Us\x{0142}ug Ksi\x{0119}gowych", 220, 25);
        $api->saveas($save_as);
        $api->end;
    }

    sub unifont {
        my($api, $fontname, @blk) = @_;
        return $api->unifont(
            $api->corefont($fontname, -encode=>'latin1'),
            map([ $api->corefont($fontname, -encode=>"uni$_"), [$_] ], @blk ),
            -encode => 'latin1'
        );
    }

The patch (PDF-API2-Resource-Font.pm.patch) adds one line to the file PDF/API2/Resource/Font.pm:

    $data->{firstchar} = 0;

This sets the value to zero if $encoding matches /^uni\d+$/. You can also simply replace the existing module file with the one I attached (PDF-API2-Resource-Font.pm.tar.gz, for PDF::API2 0.71.001).

Some background: PDF::API2::Resource::UniFont uses a faked font for character sets with more than 256 characters (actually 224, when ignoring control characters). It works by mapping each block of 256 code points in Unicode ("block", "page", "plane") to a single-byte font that contains just the characters of that block. For example, the Unicode range 0x100 to 0x1FF is remapped to the single-byte range 0x00 to 0xFF in the pseudo-font associated with block 1.

The problem is that for the first 32 characters in these blocks the print width is not stored, and as a result the PDF rendering engine treats the widths of these characters as zero. That is the case for the "e" ("e ogonek"), which is chr(281) in Unicode and gets remapped to chr(25) in the single-byte font, and which (as 25 < 32) gets a zero width. That's why the following "g" is printed on top of it. The "l" ("l slash") is chr(322) and gets remapped to chr(66), so it behaves normally, as it has its proper width stored.

The patch simply tells PDF::API2 that, for these remapped fonts, it should treat *every* character code from 0 to 255 as a normal character, instead of just the default limited range 32 to 255. As a result, the *complete* character width table, with 256 entries, now gets stored in the PDF file. And that fixes it.
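The remapping arithmetic described above can be checked with a few lines of plain Perl. This is only a sketch of the arithmetic, not code taken from PDF::API2; the helper names remap and below_default_firstchar are made up for illustration.

```perl
use strict;
use warnings;

# Split a Unicode code point into the block number and the single-byte
# code inside that block's pseudo-font (hypothetical helper, not PDF::API2).
sub remap {
    my ($codepoint) = @_;
    my $block = int($codepoint / 256);   # 0x0100..0x01FF => block 1
    my $byte  = $codepoint % 256;        # position within the block
    return ($block, $byte);
}

# True if the byte code lies below the default first character (32),
# i.e. in the range whose widths were not stored before the patch.
sub below_default_firstchar {
    my ($byte) = @_;
    return $byte < 32 ? 1 : 0;
}

# "l slash" (U+0142) and "e ogonek" (U+0119) from the demo text.
for my $cp (0x0142, 0x0119) {
    my ($block, $byte) = remap($cp);
    printf "U+%04X => block %d, byte %d, %s\n",
        $cp, $block, $byte,
        below_default_firstchar($byte) ? 'width not stored' : 'width stored';
}
```

For the two characters in the demo text this reports byte 66 (width stored) for "l slash" and byte 25 (width not stored) for "e ogonek", matching the numbers given in the report.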

Subject:

bug-demo.release.pdf

Download bug-demo.release.pdf
application/x-pdf 7.6k

Subject:

PDF-API2-Resource-Font.pm.tar.gz

Download PDF-API2-Resource-Font.pm.tar.gz
application/x-gzip 3k

Subject:

PDF-API2-Resource-Font.pm.patch

--- old/PDF/API2/Resource/Font.pm	Sat Mar 10 14:05:42 2007
+++ PDF/API2/Resource/Font.pm	Fri Oct 31 13:48:28 2008
@@ -73,6 +73,7 @@
         my $blk=$1;
         $data->{e2u}=[ map { $blk*256+$_ } (0..255) ];
         $data->{e2n}=[ map { nameByUni($_) || '.notdef' } @{$data->{e2u}} ];
+        $data->{firstchar} = 0;
     } elsif(defined $encoding) {
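To illustrate why a lower first character changes the rendering, here is a simplified model of how a PDF simple font resolves a glyph width from its first character, last character and width table. It is a sketch under stated assumptions, not PDF::API2 code: the hash layout, the helper names width_for and make_font, and the placeholder width of 500 are all invented for the example.

```perl
use strict;
use warnings;

# Look up a width in a model font. Codes outside firstchar..lastchar
# fall back to the missing width, which is 0 in this model.
sub width_for {
    my ($font, $code) = @_;
    if ($code < $font->{firstchar} || $code > $font->{lastchar}) {
        return $font->{missingwidth};
    }
    return $font->{widths}[ $code - $font->{firstchar} ];
}

# Build a model font covering $firstchar..255, giving every glyph a
# placeholder width of 500 units (a made-up value, not a real metric).
sub make_font {
    my ($firstchar) = @_;
    return {
        firstchar    => $firstchar,
        lastchar     => 255,
        missingwidth => 0,
        widths       => [ (500) x (256 - $firstchar) ],
    };
}

my $before = make_font(32);   # default limited range 32 to 255
my $after  = make_font(0);    # range after setting firstchar to 0

printf "code 25: before=%d after=%d\n",
    width_for($before, 25), width_for($after, 25);
printf "code 66: before=%d after=%d\n",
    width_for($before, 66), width_for($after, 66);
```

In this model, code 25 has width 0 with the default range and a real width once the table starts at 0, while code 66 is unaffected either way.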

Subject:

bug-demo.pl

#!/usr/bin/perl -w
use PDF::API2;
use strict;

gen_pdf("$0.pdf");

sub gen_pdf {
    my($save_as) = @_;
    my $api = PDF::API2->new();
    my $uf = unifont($api, 'Times', 1);
    $api->mediabox(595,842);
    my $page = $api->page;
    my $text = $page->text;
    $text->font( $uf, 18 );
    $text->translate( 190, 400 );
    $text->paragraph("Centrum Us\x{0142}ug Ksi\x{0119}gowych", 220, 25);
    $api->saveas($save_as);
    $api->end;
}

sub unifont {
    my($api, $fontname, @blk) = @_;
    return $api->unifont(
        $api->corefont($fontname, -encode=>'latin1'),
        map([ $api->corefont($fontname, -encode=>"uni$_"), [$_] ], @blk ),
        -encode => 'latin1'
    );
}

Subject:

bug-demo.patched.pdf

Download bug-demo.patched.pdf
application/x-pdf 7.7k

Tue Nov 18 17:43:22 2008 alfredreibenschuh [...] gmx.net - Correspondence added

try 0.72 and report again

Tue Nov 18 17:43:23 2008 The RT System itself - Status changed from 'new' to 'open'

Tue Nov 18 17:43:23 2008 alfredreibenschuh [...] gmx.net - Status changed from 'open' to 'rejected'

Thu Nov 20 15:12:49 2008 BARTL [...] cpan.org - Correspondence added

On Tue Nov 18 17:43:22 2008, AREIBENS wrote:

> try 0.72 and report again

Yes, it's fixed now.

You do have another problem: PDF::API2 0.72 doesn't show up as the latest release in the CPAN index at http://www.cpan.org/modules/02packages.details.txt.gz which still lists 0.71.001 as the most recent version. Most likely you're suffering from the "world writable directories" problem as discussed at http://use.perl.org/~cosimo/journal/37554 and with a possible fix on Windows at http://use.perl.org/~Burak/journal/37599 . I checked the archive on Linux, and all files and directories are indeed world writable.

P.S. What's that about the bug status "rejected"? I submit a bug report, a new version of PDF::API2 comes out a day after my bug report with the exact same fix as I proposed, and then you reject my bug report??
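One way to check a distribution for world-writable entries before uploading is to inspect the permission bits stored in the tarball. The following is a minimal sketch using the core Archive::Tar module; the helper name world_writable_entries is made up, and the tarball path is whatever is passed on the command line.

```perl
use strict;
use warnings;
use Archive::Tar;

# Return the paths of all entries in an Archive::Tar object whose
# permission bits include "write for others" (octal 0002).
sub world_writable_entries {
    my ($tar) = @_;
    return map  { $_->full_path }
           grep { $_->mode & 0002 }
           $tar->get_files;
}

# Usage: pass the path of a local tarball as the first argument.
if (@ARGV) {
    my $tar = Archive::Tar->new($ARGV[0])
        or die "Cannot read $ARGV[0]: ", Archive::Tar->error, "\n";
    print "$_\n" for world_writable_entries($tar);
}
```

An empty listing would suggest that the archive is not affected by this particular problem; it does not say anything about other reasons an upload might fail to be indexed.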

Thu Nov 20 15:12:52 2008 The RT System itself - Status changed from 'rejected' to 'open'

Thu Nov 20 15:14:02 2008 BARTL [...] cpan.org - Status changed from 'open' to 'resolved'

Thu Nov 20 15:14:02 2008 BARTL [...] cpan.org - Fixed in 0.72 added