Subject: $pdf->info() metadata improperly handles Unicode strings
Hi,
A Debian user reported the following to our bug reporting system. I am
quoting the report. Please tell me if the attached patch looks OK to
you. I can include it in the Debian package so you don't have to make
a new PDF::API2 release just for it.
For reference, the Debian bug report is available at
http://bugs.debian.org/461167
Thanks in advance,
dam
Debian Perl Group
------------------------------------8<-----------------------------
PDF::API2 includes Encode but doesn't use it when checking for UTF-16
strings in metadata ($pdf->info() hash). This causes the output to be
garbled in our UTF-8 (perl -CSD) environment. The attached patch uses
Encode's decode() to decode UTF-16BE/LE strings; it might be useful to
simply run Encode::Guess on the values, but this approach is more conservative.
Subject: fix-UTF16-detection.patch
--- a/lib/PDF/API2.pm 2006-10-04 16:55:53.000000000 -0700
+++ b/lib/PDF/API2.pm 2008-01-16 17:23:05.000000000 -0800
@@ -590,12 +590,8 @@
foreach my $k (@{$self->{infoMeta}}) {
next unless(defined $self->{pdf}->{'Info'}->{$k});
$opt{$k}=$self->{pdf}->{'Info'}->{$k}->val;
- if(unpack('n',$opt{$k})==0xfffe) {
- my ($mark,@c)=unpack('n*',$opt{$k});
- $opt{$k}=pack('U*',@c);
- } elsif(unpack('n',$opt{$k})==0xfeff) {
- my ($mark,@c)=unpack('v*',$opt{$k});
- $opt{$k}=pack('U*',@c);
+ if ((unpack('n',$opt{$k})==0xfffe) or (unpack('n',$opt{$k})==0xfeff)) {
+ $opt{$k} = decode('UTF-16', $self->{pdf}->{'Info'}->{$k}->val);
}
}
}
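
For anyone who wants to try the idea outside PDF::API2, here is a minimal standalone sketch of what the patched branch does: check the first two bytes for a UTF-16 byte order mark and, if one is present, let Encode's decode() choose the byte order. The helper name decode_info_value and the sample byte strings are assumptions made up for illustration; they are not part of PDF::API2 or of the patch.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of PDF::API2): decode a raw metadata value
# only when it starts with a UTF-16 byte order mark (BOM).
sub decode_info_value {
    my ($raw) = @_;
    return $raw unless defined $raw && length($raw) >= 2;

    # Read the first two bytes as a big-endian 16-bit integer.
    my $bom = unpack('n', $raw);

    # 0xFEFF means bytes FE FF (big-endian BOM);
    # 0xFFFE means bytes FF FE (little-endian BOM).
    if ($bom == 0xFEFF || $bom == 0xFFFE) {
        # The 'UTF-16' encoding reads the BOM, picks the byte order,
        # and strips the BOM from the result.
        return decode('UTF-16', $raw);
    }

    # No BOM: leave the value untouched.
    return $raw;
}

# Sample inputs (assumed for illustration): "Café" in both byte orders.
my $be = "\xFE\xFF\x00\x43\x00\x61\x00\x66\x00\xE9";
my $le = "\xFF\xFE\x43\x00\x61\x00\x66\x00\xE9\x00";

binmode(STDOUT, ':encoding(UTF-8)');
print decode_info_value($be), "\n";
print decode_info_value($le), "\n";
print decode_info_value('plain'), "\n";
```

Both sample strings should come back as the same four-character Perl string, and values without a BOM pass through unchanged, which is why this is more conservative than guessing the encoding of every value.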