Subject: | Support UTF-8 and other encodings of PO files |
I noticed that the output of a tool using Locale::PO was double-encoded
as UTF-8. After digging into the tool, I found the cause: Locale::PO
does not decode the PO file while reading it in. (My PO file is UTF-8
encoded.) Simply adding
binmode(IN, ":encoding(UTF-8)");
after the open() call solved the problem for me. But this certainly
breaks all cases where the PO file is not UTF-8 encoded. So I found a
way to wait until the header (msgid "") has been read, extract the
encoding from its Content-Type field
Content-Type: text/plain; charset=...
and apply that encoding with a binmode() call on the filehandle. This
works fine, and as I understand it, the msgid "" entry is always the
first one. But there are two possible problems:
1. msgid "" is not the first entry.
2. The header's msgstr itself contains UTF-8 strings (e.g. the author's
name), which are read before the encoding is switched.
I'll try to enhance the module so that it restarts parsing the file
once a different encoding has been found. That would solve both
problems above. In any case, the attached patch improves the situation.
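
For illustration, the restart idea could be sketched as two passes over
the file: one over the raw bytes to find the declared charset, and a
second that reopens the file with the matching encoding layer. This is
only a rough sketch, not what the attached patch does. The names
detect_po_charset and open_po_decoded are hypothetical and not part of
Locale::PO, the UTF-8 fallback is an assumption carried over from the
patch, and the raw scan assumes an ASCII-compatible encoding.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Locale::PO): scan the raw bytes of a
# PO file for the charset declared in the header's Content-Type line.
sub detect_po_charset {
    my ($path) = @_;
    open my $raw, '<:raw', $path or die "Cannot open $path: $!";
    my $charset;
    while (my $line = <$raw>) {
        # Header lines look like: "Content-Type: text/plain; charset=UTF-8\n"
        # where \n is a literal backslash followed by n inside the quotes.
        if ($line =~ /Content-Type:.*?charset\s*=\s*([^\\\s"]+)/i) {
            $charset = $1;
            last;
        }
    }
    close $raw;
    return $charset // 'UTF-8';    # assumed fallback, as in the patch
}

# Second pass: reopen the file so every entry, including the header
# itself, is decoded with the detected encoding from the first byte on.
sub open_po_decoded {
    my ($path) = @_;
    my $charset = detect_po_charset($path);
    open my $fh, "<:encoding($charset)", $path
        or die "Cannot open $path as $charset: $!";
    return $fh;
}
```

Because the charset is known before any entry is parsed, it no longer
matters where the msgid "" entry sits in the file or whether its msgstr
contains non-ASCII text.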
Subject: | locale-po.diff |
--- PO.pm 2010-01-28 16:18:15.000000000 +0100
+++ src/libs/gettext/lib/Locale/PO.pm 2010-01-28 16:18:59.000000000 +0100
@@ -336,6 +336,13 @@
$po->msgid_plural( $buffer{msgid_plural} ) if defined $buffer{msgid_plural};
$po->msgstr( $buffer{msgstr} ) if defined $buffer{msgstr};
$po->msgstr_n( $buffer{msgstr_n} ) if defined $buffer{msgstr_n};
+
+ if ($po->msgid eq '""') {
+ my $header = $po->msgstr;
+ my ($encoding) = $header =~ /Content-Type:.*charset\s*=\s*([^\\\s]+)($|\\n)/ms;
+ $encoding ||= "UTF-8"; # fall back if no charset matched (avoids a stale $1)
+ binmode(IN, ":encoding($encoding)");
+ }
# ashash
if ($ashash) {