Subject: Encode::MIME::Header and Russian
It seems that encode("MIME-Header", ...) does not work correctly with
Russian text.
I sent a test message with the Novell Evolution mail client.
The subject of the message contained the Russian word 'тест' (test).
Evolution converted it to this:
Subject: =?UTF-8?Q?=D1=82=D0=B5=D1=81=D1=82?=
Then I used a custom routine (sub rfc2047conv, source below) and got
the same correct result:
=?UTF-8?Q?=D1=82=D0=B5=D1=81=D1=82?=
But if I use Encode::MIME::Header
encode("MIME-Header", 'тест')
the result looks different, and the Subject header is displayed
incorrectly in the Evolution mail client.
I also tried the 'MIME-B' and 'MIME-Q' encodings, but without success:
orig_str=тест
encode with 'MIME-Header' =?UTF-8?B?w5HCgsOQwrXDkcKBw5HCgg==?=
encode with 'MIME-B' =?UTF-8?B?w5HCgsOQwrXDkcKBw5HCgg==?=
encode with 'MIME-Q' =?UTF-8?Q?=C3=91=C2=82=C3=90=C2=B5=C3=91=C2=81=C3=91=C2=82?=
encode with 'rfc2047conv' =?UTF-8?Q?=D1=82=D0=B5=D1=81=D1=82?=
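One way to inspect the outputs above is to decode the Base64 payload and look at the raw bytes. The sketch below uses only MIME::Base64 and Encode. The reading given in the comments (that UTF-8 was applied twice) is an assumption about what happened, not a confirmed diagnosis.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use Encode qw(decode encode);

# Payload copied from the 'MIME-B' output above.
my $payload = 'w5HCgsOQwrXDkcKBw5HCgg==';
my $bytes   = decode_base64($payload);

# Hex dump of the bytes that were actually placed in the header.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;
print "payload bytes: $hex\n";

# Assumption: the payload is UTF-8 applied twice. Undo one layer and
# compare with the bytes Evolution produced (D1 82 D0 B5 D1 81 D1 82).
my $chars     = decode('UTF-8', $bytes);       # characters U+00D1, U+0082, ...
my $one_layer = encode('ISO-8859-1', $chars);  # map each character back to one byte
my $hex2 = join ' ', map { sprintf '%02X', ord } split //, $one_layer;
print "after undoing one UTF-8 layer: $hex2\n";
```

If the second line matches the bytes in the Evolution header, the input string was probably handed to encode() as eight separate bytes rather than four characters.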
Does anybody know where the error is?
My OS is
# cat /etc/fedora-release
Fedora release 9 (Sulphur)
# rpm -qf /usr/lib/perl5/5.10.0/i386-linux-thread-multi/Encode/MIME/Header.pm
perl-5.10.0-40.fc9.i386
# grep -i version /usr/lib/perl5/5.10.0/i386-linux-thread-multi/Encode/MIME/Header.pm
our $VERSION = do { my @r = ( q$Revision: 2.5 $ =~ /\d+/g ); sprintf "%d." . "%02d" x $#r, @r };
# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
I use this script for tests:
#!/usr/bin/perl -w
use Encode qw(encode decode);
my $orig_str = 'тест';
print "orig_str=$orig_str\n";
my $res = encode("MIME-Header", $orig_str);
print "encode with MIME-Header $res \n";
$res = encode("MIME-B", $orig_str);
print "encode with MIME-B $res \n";
$res = encode("MIME-Q", $orig_str);
print "encode with MIME-Q $res \n";
$res = rfc2047conv($orig_str, 'UTF-8');
print "encode with rfc2047conv $res \n";
# rfc2047conv(string, charset, prefix size)
# Q-encodes a byte string as RFC 2047 encoded-words, folding long lines.
sub rfc2047conv {
    my $str      = shift;
    my $charset  = uc(shift);
    my $init_len = shift || 0;
    my $len      = length($str);
    return '' unless ($len);
    my $begin = "=?$charset?Q?";
    my $res   = $begin;
    my $count = $init_len + length($begin);
    foreach my $c (split(//, $str)) {
        my ($repl, $repl_len);
        if ($c eq '?' || $c eq '_' || $c eq '=' || $c lt ' ' || $c gt '~') {
            # Always emit two hex digits, so bytes below 0x10 stay valid
            $repl     = sprintf("=%02X", ord($c));
            $repl_len = 3;
        } elsif ($c eq ' ') {
            $repl     = '_';
            $repl_len = 1;
        } else {
            $repl     = $c;
            $repl_len = 1;
        }
        # Start a new encoded-word on a continuation line when this one is full
        if ($count + $repl_len > 72) {
            $res .= "?=\r\n " . $begin;
            $count = 1 + length($begin);
        }
        $res .= $repl;
        $count += $repl_len;
    }
    $res .= '?=';
    return $res;
}
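A variant of the test script that may be worth trying, as a sketch only. It assumes that the script file is saved as UTF-8 and that encode() expects a string of Perl characters rather than raw bytes. Under that assumption the literal is first converted with decode('UTF-8', ...). Adding `use utf8;` at the top of the script should have the same effect on the literal. The byte string is spelled out with escapes here so that the example does not depend on how the file is saved.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Without 'use utf8' the literal 'тест' is a string of 8 bytes, not 4 characters.
my $orig_bytes = "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82";   # 'тест' as UTF-8 bytes
my $orig_chars = decode('UTF-8', $orig_bytes);         # 4 Perl characters

printf "bytes: length %d, chars: length %d\n",
    length($orig_bytes), length($orig_chars);

for my $name ('MIME-Header', 'MIME-B', 'MIME-Q') {
    my $res = encode($name, $orig_chars);
    print "encode with $name: $res\n";

    # Round trip back to characters to check that nothing was lost.
    my $back = decode('MIME-Header', $res);
    print "  round trip ", ($back eq $orig_chars ? "ok" : "FAILED"), "\n";
}
```

If the headers printed by this variant match what Evolution generates, the difference in the original script comes from passing bytes where characters were expected, not from Encode::MIME::Header itself.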