Skip Menu |

This queue is for tickets about the Mail-Box CPAN distribution.

Report information
The Basics
Id: 20301
Status: resolved
Priority: 0/
Queue: Mail-Box

People
Owner: Nobody in particular
Requestors: a_kunysz [...] yahoo.com
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 2.065
Fixed in: (no value)



Subject: Mail::Message::Body::Encode: charset conversion is not implemented
The charset conversion is not implemeted in Mail::Message::Body::Encode. Here is a small patch that use Text::Iconv to implement it. It doesn't seem to work for all messages (I might investigate this later) and error handling is inadequate but it's still better than nothing.
Subject: Encode.patch
diff -urp Body.old/Encode.pm Body/Encode.pm --- Body.old/Encode.pm 2006-07-05 16:17:16.544229952 +0200 +++ Body/Encode.pm 2006-07-05 16:27:01.960233192 +0200 @@ -11,6 +11,7 @@ use Carp; use MIME::Types; use File::Basename 'basename'; +use Text::Iconv; my MIME::Types $mime_types; @@ -39,14 +40,15 @@ sub encode(@) # my $mime_to = lc $type_to; # If possible, update unify() too. -# my $char_was = $type_from->attribute('charset'); -# my $char_to = $type_to->attribute('charset'); + my $char_was = lc $type_from->attribute('charset'); + my $char_to = lc $type_to->attribute('charset'); my $trans_was = lc $self->transferEncoding; my $trans_to = lc $transfer; # - # The only translations implemented now is content transfer encoding. + # The only translations implemented now are content transfer encoding + # and charset. # #warn "Translate ($trans_was) -> ($trans_to)\n"; @@ -64,6 +66,16 @@ sub encode(@) return $self; } + # charset conversion + if (defined $decoded and $char_was ne $char_to) { + my $iconv = Text::Iconv->new($char_was, $char_to); + my $converted = $iconv->convert($decoded); + if (defined $converted) { + $decoded = $converted; + } + # else just ignore error and keep it unconverted + } + my $encoded; if($trans_to eq 'none') {$encoded = $decoded} elsif(my $encoder = $self->getTransferEncHandler($trans_to)) diff -urp Body.old/Encode.pod Body/Encode.pod --- Body.old/Encode.pod 2006-07-05 16:17:16.000000000 +0200 +++ Body/Encode.pod 2006-07-05 16:17:53.000000000 +0200 @@ -50,7 +50,8 @@ NOT IMPLEMENTED YET =item * charset conversion -NOT IMPLEMENTED YET +The (buggy) implementation use Text::Iconv. For some reasons, +conversion will not always work. If it fails, it will keep the original text. =back
On Wed Jul 05 10:38:52 2006, guest wrote: Show quoted text
> The charset conversion is not implemeted in Mail::Message::Body::Encode. > Here is a small patch that use Text::Iconv to implement it. It doesn't > seem to work for all messages (I might investigate this later) and error > handling is inadequate but it's still better than nothing.
Text::Iconv is outdated, we must use Encode's: from_to($octets, "iso-8859-1", "cp1250"); I strongly prefer something which does not work at all above an implementation which only works in some cases: MailBox is used in automated processes and therefore requires predetermined behavior. The extension with character-set conversions is quite simple... however: it requires some manual testing and some automated tests. Only if both are done, I take this patch... I am willing to translate your tests and code into MailBox standard, but have no time to collect them.
From: a_kunysz [...] yahoo.com
Le Jeu. Jul. 06 04:00:47 2006, MARKOV a écrit : Show quoted text
> Text::Iconv is outdated, we must use Encode's: > from_to($octets, "iso-8859-1", "cp1250");
I didn't knew about Encode. I rewrote the patch using from_to. It's even simpler now. However, there is a problem with messages that don't specify a charset. Mail::Message::Body considers the charset as undef but RFC1341 says it means US-ASCII (and some MUA seems to send ISO-8859-1 messages without specifing a charset). Should it be fixed at message parsing or should we "$char_was = 'us-ascii' unless defined $char_was;" before calling from_to ? The first solution seems better but I haven't looked through that part of the code (yet). Maybe I should open another bug report for that issue. Show quoted text
> The extension with character-set conversions is quite simple... however: > it requires some manual testing and some automated tests. Only if both > are done, I take this patch... I am willing to translate your tests > and code into MailBox standard, but have no time to collect them.
You mean I just have to collect potentially problematic mails and you'll turn that into test scripts ?
Show quoted text
> > are done, I take this patch... I am willing to translate your tests > > and code into MailBox standard, but have no time to collect them.
> You mean I just have to collect potentially problematic mails and you'll > turn that into test scripts ?
I only need one or two problematic situations, and a few very distictive correct conversions. Preferably in a form like this: my $body = Mail::Message::Body->new(data => ..., charset => ...); my $trans = $body->decode(charset => 'something-else'); is($trans, 'what you expect'); To collect good examples is by far the most complex part of the code development, and the only reason why I did not add it to MailBox yet. Please, can we continue the discussion in private mails? mark@overmeer.net.
Mail::box 2.083 will be released today. It does what you wished for... the implementation was quite difficult, but the result is nice.