On Mon, Jun 08, 2009 at 05:23:02AM -0400, Mark Overmeer via RT wrote:
> <URL: http://rt.cpan.org/Ticket/Display.html?id=41661 >
>
> * Florian via RT (bug-Mail-Box@rt.cpan.org) [090608 07:54]:
> > Queue: Mail-Box
> > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=41661 >
> >
> > $ perl showsubj.pl example.mbox
> > original subj is 'one =?UNKNOWN?Q?two?= three'
> > studied subj is 'one two_three'
> > decoded subj is 'one two_three'
> >                        ^^^
> >
> > What I think shouldn't be there is the underline ('_') character between
> > 'two' and 'three', where the original had a space. This is only an issue
> > for 'UNKNOWN' encodings, and not fixed by using up-to-date Encode 2.33
>
> Should we declare that a bug in Encode?
> The decoding (the understanding whether it is a known charset) is left
> to Encode. Apparently, when it does understand the charset, then it
> decodes and replaces the '_'. When it does not understand the charset,
> it forgets to translate the '_'. Am I right?
I was under the impression that in cases where Encode does not
understand the charset, it wouldn't touch the field at all, and that
it's a feature of Mail::Box that you were so kind to add after my
pestering earlier in this ticket to simply strip the decoding marks in
such cases.
The bug would then be that decode() in Mail::Message::Field::Full adds a
'_' (as a kind of workaround for a problem in Encode, or perhaps for a
problem in the regex in decode()?), and that this '_' doesn't get
removed if _decoder just returns the original $encoded.
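To see where the '_' comes from, here is a minimal stand-alone sketch that applies the substitution from decode() to the subject of the example message. The wrapper name mark_blanks_sketch is made up for illustration; only the regex itself is taken from Full.pm.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution used in decode():
# a blank that follows an encoded word and is NOT followed by another
# encoded word is moved into the encoded word as '_'.
sub mark_blanks_sketch {
    my ($encoded) = @_;
    $encoded =~ s/\?\=\s(?!\s*\=\?|$)/_?= /gs;
    return $encoded;
}

# Encoded word followed by plain text: a '_' is added inside the word.
print mark_blanks_sketch('one =?UNKNOWN?Q?two?= three'), "\n";
# prints: one =?UNKNOWN?Q?two_?= three

# Two adjacent encoded words: the blank between them is left alone.
print mark_blanks_sketch('=?UNKNOWN?Q?one?= =?UNKNOWN?Q?two?='), "\n";
# prints: =?UNKNOWN?Q?one?= =?UNKNOWN?Q?two?=
```

So the payload handed to _decoder is 'two_', and the '_' only turns back into a blank if something actually performs the 'Q' decoding.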
Perhaps _decoder should, if Encode::find_encoding($charset) returns
false, not just return $encoded, but first do an s/_/ /g on $encoded, for
example as the first step in the following if-block? Like this:
--- Mail-Box-2.090/lib/Mail/Message/Field/Full.pm 2009-06-02 11:56:45.000000000 +0200
+++ perl/share/perl/5.8.8/Mail/Message/Field/Full.pm 2009-06-08 13:49:11.265069004 +0200
@@ -270,11 +270,14 @@
sub _decoder($$$)
{ my ($charset, $encoding, $encoded) = @_;
$charset =~ s/\*[^*]+$//; # language component not used
- my $to_utf8 = Encode::find_encoding($charset || 'us-ascii')
- or return $encoded;
+ my $to_utf8 = Encode::find_encoding($charset || 'us-ascii');
my $decoded;
- if($encoding !~ /\S/)
+ if(!$to_utf8)
+ { $encoded =~ s/_/ /g;
+ return $encoded;
+ }
+ elsif($encoding !~ /\S/)
{ $decoded = $encoded;
}
elsif(lc($encoding) eq 'q')
And then there's also the last (else) block of that if cascade, which may need the same treatment...
(NB: in decode(), it looks like you want to say 'if($is_text)' instead
of 'if(defined...', for why else would you define $is_text in the
previous line?)
@@ -298,7 +301,7 @@
sub decode($@)
{ my ($self, $encoded, %args) = @_;
my $is_text = defined $args{is_text} ? $args{is_text} : 1;
- if(defined $args{is_text} ? $args{is_text} : 1)
+ if($is_text)
{ # in text, blanks between encoding must be removed, but otherwise kept :(
# little trick to get this done: add an explicit blank.
$encoded =~ s/\?\=\s(?!\s*\=\?|$)/_?= /gs;
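The fallback from the first hunk can also be tried outside Mail::Box. The following is only a sketch under stated assumptions: decode_word_sketch is a made-up name, it handles just the 'Q' encoding, and it is not the actual _decoder code. It relies only on Encode::find_encoding returning false for a charset it does not know.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for _decoder, reduced to the case discussed here.
sub decode_word_sketch {
    my ($charset, $encoding, $encoded) = @_;
    $charset =~ s/\*[^*]+$//;          # language component not used

    my $to_utf8 = Encode::find_encoding($charset || 'us-ascii');
    if(!$to_utf8) {
        # Unknown charset: the bytes cannot be decoded, but '_' still
        # stands for a blank, so translate it before giving up.
        (my $plain = $encoded) =~ s/_/ /g;
        return $plain;
    }

    # Sketch only: anything other than 'Q' is returned untouched.
    return $encoded if lc($encoding) ne 'q';

    (my $bytes = $encoded) =~ s/_/ /g;             # '_' means blank in 'Q'
    $bytes =~ s/=([0-9A-Fa-f]{2})/chr hex $1/ge;   # =XX hex escapes
    return $to_utf8->decode($bytes);
}

# Unknown charset: the '_' becomes a blank instead of leaking through.
print "'", decode_word_sketch('UNKNOWN', 'Q', 'two_'), "'\n";

# Known charset: decoded as usual.
print "'", decode_word_sketch('iso-8859-1', 'Q', 'two_'), "'\n";
```

Both calls should give 'two ' with a trailing blank, which is what the subject line needs once the encoding marks are stripped.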
best,
Florian