CC: bug-Encode [...] rt.cpan.org
Subject: Re: [rt-users] Email Subject Header creating fragmented strings when decoded
Date: Fri, 18 Mar 2011 10:38:56 -0400
To: rt-users [...] lists.bestpractical.com
From: Thomas Sibley <trs [...] bestpractical.com>
On 18 Mar 2011 10:14, Lars Reimann wrote:
> Hi all,
>
> the following problem is very annoying:
>
> RT encodes Subject lines using the following scheme:
>
> Original example Header
>
> Subject:
> =?UTF-8?B?W3NlcnZpY2UubWV0YXdheXMubmV0ICM2NzAyOF0gU3BlaWNoZXJwbGF0eiBF?=
> =?UTF-8?B?cmjDtmh1bmcgd2FzbWFpbjogNTAwIEdC?=
>
> The header is split into 2 parts:
>
> 1st part decoded: "[Queue Name #Ticket number] First part of subject line"
> 2nd part decoded: "Second part of subject line"
>
> Completely decoded string: "[Queue Name #Ticket number] First part of
> subject line"_"Second part of subject line"
>
> The underscore (_) marks an additional space character which is
> introduced into ALL emails when the two UTF-8 parts are decoded.
I think this is actually a bug in Encode::MIME::Header's
parsing/generation of the encoded header lines. I tracked it down when
it broke a test in other code. I believe it was introduced with the fix
for https://rt.cpan.org/Public/Bug/Display.html?id=40027.
I've copied this mail to the bug tracker for Encode.
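For reference, RFC 2047 (section 6.2) says that whitespace separating two adjacent encoded-words is ignored when the header is displayed, so the two parts above should join with no space between them. Here is a minimal Python sketch of that rule. It is not how Encode::MIME::Header or RT implements decoding; it assumes a header made up only of B-encoded (base64) words, and the names RAW, ENCODED_WORD, and decode_words are just illustrative.

```python
import base64
import re

# The two encoded-words from the example Subject header quoted above,
# folded onto two lines the way they appear on the wire.
RAW = (
    "=?UTF-8?B?W3NlcnZpY2UubWV0YXdheXMubmV0ICM2NzAyOF0gU3BlaWNoZXJwbGF0eiBF?=\n"
    " =?UTF-8?B?cmjDtmh1bmcgd2FzbWFpbjogNTAwIEdC?="
)

# Matches B-encoded words only; Q-encoding and plain text between
# encoded-words are deliberately left out of this sketch.
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[Bb]\?([^?]*)\?=")


def decode_words(header):
    """Return the decoded text of each B-encoded word, in order."""
    return [
        base64.b64decode(payload).decode(charset)
        for charset, payload in ENCODED_WORD.findall(header)
    ]


parts = decode_words(RAW)

# RFC 2047 behaviour: the folding whitespace between the words is dropped.
rfc2047 = "".join(parts)

# The behaviour reported in this thread: a space appears at the seam.
buggy = " ".join(parts)

print(repr(rfc2047))
print(repr(buggy))
```

With this particular header the split falls inside a word, so the stray space shows up as "E rhöhung" instead of "Erhöhung".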
> I double-checked by decoding the header in Python. Result: when using 2 UTF-8
> parts, the decode introduces an additional space. When using only ONE
> UTF-8 string (the above subject w/o padding and UTF header), the decode is
> done correctly!
>
> I would be very glad to resolve this problem. If RT could use only one
> UTF-8 string, the problem would go away.
> How can we do that?
If you're really, really annoyed by it, I believe you can downgrade to
an older Encode. But you'd also regain other bugs that have since been
fixed, so I can't recommend it.
> And: does anyone have the same problem with email clients (we use
> Evolution and Thunderbird, but most likely other clients are also
> affected)?
>
> p.s. It's unclear to me when UTF encoding is used. Sometimes the Subject
> line is not UTF encoded and uses ASCII. Perhaps it depends on non-ASCII
> characters within the subject.
It's used when a mail header contains non-ASCII characters.
Thomas