Bug #66713 for Encode: Re: [rt-users] Email Subject Header creating fragmented strings when decoded

Fri Mar 18 10:38:59 2011 trs [...] bestpractical.com - Ticket created

CC:	bug-Encode [...] rt.cpan.org
Subject:	Re: [rt-users] Email Subject Header creating fragmented strings when decoded
Date:	Fri, 18 Mar 2011 10:38:56 -0400
To:	rt-users [...] lists.bestpractical.com
From:	Thomas Sibley <trs [...] bestpractical.com>

On 18 Mar 2011 10:14, Lars Reimann wrote:

> Hi all,
>
> the following problem is very annoying:
>
> RT encodes Subject lines using the following concept:
>
> Original example Header
>
> Subject:
> =?UTF-8?B?W3NlcnZpY2UubWV0YXdheXMubmV0ICM2NzAyOF0gU3BlaWNoZXJwbGF0eiBF?=
> =?UTF-8?B?cmjDtmh1bmcgd2FzbWFpbjogNTAwIEdC?=
>
> The header is split into 2 parts:
>
> 1st part decoded: "[Queue Name #Ticket number] First part of subject line"
> 2nd part decoded: "Second part of subject line"
>
> Completely decoded string: "[Queue Name #Ticket number] First part of
> subject line"_"Second part of subject line"
>
> The underscore (_) marks an additional space character which is
> introduced into ALL emails on decoding the two UTF parts.

I think this is actually a bug in Encode::MIME::Header's parsing/generation of the encoded header lines. I tracked it down when it broke a test in other code. I believe it was introduced with the fix for https://rt.cpan.org/Public/Bug/Display.html?id=40027. I've copied this mail to the bug tracker for Encode.
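For reference, RFC 2047 (section 6.2) says that whitespace separating two adjacent encoded-words is ignored when the header is displayed, so the two parts above should join without a space. The following is only a minimal Python sketch of that rule, not Encode's implementation; `decode_b_words`, `SUBJECT`, and the `drop_separating_whitespace` flag are hypothetical names, and only "B" (base64) encoded-words are handled.

```python
import base64
import re

# Matches a "B"-encoded word: =?charset?B?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[bB]\?([^?]*)\?=")

# The folded Subject value from the report (continuation line starts with a space).
SUBJECT = (
    "=?UTF-8?B?W3NlcnZpY2UubWV0YXdheXMubmV0ICM2NzAyOF0gU3BlaWNoZXJwbGF0eiBF?=\n"
    " =?UTF-8?B?cmjDtmh1bmcgd2FzbWFpbjogNTAwIEdC?="
)

def decode_b_words(value, drop_separating_whitespace=True):
    """Decode B-encoded words; optionally keep the gap to mimic the reported bug."""
    out = []
    pos = 0
    prev_was_encoded = False
    for match in ENCODED_WORD.finditer(value):
        gap = value[pos:match.start()]
        if gap:
            if gap.isspace() and prev_was_encoded:
                # Whitespace between two encoded-words: ignored per RFC 2047.
                if not drop_separating_whitespace:
                    out.append(" ")
            else:
                out.append(gap)
        charset, payload = match.groups()
        out.append(base64.b64decode(payload).decode(charset))
        prev_was_encoded = True
        pos = match.end()
    out.append(value[pos:])  # trailing unencoded text, if any
    return "".join(out)

print(decode_b_words(SUBJECT))
print(decode_b_words(SUBJECT, drop_separating_whitespace=False))
```

With the whitespace dropped, the word split across the two parts is rejoined ("Erhöhung"); keeping it reproduces the stray space described in the report ("E rhöhung").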

> I double checked with decoding UTF in Python. Results: When using 2 UTF
> parts, a decode introduces an additional space. When using only ONE
> UTF-string (the above subject w/o padding and UTF header) the decode is
> done correctly!
>
> I would be very glad to resolve this problem. If RT could use only one
> UTF string, the problem would go away.
> How can we do that?

If you're really, really annoyed by it, I believe you can downgrade to an older Encode. But you'd also regain other bugs that have since been fixed, so I can't recommend it.

> And: does anyone have the same problem with email clients (we use
> Evolution and Thunderbird, but most likely other clients are also
> affected)?
>
> p.s. It's unclear to me when UTF encoding is used. Sometimes the Subject
> line is not UTF encoded and uses ASCII. Perhaps it depends on non-ASCII
> characters within the subject.

It's used when there are characters other than ASCII in a mail header.

Thomas
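As an illustration of that rule, here is a hedged Python sketch that leaves an all-ASCII subject untouched and otherwise wraps it in a single UTF-8 base64 encoded-word. `encode_subject_if_needed` is a hypothetical helper, not RT's or Encode's code, and it deliberately ignores the 75-character limit on encoded-words that causes long subjects to be split into several parts in the first place.

```python
import base64

def encode_subject_if_needed(subject):
    """Return the subject unchanged if it is pure ASCII, else one B-encoded word."""
    if subject.isascii():
        return subject
    payload = base64.b64encode(subject.encode("utf-8")).decode("ascii")
    return "=?UTF-8?B?" + payload + "?="

print(encode_subject_if_needed("Disk quota increase"))
print(encode_subject_if_needed("Speicherplatz Erhöhung"))
```

The first call prints the subject as-is; the second prints an encoded-word because of the "ö".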

Sat May 21 18:37:08 2011 DANKOGAI [...] cpan.org - Status changed from 'new' to 'open'

Fri Apr 01 15:02:19 2016 pali [...] cpan.org - Cc PALI added

Fri Apr 01 15:03:12 2016 pali [...] cpan.org - Correspondence added

On Fri Mar 18 10:38:59 2011, trs@bestpractical.com wrote:

> I think this is actually a bug in Encode::MIME::Header's parsing/generation of the encoded header lines. I tracked it down when it broke a test in other code. I believe it was introduced with the fix for https://rt.cpan.org/Public/Bug/Display.html?id=40027.

Hi! This problem should be fixed in Encode 2.83.

Thu Apr 14 08:37:48 2016 DANKOGAI [...] cpan.org - Correspondence added

On Fri Apr 01 15:03:12 2016, PALI wrote:

> Hi! This problem should be fixed in Encode 2.83.

Sat Jun 25 05:59:00 2016 pali [...] cpan.org - Correspondence added

On Fri Apr 01 15:03:12 2016, PALI wrote:

> Hi! This problem should be fixed in Encode 2.83.

So, please close this bug.

Sat Jun 25 06:00:01 2016 DANKOGAI [...] cpan.org - Status changed from 'open' to 'resolved'