Bug #88717 for Encode: encode('MIME-Header') does not find word boundaries correctly

Mon Sep 16 20:07:07 2013 wiml [...] hhhh.org - Ticket created

Subject: encode('MIME-Header') does not find word boundaries correctly

Encoding a string which contains a colon (or, presumably, other header specials) can produce invalid output, because the resulting encoded-words are not bounded by whitespace. For example, this:

    print encode('MIME-Header', "Hey foo\x{2024}bar:whee")."\n";

produces this:

    =?UTF-8?B?SGV5IGZvb+KApGJhcg==?=:whee

which is invalid because there is no space between the encoded-word and the colon.

RFC 2047 makes this fairly clear in section 5, where it describes the three places you can use an encoded-word; in each of the three, it says "an 'encoded-word' that appears in [that place] MUST be separated from any adjacent [stuff] by 'linear-white-space'."

For encoding a Subject: header or other "*text" field, I think there are only two valid places to have an encoded-word boundary: either between two successive encoded-words (in which case the separating whitespace is stripped by the decoder) or at a place where an encoded-word is separated from a non-encoded word by whitespace (in which case the whitespace is not stripped by the decoder).
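The boundary rule described above can be checked mechanically. The following is a minimal sketch in Python (chosen only for illustration; it is not how Encode works internally, and the function name `unbounded_encoded_words` is hypothetical). It assumes an unstructured ("*text") header value and uses a deliberately rough pattern for encoded-words, then reports any encoded-word that touches a non-whitespace neighbour.

```python
import re

# Rough pattern for an RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
# (an approximation for illustration, not a full grammar)
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")


def unbounded_encoded_words(header_value):
    """Return encoded-words that are not bounded by whitespace or by
    the start/end of the header value."""
    bad = []
    for m in ENCODED_WORD.finditer(header_value):
        # Treat the start and end of the value as if they were whitespace.
        before = header_value[m.start() - 1] if m.start() > 0 else " "
        after = header_value[m.end()] if m.end() < len(header_value) else " "
        if not before.isspace() or not after.isspace():
            bad.append(m.group(0))
    return bad


# The output reported in this ticket: the encoded-word runs straight into ":whee".
print(unbounded_encoded_words("=?UTF-8?B?SGV5IGZvb+KApGJhcg==?=:whee"))

# Two successive encoded-words separated by a space satisfy the rule.
print(unbounded_encoded_words("=?UTF-8?B?QQ==?= =?UTF-8?B?Qg==?="))
```

The first call flags the encoded-word from the ticket, and the second returns an empty list. A check like this only detects the problem; an encoder would avoid it by choosing boundaries at whitespace, for example by extending the encoded-word to cover the whole of "foo\x{2024}bar:whee".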

Mon Sep 16 20:28:44 2013 wiml [...] hhhh.org - Broken in 2.39 added

Mon Sep 16 20:28:44 2013 wiml [...] hhhh.org - Broken in 2.55 added

Sun Apr 06 14:28:05 2014 DANKOGAI [...] cpan.org - Status changed from 'new' to 'open'

Mon Dec 08 21:11:03 2014 wiml [...] hhhh.org - Broken in 2.44 added

Mon Dec 08 21:11:03 2014 wiml [...] hhhh.org - Broken in 2.60 added

Thu Jun 11 16:24:07 2015 wiml [...] hhhh.org - Broken in 2.73 added

Wed Dec 02 22:11:05 2015 ANDK [...] cpan.org - Broken in 2.78 added

Wed Dec 02 22:21:42 2015 ANDK [...] cpan.org - Correspondence added

I just confirmed that this bug is still present in bleadperl and Encode 2.78.

Is there anything that blocks this bug from fixing? I see people are writing new modules to fix or work around this (MIME::Words, MIME::EncEncWords, Email::MIME::RFC2047), which looks unfortunate.

Dan, which way forward would be your favorite? Which options do you see?

Thanks,

Wed Dec 02 23:26:02 2015 DANKOGAI [...] cpan.org - Correspondence added

One of the problems of MIME-Header is that it is not exactly "encoding". It is in fact a meta-encoding that needs to be meta-transcoded. It is kind of like "UTF-8-MAC" that Encode does not currently support alone. But we have Unicode::Normalize and in conjunction thereof Perl as a whole supports it.

The sole reason Encode currently supports -- imperfectly -- is that one of its predecessors, Jcode, supported it (It only supported MIME encoding + ISO-2022-JP). From consistency's point of view, Encode should rather drop its support and hand it over to other module(s).

At the same time, Encode supports many unused and obsolete encodings because and just because there happened to be documents at unicode.org during its development phase. I ought to clean them up but it is much harder to drop than add, both technically and politically.

Dan the Maintainer Thereof

On Wed Dec 02 22:21:42 2015, ANDK wrote:

> I just confirmed that this bug is still present in bleadperl and
> Encode 2.78.
>
> Is there anything that blocks this bug from fixing? I see people are
> writing new modules to fix or work around this (MIME::Words,
> MIME::EncEncWords, Email::MIME::RFC2047), which looks unfortunate.
>
> Dan, which way forward would be your favorite? Which options do you
> see?
>
> Thanks,

Fri Jan 22 01:23:06 2016 DANKOGAI [...] cpan.org - Correspondence added

OK I've got it. Finally resolved:

https://github.com/dankogai/p5-encode/commit/b14a812a86f7e586ac22e9410ffc50aa5dc3969a

Dan the Maintainer Thereof

On Wed Dec 02 23:26:02 2015, DANKOGAI wrote:

> One of the problems of MIME-Header is that it is not exactly
> "encoding". It is in fact a meta-encoding that needs to be
> meta-transcoded. It is kind of like "UTF-8-MAC" that Encode does not
> currently support alone. But we have Unicode::Normalize and in
> conjunction thereof Perl as a whole supports it.
>
> The sole reason Encode currently supports -- imperfectly -- is that
> one of its predecessors, Jcode, supported it (It only supported MIME
> encoding + ISO-2022-JP). From consistency's point of view, Encode
> should rather drop its support and hand it over to other module(s).
>
> At the same time, Encode supports many unused and obsolete encodings
> because and just because there happened to be documents at unicode.org
> during its development phase. I ought to clean them up but it is much
> harder to drop than add, both technically and politically.
>
> Dan the Maintainer Thereof
>
> On Wed Dec 02 22:21:42 2015, ANDK wrote:

> > I just confirmed that this bug is still present in bleadperl and
> > Encode 2.78.
> >
> > Is there anything that blocks this bug from fixing? I see people are
> > writing new modules to fix or work around this (MIME::Words,
> > MIME::EncEncWords, Email::MIME::RFC2047), which looks unfortunate.
> >
> > Dan, which way forward would be your favorite? Which options do you
> > see?
> >
> > Thanks,

Fri Jan 22 01:23:07 2016 DANKOGAI [...] cpan.org - Status changed from 'open' to 'resolved'