
This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 88717
Status: resolved
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: wiml [...] hhhh.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in:
  • 2.39
  • 2.55
  • 2.44
  • 2.60
  • 2.73
  • 2.78
Fixed in: (no value)



Subject: encode('MIME-Header') does not find word boundaries correctly
Encoding a string which contains a colon (or, presumably, other header specials) can produce invalid output, because the resulting encoded-words are not bounded by whitespace. For example, this:

    print encode('MIME-Header', "Hey foo\x{2024}bar:whee")."\n";

produces this:

    =?UTF-8?B?SGV5IGZvb+KApGJhcg==?=:whee

which is invalid because there is no space between the encoded-word and the colon.

RFC 2047 makes this fairly clear in section 5, where it describes the three places you can use an encoded-word; in each of the three, it says "an 'encoded-word' that appears in [that place] MUST be separated from any adjacent [stuff] by 'linear-white-space'."

For encoding a Subject: header or other "*text" field, I think there are only two valid places to have an encoded-word boundary:

  • between two successive encoded-words (in which case the separating whitespace is stripped by the decoder), or
  • at a place where an encoded-word is separated from a non-encoded word by whitespace (in which case the whitespace is not stripped by the decoder).
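The boundary rule quoted from RFC 2047 section 5 can be checked mechanically. The following is a minimal sketch, not part of Encode: the helper name `encoded_words_are_delimited` is hypothetical, and the regex is a simplified approximation of the encoded-word grammar (it does not validate charset names or the 75-character limit). It only reports whether every encoded-word in a header value is bounded by whitespace or by the ends of the string.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not an Encode API).
# Returns 1 when every encoded-word in $value has whitespace (or the
# start/end of the string) on both sides, and 0 otherwise.
sub encoded_words_are_delimited {
    my ($value) = @_;

    # Simplified shape: "=?" charset "?" encoding "?" encoded-text "?="
    my $encoded_word = qr/=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=/;

    while ($value =~ /($encoded_word)/g) {
        my ($start, $end) = ($-[1], $+[1]);

        # Treat the string boundaries as if they were whitespace.
        my $before = $start > 0              ? substr($value, $start - 1, 1) : ' ';
        my $after  = $end < length($value)   ? substr($value, $end, 1)       : ' ';

        return 0 unless $before =~ /\s/ && $after =~ /\s/;
    }
    return 1;
}

# The output reported in this ticket fails the check, because ':' follows
# the encoded-word directly.
my $reported = '=?UTF-8?B?SGV5IGZvb+KApGJhcg==?=:whee';
print encoded_words_are_delimited($reported) ? "ok\n" : "not delimited\n";
```

A check like this could be run over the result of `encode('MIME-Header', ...)` in a regression test to catch boundary problems regardless of which Encode version is installed.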
I just confirmed that this bug is still present in bleadperl and Encode 2.78.

Is anything blocking a fix for this bug? I see people are writing new modules to fix or work around this (MIME::Words, MIME::EncEncWords, Email::MIME::RFC2047), which looks unfortunate.

Dan, which way forward would be your favorite? Which options do you see?

Thanks,
One of the problems of MIME-Header is that it is not exactly an "encoding". It is in fact a meta-encoding that needs to be meta-transcoded. It is kind of like "UTF-8-MAC", which Encode does not currently support on its own; but we have Unicode::Normalize, and in conjunction with it Perl as a whole supports it.

The sole reason Encode currently supports MIME-Header, imperfectly, is that one of its predecessors, Jcode, supported it (Jcode only supported MIME encoding + ISO-2022-JP). From the point of view of consistency, Encode should rather drop its support and hand it over to other module(s).

At the same time, Encode supports many unused and obsolete encodings simply because there happened to be documents for them at unicode.org during its development phase. I ought to clean them up, but it is much harder to drop an encoding than to add one, both technically and politically.

Dan the Maintainer Thereof

(In reply to ANDK's message of Wed Dec 02 22:21:42 2015.)
OK, I've got it. Finally resolved:

https://github.com/dankogai/p5-encode/commit/b14a812a86f7e586ac22e9410ffc50aa5dc3969a

Dan the Maintainer Thereof

(In reply to DANKOGAI's message of Wed Dec 02 23:26:02 2015.)