
This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 24836
Status: resolved
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: nick [...] aevum.de
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: MIME-Q splits multibyte chars across encoded words
The following code

use Encode;
print encode('MIME-Q', encode_utf8('12345678901234567890ä')), "\n";

produces

=?UTF-8?Q?12345678901234567890=C3?==?UTF-8?Q?=A4?=

Note that the multibyte character 'ä' is split across the two encoded words.
This is not a bug. You are merely doubly encoding.

Dan the Encode Maintainer

On Wed Feb 07 15:36:37 2007, nick.aevum.de wrote:
> The following code
>
> use Encode;
> print encode('MIME-Q', encode_utf8('12345678901234567890ä')), "\n";
>
> produces
>
> =?UTF-8?Q?12345678901234567890=C3?==?UTF-8?Q?=A4?=
>
> Note that the multibyte character 'ä' is split across the two encoded words.
From: nick [...] aevum.de
I'm not doubly encoding. The encode_utf8 is only because of bug #24418. And this has nothing to do with the fact that multibyte characters are split across encoded words. To quote RFC 2047:

"Each 'encoded-word' MUST represent an integral number of characters. A multi-octet character may not be split across adjacent 'encoded-word's."

Nick

On Fri Apr 06 07:19:30 2007, DANKOGAI wrote:
> This is not a bug. You are merely doubly encoding.
>
> Dan the Encode Maintainer
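The RFC 2047 rule quoted above can be checked mechanically. The following is a minimal sketch, not part of Encode: the helper name words_are_integral is hypothetical, and it assumes the header uses only UTF-8 Q-encoded words. It undoes the Q-encoding of each encoded word and asks Encode to strictly decode the resulting octets, which fails if a word ends in the middle of a multi-octet character.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if every UTF-8 Q-encoded word in $header
# contains only whole characters, as RFC 2047 requires.
sub words_are_integral {
    my ($header) = @_;
    while ($header =~ /=\?UTF-8\?Q\?([^?]*)\?=/gi) {
        my $payload = $1;
        $payload =~ tr/_/ /;                              # in Q-encoding "_" stands for a space
        $payload =~ s/=([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # undo =XX octet escapes
        # FB_CROAK makes decode() die on malformed or truncated UTF-8
        my $ok = eval { decode('UTF-8', $payload, Encode::FB_CROAK()); 1 };
        return 0 unless $ok;
    }
    return 1;
}

# The output reported in this ticket: 0xC3 and 0xA4 land in different words
print words_are_integral('=?UTF-8?Q?12345678901234567890=C3?==?UTF-8?Q?=A4?=')
    ? "ok\n" : "split character\n";   # prints "split character"

# A split at a character boundary keeps both octets in one word
print words_are_integral('=?UTF-8?Q?12345678901234567890?==?UTF-8?Q?=C3=A4?=')
    ? "ok\n" : "split character\n";   # prints "ok"
```

The check decodes each encoded word on its own, which matches the RFC wording: a word is acceptable only if its octets form complete characters without help from its neighbours.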
From: DANKOGAI [...] cpan.org
On Wed Feb 07 15:36:37 2007, nick.aevum.de wrote:
> The following code
>
> use Encode;
> print encode('MIME-Q', encode_utf8('12345678901234567890ä')), "\n";
                    WRONG^^^^^^^^^^^

The right way to do is:

# source is in latin1
use Encode;
print encode('MIME-Q', decode_utf8('12345678901234567890ä')), "\n";

or

# source is in utf-8
use Encode;
use utf8; # this makes sure literals are treated as UTF-8 string
print encode('MIME-Q', '12345678901234567890ä'), "\n";

And you will get

=?UTF-8?Q?12345678901234567890?==?UTF-8?Q?=C3=A4?=

remember, encode() takes UTF-8 string as source string.

Dan the Encode Maintainer
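The distinction behind the "doubly encoding" remark can be illustrated with a short sketch. It assumes the point is the difference between a Perl character string and its UTF-8 octets; the variable names are illustrative only, and the character is written as "\x{E4}" so the result does not depend on how the source file is saved.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Render a string as space-separated hex, one value per element
sub hex_dump { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

my $chars  = "\x{E4}";             # one character: U+00E4 (ä)
my $bytes  = encode_utf8($chars);  # its UTF-8 octets
my $double = encode_utf8($bytes);  # octets treated as characters and encoded again

print length($chars), "\n";        # prints 1
print hex_dump($bytes), "\n";      # prints C3 A4
print hex_dump($double), "\n";     # prints C3 83 C2 A4

# decode_utf8 turns the octets back into the single character
print decode_utf8($bytes) eq $chars ? "round trip ok\n" : "mismatch\n";
```

Passing $bytes rather than $chars to an encoder that expects characters is what produces the four-octet form on the third line of output.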
From: nick [...] aevum.de
I just upgraded to Encode 2.19 and everything works as expected.

Thanks,
Nick

On Sun Apr 08 14:40:18 2007, DANKOGAI wrote:
> On Wed Feb 07 15:36:37 2007, nick.aevum.de wrote:
> > The following code
> >
> > use Encode;
> > print encode('MIME-Q', encode_utf8('12345678901234567890ä')), "\n";
>                     WRONG^^^^^^^^^^^
>
> The right way to do is:
>
> # source is in latin1
> use Encode;
> print encode('MIME-Q', decode_utf8('12345678901234567890ä')), "\n";
>
> or
>
> # source is in utf-8
> use Encode;
> use utf8; # this makes sure literals are treated as UTF-8 string
> print encode('MIME-Q', '12345678901234567890ä'), "\n";
>
> And you will get
>
> =?UTF-8?Q?12345678901234567890?==?UTF-8?Q?=C3=A4?=
>
> remember, encode() takes UTF-8 string as source string.
>
> Dan the Encode Maintainer
On Mon Apr 09 08:41:13 2007, nick.aevum.de wrote:
> I just upgraded to Encode 2.19 and everything works as expected.
>
> Thanks,
> Nick

Closing RT

Dan the Encode Maintainer