
This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 27795
Status: resolved
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: henrik [...] adapt.dk
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Encode::MIME::Header::encode should not split on 'especials'
Encode::MIME::Header::encode splits the string to encode on the regex '$re_especials'. This causes strings such as 'Adapt og søn A/S' (a hypothetical but plausible name for a Danish firm) to be encoded as two lumps:

    =?UTF-8?B?QWRhcHQgb2cgc8O4biBB?=/S

If this string is used as the Subject field of an e-mail, most mail readers interpret it incorrectly (often not decoding the encoded string at all).

RFC 2047 states in section 5 (1):

    Ordinary ASCII text and 'encoded-word's may appear together in the
    same header field. However, an 'encoded-word' that appears in a
    header field defined as '*text' MUST be separated from any adjacent
    'encoded-word' or 'text' by 'linear-white-space'.

This is obviously not the case here: the encoded-word is immediately followed by '/S'.

I cannot see any reason in RFC 2047 to split on 'especials'. The restriction (as far as I can read it) is that 'encoded-word's may not contain any of the 'especials' characters in unencoded form.

My suggestion would be to simplify the encoding so that it splits only because of length. Or have I misunderstood something?
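To make the rule quoted from RFC 2047 concrete, here is a minimal sketch of a checker that flags encoded-words touching adjacent text without whitespace in between. The helper name unseparated_encoded_words and the simplified encoded-word pattern are assumptions made for illustration; they are not part of Encode or Encode::MIME::Header.

```perl
use strict;
use warnings;

# Simplified pattern for an RFC 2047 encoded-word:
#   =?charset?encoding?encoded-text?=
my $encoded_word = qr/=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=/;

# Hypothetical helper (not part of Encode): returns every encoded-word in
# $header that is not separated from its neighbours by whitespace.
sub unseparated_encoded_words {
    my ($header) = @_;
    my @bad;
    while ($header =~ /($encoded_word)/g) {
        # Save the match and its offsets before running any other regex.
        my ($word, $start, $end) = ($1, $-[1], $+[1]);
        # Start and end of the header count as properly separated.
        my $before = $start > 0              ? substr($header, $start - 1, 1) : ' ';
        my $after  = $end < length($header)  ? substr($header, $end, 1)       : ' ';
        push @bad, $word if $before !~ /\s/ or $after !~ /\s/;
    }
    return @bad;
}

# The string from this report: the encoded-word is directly followed by "/S".
my @bad = unseparated_encoded_words('=?UTF-8?B?QWRhcHQgb2cgc8O4biBB?=/S');
print scalar(@bad), " unseparated encoded-word(s)\n";
```

The check looks only at the single character on each side of an encoded-word, which is enough to catch the case reported here; it does not validate the header in any other way.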
From: peter [...] rathlev.dk
This also causes problems for Bugzilla (at least 3.0.5) when sending e-mail notifications with non-7-bit headers, e.g. subjects.

[prathlev@abehat ~]$ perl -e '
> use Encode;
> use Encode::MIME::Header;
> print encode("MIME-Q",
> "[Bug 145] Test æ ø å og mere + ændring"),
> "\n";
> '
[Bug 145]=?UTF-8?Q?=20Test=20=C3=A6=20=C3=B8=20=C3=A5=20og=20mer?=
=?UTF-8?Q?e=20=2B=20=C3=A6ndring?=
[prathlev@abehat ~]$

As Henrik states, this is not RFC 2047 compliant: atoms must be separated by linear whitespace.

The best solution is probably another way of splitting, but as a quick workaround you can move an existing space (" ") from the beginning of the encoded atom out into the unencoded text:

--- /usr/lib/perl5/5.8.8/Encode/MIME/Header.pm 2008-08-28 15:58:19.000000000 +0200
+++ /home/prathlev/Header.pm 2008-10-30 23:48:18.000000000 +0100
@@ -111,6 +111,7 @@
         my (@word, @subline);
         for my $word (split /($re_especials)/o, $line){
             if ($word =~ /[^\x00-\x7f]/o or $word =~ /^$re_encoded_word$/o){
+                push @word, ' ' if $word =~ s/^ //;
                 push @word, $obj->_encode($word);
             }else{
                 push @word, $word;

This way the encoded block has a slightly higher chance of being RFC 2047 compliant.

Best regards,
Peter Rathlev
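For comparison, the "split only because of length" idea from the original report could look roughly like the sketch below. It is a standalone illustration under stated assumptions, not the code in Encode::MIME::Header: the function name encode_by_length is hypothetical, it always uses UTF-8 with B encoding, and it encodes the whole value even when parts of it are plain ASCII. Each encoded-word stays within the 75-character limit of RFC 2047 and never splits a multi-byte character.

```perl
use strict;
use warnings;
use utf8;                               # string literals below are UTF-8
use Encode qw(encode_utf8);
use MIME::Base64 qw(encode_base64);

# Hypothetical helper (not part of Encode::MIME::Header): encodes the whole
# value as B-encoded words and starts a new word only when the length limit
# would be exceeded, never because of an 'especials' character.
sub encode_by_length {
    my ($text) = @_;
    my (@words, $chunk);
    $chunk = '';
    for my $char (split //, $text) {
        my $bytes = encode_utf8($char);
        # 45 bytes become 60 base64 characters; with the 12 characters of
        # "=?UTF-8?B?" and "?=" that is 72, below the 75-character limit.
        if (length($chunk) + length($bytes) > 45) {
            push @words, '=?UTF-8?B?' . encode_base64($chunk, '') . '?=';
            $chunk = '';
        }
        $chunk .= $bytes;
    }
    push @words, '=?UTF-8?B?' . encode_base64($chunk, '') . '?=' if length $chunk;
    # Adjacent encoded-words are separated by folding whitespace.
    return join "\n ", @words;
}

print encode_by_length('Adapt og søn A/S'), "\n";
# prints =?UTF-8?B?QWRhcHQgb2cgc8O4biBBL1M=?=
```

Because the '/' ends up inside the encoded-word, no unencoded text is left touching it, which sidesteps the separation problem at the cost of a less readable raw header.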
On Thu Oct 30 18:53:50 2008, prathlev wrote:
> [...]
Here is the result in the recent Encode.

====
% perl foo.pl
[Bug 145]=?UTF-8?Q?=20Test=20=C3=A6=20=C3=B8=20=C3=A5=20og=20mer?= =?UTF-8?Q?e=20=2B=20=C3=A6ndring?=
====

As you see the whitespace is inserted. I consider this fixed.

Dan the Encode Maintainer