Bug #123341 for MIME-tools: Re: Bug#879205: MIME::Words::encode_mimewords: double-encodes (produces Mojibake), produces too long lines

Fri Oct 20 16:25:43 2017 tg [...] mirbsd.de - Ticket created

CC:	879205 [...] bugs.debian.org, bug-MIME-tools [...] rt.cpan.org
Subject:	Re: Bug#879205: MIME::Words::encode_mimewords: double-encodes (produces Mojibake), produces too long lines
Date:	Fri, 20 Oct 2017 20:19:42 +0000 (UTC)
To:	gregor herrmann <gregoa [...] debian.org>, Dianne Skoll <dfs [...] roaringpenguin.com>
From:	Thorsten Glaser <tg [...] mirbsd.de>

gregor herrmann dixit:

>Right, forwarded upstream as
>https://rt.cpan.org/Ticket/Display.html?id=123335

Dianne Skoll dixit:

>Below is my test program:

[…]

Hi,

I believe your test program is not correct.

    perl -MEncode -MMIME::Words -e 'print MIME::Words::encode_mimewords(Encode::encode("UTF-8", "Re: Bildungsurlaub für CCC-Fahrt? [THD#1424195]"), Charset => "UTF-8", Field => "Subject") . "\n";'

This will do it. Alternatively (semi-tested) with yours:

    my $sample = "Re: Bildungsurlaub f\x{FC}r CCC-Fahrt? [THD#1424195]";

You were missing the “f” and “r” there. This is extremely sensitive to context.

This looks to me as if the old code (with the bug from Debian #879204) was called, *then* things are re-read and re-encoded.

bye,
//mirabilos
-- 
FWIW, I'm quite impressed with mksh interactively. I thought it was much *much* more bare bones. But it turns out it beats the living hell out of ksh93 in that respect. I'd even consider it for my daily use if I hadn't wasted half my life on my zsh setup. :-)
    -- Frank Terbeck in #!/bin/mksh

Fri Oct 20 16:31:06 2017 dfs [...] roaringpenguin.com - Correspondence added

Subject:	Re: [rt.cpan.org #123341] Re: Bug#879205: MIME::Words::encode_mimewords: double-encodes (produces Mojibake), produces too long lines
Date:	Fri, 20 Oct 2017 16:30:53 -0400
To:	bug-MIME-tools [...] rt.cpan.org
From:	Dianne Skoll <dfs [...] roaringpenguin.com>

On Fri, 20 Oct 2017 16:25:44 -0400 "Thorsten Glaser via RT" <bug-MIME-tools@rt.cpan.org> wrote:

> You were missing the “f” and “r” there. This is extremely sensitive
> to context.

I realized that after I sent the reply. However, even with:

    my $sample = "Re: Bildungsurlaub f\x{FC}r CCC-Fahrt? [THD#1424195]";

The output is:

    Out: Re: Bildungsurlaub =?UTF-8?Q?f=C3=BCr=20?=CCC-Fahrt? [THD#1424195]

which is correct.

> This looks to me as if the old code (with the bug from Debian
> #879204) was called, *then* things are re-read and re-encoded.

No, probably the original input was in UTF-8 already and was re-encoded by the call to Encode::encode. That's why I always run tests using \x{..} Unicode escapes rather than typing Unicode characters > \x{07f} directly into source code.

Regards,

Dianne.

Fri Oct 20 16:31:06 2017 The RT System itself - Status changed from 'new' to 'open'

Fri Oct 20 16:37:52 2017 dfs [...] roaringpenguin.com - Correspondence added

Subject:	Re: [rt.cpan.org #123341] Re: Bug#879205: MIME::Words::encode_mimewords: double-encodes (produces Mojibake), produces too long lines
Date:	Fri, 20 Oct 2017 16:37:35 -0400
To:	bug-MIME-tools [...] rt.cpan.org, 879204 [...] bugs.debian.org
From:	Dianne Skoll <dfs [...] roaringpenguin.com>

Hi,

This is not a bug in MIME::tools. The OP misunderstands how Perl works. He typed UTF-8 source code in and is double encoding it.

Here's a test program:

    #===================================================================
    use MIME::Words;
    use Encode;
    my $sample = "Re: Bildungsurlaub für CCC-Fahrt? [THD#1424195]";
    my $utf8 = Encode::encode('UTF-8', $sample);
    my $out = MIME::Words::encode_mimewords($utf8, Charset => 'UTF-8');
    print "Out: $out\n";
    #===================================================================

If I run:

    perl test-utf8.pl

Output is:

    Out: Re: Bildungsurlaub =?UTF-8?Q?f=C3=83=C2=BCr=20?=CCC-Fahrt? [THD#1424195]

But that's because the word "für" is *already* UTF-8. If I tell Perl to convert UTF-8 in the source code to native Perl Unicode, the result is very different:

    perl -Mutf8 test-utf8.pl

Output is:

    Out: Re: Bildungsurlaub =?UTF-8?Q?f=C3=BCr=20?=CCC-Fahrt? [THD#1424195]

The OP should read "perldoc utf8" and should also not use UTF-8 directly as Perl source code; use \x{FC} rather than ü, etc.

Regards,

Dianne.
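The double encoding described above can be reproduced without MIME::Words at all. The following is a minimal sketch, assuming only the core Encode module; the variable names and the hex_of helper are illustrative and do not come from the thread. It writes "für" once as a character string and once as the raw UTF-8 bytes a source file holds when "use utf8" is not in effect, then compares what Encode::encode produces for each.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "für" as a Perl character string. U+00FC is written as an escape,
# so this source file stays pure ASCII.
my $chars = "f\x{FC}r";

# "für" as the raw UTF-8 bytes found in a source file when the literal
# is typed directly and "use utf8" is not in effect.
my $bytes = "f\xC3\xBCr";

# Encoding the character string once yields the correct UTF-8 bytes.
my $once = encode('UTF-8', $chars);

# Encoding the byte string treats each byte as a Latin-1 character,
# so 0xC3 and 0xBC are each expanded to two bytes (double encoding).
my $twice = encode('UTF-8', $bytes);

# Decoding the bytes first, which is roughly what "use utf8" does for
# literals, restores the character string before it is encoded.
my $fixed = encode('UTF-8', decode('UTF-8', $bytes));

# Illustrative helper: render a byte string as space-separated hex.
sub hex_of {
    my ($str) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $str);
}

print "once:  ", hex_of($once),  "\n";   # 66 C3 BC 72
print "twice: ", hex_of($twice), "\n";   # 66 C3 83 C2 BC 72
print "fixed: ", hex_of($fixed), "\n";   # 66 C3 BC 72
```

The "twice" bytes C3 83 C2 BC are the same sequence that shows up as =C3=83=C2=BC in the mojibake output quoted above, while "once" and "fixed" match the correct =C3=BC form.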

Fri Oct 20 16:40:46 2017 gregoa [...] cpan.org - Cc GREGOA added

Fri Oct 20 18:01:44 2017 tg [...] mirbsd.de - Correspondence added

CC:	879205 [...] bugs.debian.org
Subject:	Re: [rt.cpan.org #123341] Re: Bug#879205: MIME::Words::encode_mimewords: double-encodes (produces Mojibake), produces too long lines
Date:	Fri, 20 Oct 2017 21:54:19 +0000 (UTC)
To:	Dianne Skoll via RT <bug-MIME-tools [...] rt.cpan.org>
From:	Thorsten Glaser <tg [...] mirbsd.de>

Dianne Skoll via RT dixit:

>No, probably the original input was in UTF-8 already and was re-encoded
>by the call to Encode::encode. That's why I always run tests using

Hm, probably. I’d say you just found a bug in OTRS then ;-)

(Just now it’s going to be another tricky thing to figure out where exactly and how to fix that. Might report this to the OTRS developers.)

One thing I don’t understand is how this was *not* double-encoded in the old version of MIME tools?

Thanks,
//mirabilos
-- 
18:47⎜<mirabilos:#!/bin/mksh> well channels… you see, I see everything in the same window anyway
18:48⎜<xpt:#!/bin/mksh> i know, you have some kind of telnet with automatic pong
18:48⎜<mirabilos:#!/bin/mksh> haha, yes :D
18:49⎜<mirabilos:#!/bin/mksh> though that's more tinyirc – sirc is more comfy

Fri Oct 20 19:18:31 2017 dfs [...] roaringpenguin.com - Correspondence added

Subject:	Re: [rt.cpan.org #123341] Re: Bug#879205: MIME::Words::encode_mimewords: double-encodes (produces Mojibake), produces too long lines
Date:	Fri, 20 Oct 2017 19:18:15 -0400
To:	bug-MIME-tools [...] rt.cpan.org
From:	Dianne Skoll <dfs [...] roaringpenguin.com>

On Fri, 20 Oct 2017 18:01:45 -0400 "Thorsten Glaser via RT" <bug-MIME-tools@rt.cpan.org> wrote:

> One thing I don't understand is how this was *not* double-encoded
> in the old version of MIME tools?

I don't understand that either. Maybe it was also an older version of Perl? Perl's UTF-8 handling underwent extensive changes a few years ago.

Regards,

Dianne.

Fri Oct 20 20:01:45 2017 tg [...] mirbsd.de - Correspondence added

CC:	879205 [...] bugs.debian.org
Subject:	Re: [rt.cpan.org #123341] Re: Bug#879205: MIME::Words::encode_mimewords: double-encodes (produces Mojibake), produces too long lines
Date:	Fri, 20 Oct 2017 23:59:29 +0000 (UTC)
To:	Dianne Skoll via RT <bug-MIME-tools [...] rt.cpan.org>
From:	Thorsten Glaser <tg [...] mirbsd.de>

Dianne Skoll via RT dixit:

>I don't understand that either. Maybe it was also an older version of
>Perl? Perl's UTF-8 handling underwent extensive changes a few years ago.

Yes, that was on Debian wheezy. I reported this as bug in Debian against wheezy (which is still supported-ish) first, then as a separate bug against sid because I tried to see if it was still reproducible, and got a different result.

Let me dig out version numbers…

Original system:
    otrs2 3.3.18-1~deb7u1
    perl 5.14.2-21+deb7u5
    libmime-tools-perl 5.503-1

New system:
    perl 5.26.0-8
    libmime-tools-perl 5.508-1

In addition to that, OTRS would have the original subject in a Perl string already, whereas I tried¹ to draft a testcase until I succeeded reproducing the original bug.

① tried, because I don’t really know Perl; I just can program

bye,
//mirabilos
-- 
(gnutls can also be used, but if you are compiling lynx for your own use, there is no reason to consider using that package)
    -- Thomas E. Dickey on the Lynx mailing list, about OpenSSL

Mon Oct 23 09:27:45 2017 dfs+pause [...] roaringpenguin.com - Correspondence added

Not a bug in MIME::tools. Closing.

Mon Oct 23 09:27:46 2017 dfs+pause [...] roaringpenguin.com - Status changed from 'open' to 'rejected'

Mon Oct 23 09:27:46 2017 dfs+pause [...] roaringpenguin.com - Taken