Bug #61676 for Encode: decode_utf8 idempotence

Sun Sep 26 16:22:49 2010 sprout [...] cpan.org - Ticket created

Subject:	decode_utf8 idempotence
Date:	Sun, 26 Sep 2010 13:22:39 -0700
To:	bug-Encode [...] rt.cpan.org
From:	Father Chrysostomos <sprout [...] cpan.org>

The fix for #14559 ‘fix for #8872 introduces new “bug”’ (<https://rt.cpan.org/Public/Bug/Display.html?id=14559>) itself introduces a bug.

If I have a string containing "\xc3\xa9" that just happens to have the UTF8 flag on (e.g., substr "\x{100}\xc3\xa9", 1), decode_utf8 won’t decode it.

The UTF8 flag is something internal to perl, which should not be used in deciding what to do with a given string.

I think that bug #14559 is not a bug at all:

On Mon Sep 12 16:48:04 2005, RUZ wrote:

> Fix for http://rt.cpan.org/NoAuth/Bug.html?id=8872 doesn't allow to
> use strings with UTF-8 flag as decode_utf8 argument:
>
> $ perl -MEncode -we 'decode_utf8("\x{100}")'
> Cannot decode string with wide characters at
> /usr/lib/perl5/5.8.7/x86_64-linux/Encode.pm line 166.

You can’t decode something other than bytes. Therefore every decode routine must only accept characters in the range 0..255. How they are encoded internally by perl should be irrelevant.

> This behaviour is not documented and also is not consistent with
> encode_utf8 that doesn't die when string has no UTF-8 flag.

Again, whether the UTF8 flag is on or not should be irrelevant. encode_utf8 doesn’t die because a perl string cannot contain anything it cannot handle.
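
To make the argument concrete, here is a minimal sketch of a decode routine that looks only at the characters in the string and never at the UTF8 flag. The helper name decode_octets is hypothetical and not part of Encode; it relies only on Encode::decode and the built-in utf8::downgrade and utf8::is_utf8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper, not part of Encode: decide what to do from the
# characters in the string, never from the internal UTF8 flag.
sub decode_octets {
    my ($str) = @_;    # a copy, so the caller's string is left untouched

    # A character above 0xFF cannot be an octet, so refuse it.
    die "Cannot decode string with wide characters\n"
        if $str =~ /[^\x00-\xff]/;

    # Force the byte representation on the copy; this cannot fail
    # after the check above.
    utf8::downgrade($str);
    return decode('UTF-8', $str);
}

# Same two characters (0xC3 0xA9), stored with and without the flag.
my $flagged = substr "\x{100}\xc3\xa9", 1;
my $plain   = "\xc3\xa9";

my $from_flagged = decode_octets($flagged);
my $from_plain   = decode_octets($plain);

# A genuinely wide character is still rejected.
my $wide_ok = eval { decode_octets("\x{100}"); 1 };

printf "flag=%d equal=%d flagged=U+%04X plain=U+%04X wide=%s\n",
    utf8::is_utf8($flagged) ? 1 : 0,
    $flagged eq $plain      ? 1 : 0,
    ord($from_flagged), ord($from_plain),
    $wide_ok ? "accepted" : "rejected";
# prints: flag=1 equal=1 flagged=U+00E9 plain=U+00E9 wide=rejected
```

Both strings compare equal and decode to the same character, while "\x{100}" is rejected because of its contents rather than because of how perl happens to store it.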

Sun Sep 26 19:55:52 2010 DANKOGAI [...] cpan.org - Correspondence added

See http://rt.cpan.org/Public/Bug/Display.html?id=61456

Dan the Maintainer Thereof

Sun Sep 26 19:55:54 2010 The RT System itself - Status changed from 'new' to 'open'

Sun Sep 26 19:55:54 2010 DANKOGAI [...] cpan.org - Status changed from 'open' to 'resolved'
