
This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 61676
Status: resolved
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: sprout [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: decode_utf8 idempotence
Date: Sun, 26 Sep 2010 13:22:39 -0700
To: bug-Encode [...] rt.cpan.org
From: Father Chrysostomos <sprout [...] cpan.org>
The fix for #14559 ‘fix for #8872 introduces new “bug”’ (<https://rt.cpan.org/Public/Bug/Display.html?id=14559>) itself introduces a bug.

If I have a string containing "\xc3\xa9" that just happens to have the UTF8 flag on (e.g., substr "\x{100}\xc3\xa9", 1), decode_utf8 won’t decode it.

The UTF8 flag is something internal to perl, which should not be used in deciding what to do with a given string.

I think that bug #14559 is not a bug at all:

On Mon Sep 12 16:48:04 2005, RUZ wrote:
> Fix for http://rt.cpan.org/NoAuth/Bug.html?id=8872 doesn't allow to
> use strings with UTF-8 flag as decode_utf8 argument:
>
> $ perl -MEncode -we 'decode_utf8("\x{100}")'
> Cannot decode string with wide characters at
> /usr/lib/perl5/5.8.7/x86_64-linux/Encode.pm line 166.
You can’t decode something other than bytes. Therefore every decode routine must only accept characters in the range 0..255. How they are encoded internally by perl should be irrelevant.
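To make the point about internal representation concrete, here is a minimal sketch (not part of the original report) that builds the same two characters with and without the UTF8 flag, using the substr construction mentioned in this ticket. The variable names are illustrative. What decode_utf8 returns for the flagged copy depends on the installed Encode version, so the sketch prints the result rather than assuming one. The utf8::downgrade step is only one possible way to get the same behaviour regardless of version.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Two strings holding the same characters, \xc3 and \xa9.
my $bytes   = "\xc3\xa9";                     # UTF8 flag off
my $flagged = substr("\x{100}\xc3\xa9", 1);   # UTF8 flag on

# At the Perl level the two strings compare equal; only the
# internal flag differs.
printf "equal: %s\n", $bytes eq $flagged ? "yes" : "no";
printf "flag on bytes: %d, flag on flagged: %d\n",
    utf8::is_utf8($bytes)   ? 1 : 0,
    utf8::is_utf8($flagged) ? 1 : 0;

# Version-dependent: some Encode releases return the flagged
# string unchanged (length 2), others decode it (length 1).
my $from_flagged = decode_utf8($flagged);
printf "decode_utf8 on flagged copy gives length %d\n", length $from_flagged;

# Assumed workaround: force the byte representation first.
# utf8::downgrade dies only if a character is above 0xFF.
my $copy = $flagged;
utf8::downgrade($copy);
my $decoded = decode_utf8($copy);
printf "decoded after downgrade: U+%04X\n", ord $decoded;
```

If the length printed for the flagged copy is 2, the installed Encode shows the behaviour reported here.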
> This behaviour is not documented and also is not consistent with
> encode_utf8 that doesn't die when string has no UTF-8 flag.
Again, whether the UTF8 flag is on or not should be irrelevant. encode_utf8 doesn’t die because a perl string cannot contain anything it cannot handle.
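As an illustration of the encode_utf8 side of the argument, the following sketch (added here as an example, with illustrative variable names) encodes the same character U+00E9 once from a string without the UTF8 flag and once from a string with it. Both calls are expected to yield the same octets, since encode_utf8 operates on characters and any perl string is a valid sequence of characters.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# The same single character, U+00E9, in both internal forms.
my $plain   = "\xe9";                      # UTF8 flag off
my $flagged = substr("\x{100}\xe9", 1);    # UTF8 flag on

my $from_plain   = encode_utf8($plain);
my $from_flagged = encode_utf8($flagged);

# Print the resulting octets in hex for each input.
for my $octets ($from_plain, $from_flagged) {
    print join(" ", map { sprintf "%02x", ord } split //, $octets), "\n";
}

print "same octets\n" if $from_plain eq $from_flagged;
```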
See http://rt.cpan.org/Public/Bug/Display.html?id=61456

Dan the Maintainer Thereof

On Sun Sep 26 16:22:49 2010, sprout@cpan.org wrote:
> The fix for #14559 ‘fix for #8872 introduces new “bug”’
> (<https://rt.cpan.org/Public/Bug/Display.html?id=14559>) itself
> introduces a bug.
>
> If I have a string containing "\xc3\xa9" that just happens to have the
> UTF8 flag on (e.g., substr "\x{100}\xc3\xa9", 1), decode_utf8 won’t
> decode it.
>
> The UTF8 flag is something internal to perl, which should not be used
> in deciding what to do with a given string.
>
> I think that bug #14559 is not a bug at all:
>
> On Mon Sep 12 16:48:04 2005, RUZ wrote:
> > Fix for http://rt.cpan.org/NoAuth/Bug.html?id=8872 doesn't allow to
> > use strings with UTF-8 flag as decode_utf8 argument:
> >
> > $ perl -MEncode -we 'decode_utf8("\x{100}")'
> > Cannot decode string with wide characters at
> > /usr/lib/perl5/5.8.7/x86_64-linux/Encode.pm line 166.
>
> You can’t decode something other than bytes. There every decode
> routine must only accept characters in the range 0..255. How they are
> encoded internally by perl should be irrelevant.
>
> > This behaviour is not documented and also is not consistent with
> > encode_utf8 that doesn't die when string has no UTF-8 flag.
>
> Again, whether the UTF8 flag is on or not should be irrelevant.
> encode_utf8 doesn’t die because perl string cannot contain anything it
> cannot handle.