
This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 16698
Status: resolved
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: sthoenna [...] efn.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Date: Fri, 23 Dec 2005 02:56:58 -0800
From: Yitzchak Scott-Thoennes <sthoenna [...] efn.org>
To: bug-Encode [...] rt.cpan.org
Subject: Re: [perl #37757] decode_utf8 broken in perl 5.8.7
Filing this in CPAN's RT in case that gets Dan's attention.

On Fri, Dec 02, 2005 at 12:17:05AM -0800, Yitzchak Scott-Thoennes wrote:
> On Tue, Nov 29, 2005 at 12:34:11PM +0100, Michael Schroeder wrote:
> >
> > On Mon, 28 Nov 2005 sthoenna@efn.org wrote:
> > > On Thu, Nov 24, 2005 at 11:42:08AM -0800, debianbugs@j3e.de wrote:
> > > > decode_utf8() doesn't return "false" if run with non-UTF-8 string. It just
> > > > returns the non-UTF-8 string. To see this bug in action use convmv from
> > > > http://j3e.de/linux/convmv/ and convert a filename from latin1 to utf8. It will
> > > > tell you that the file is already UTF-8 encoded. convmv evaluates decode_utf8()
> > > > to see if a file is already utf-8-encoded.
> > >
> > > I don't see any indication in the Encode doc that decode_utf8 would
> > > ever return false on error. To use it to check for valid utf8, I
> > > think you'd need to specify the CHECK parameter as FB_CROAK and wrap
> > > the call in an eval {}; see:
> > > http://perldoc.perl.org/Encode.html#Handling-Malformed-Data
> > >
> > > Perhaps you should use utf8::decode() instead?
> >
> > Well, the perluniintro manpage says:
> >
> > - How Do I Detect Data That's Not Valid In a Particular Encoding?
> >
> > Use the "Encode" package to try converting it. For example,
> >
> >     use Encode 'decode_utf8';
> >     if (decode_utf8($string_of_bytes_that_I_think_is_utf8)) {
> >         # valid
> >     } else {
> >         # invalid
> >     }
>
> Ah, I hadn't noticed that; that doesn't agree with the doc in Encode
> itself, but up through Encode 2.09 (2.08 was included with perl5.8.6),
> decode_utf8 did actually just call utf8::decode when no check
> parameter was passed. Encode 2.10 (in perl5.8.7) now works as
> described in the Encode doc, but doesn't work as described in
> perluniintro.
>
> Dan, perhaps it would be a good idea to put back the old behavior
> (reversing the change you made for
> http://rt.cpan.org/NoAuth/Bug.html?id=8872 and changing the doc
> instead) when no check parameter is passed?

On Fri Dec 23 05:57:49 2005, sthoenna@efn.org wrote:
> Filing this in CPAN's RT in case that gets Dan's attention.

RT #14559 reports the same bug, which is fixed in 2.13.

Dan the Encode Maintainer
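
For reference, here is a minimal sketch of the validity check suggested earlier in this ticket: pass FB_CROAK as the CHECK parameter and wrap the call in an eval {}. The helper name is_valid_utf8 is hypothetical and not part of Encode. Adding LEAVE_SRC is an assumption made here so that the input is left unmodified. The sketch tests whether the result is defined rather than whether it is true, since a valid string such as "" or "0" decodes to a false value.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 FB_CROAK LEAVE_SRC);

# Hypothetical helper (not part of Encode): returns 1 if $bytes decodes
# cleanly as UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;

    # FB_CROAK makes decode_utf8() die on malformed input; eval {} traps
    # that and yields undef. LEAVE_SRC keeps $bytes from being overwritten.
    my $chars = eval { decode_utf8($bytes, FB_CROAK | LEAVE_SRC) };

    # Test definedness, not truth: "" and "0" are valid but false.
    return defined $chars ? 1 : 0;
}

# Example: a Latin-1 encoded name (0xE9 is e-acute) versus its UTF-8 form.
for my $name ("caf\xE9", "caf\xC3\xA9") {
    printf "%vd: %s\n", $name, is_valid_utf8($name) ? "valid" : "invalid";
}
```

Unlike the perluniintro snippet quoted above, this does not depend on what decode_utf8 returns when no CHECK parameter is passed, which is the behavior that differs between the Encode versions discussed in this ticket.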