Date: Fri, 23 Dec 2005 02:56:58 -0800
From: Yitzchak Scott-Thoennes <sthoenna [...] efn.org>
To: bug-Encode [...] rt.cpan.org
Subject: Re: [perl #37757] decode_utf8 broken in perl 5.8.7

Filing this in CPAN's RT in case that gets Dan's attention.
On Fri, Dec 02, 2005 at 12:17:05AM -0800, Yitzchak Scott-Thoennes wrote:
> On Tue, Nov 29, 2005 at 12:34:11PM +0100, Michael Schroeder wrote:
> >
> > On Mon, 28 Nov 2005 sthoenna@efn.org wrote:
>
> Ah, I hadn't noticed that; that doesn't agree with the doc in Encode
> itself, but up through Encode 2.09 (2.08 was included with perl5.8.6),
> decode_utf8 did actually just call utf8::decode when no check
> parameter was passed. Encode 2.10 (in perl5.8.7) now works as
> described in the Encode doc, but doesn't work as described in
> perluniintro.
>
> Dan, perhaps it would be a good idea to put back the old behavior
> (reversing the change you made for
> http://rt.cpan.org/NoAuth/Bug.html?id=8872 and changing the doc
> instead) when no check parameter is passed?
> > > On Thu, Nov 24, 2005 at 11:42:08AM -0800, debianbugs@j3e.de wrote:
> >
> > Well, the perluniintro manpage says:
> >
> > - How Do I Detect Data That's Not Valid In a Particular Encoding?
> >
> > Use the "Encode" package to try converting it. For example,
> >
> >     use Encode 'decode_utf8';
> >     if (decode_utf8($string_of_bytes_that_I_think_is_utf8)) {
> >         # valid
> >     } else {
> >         # invalid
> >     }
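A minimal sketch, assuming a perl whose bundled Encode is 2.10 or newer, of what the quoted idiom actually does with latin1 input. The sample string is invented for illustration. With no CHECK argument, malformed bytes are replaced by U+FFFD rather than reported, so the result is a true value.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# "\xe9" is e-acute in ISO-8859-1 (latin1); here it is not valid UTF-8.
my $latin1_bytes = "caf\xe9 au lait";

# No CHECK argument: Encode 2.10 and later substitute U+FFFD
# (REPLACEMENT CHARACTER) for malformed bytes instead of failing.
my $decoded = decode_utf8($latin1_bytes);

# The perluniintro-style test therefore takes the "valid" branch.
print $decoded ? "idiom says: valid\n" : "idiom says: invalid\n";

# Count how many replacement characters were substituted.
my $replaced = () = $decoded =~ /\x{FFFD}/g;
print "U+FFFD substitutions: $replaced\n";
```

Note that the same truth test also misreports perfectly valid input such as "" or "0", since those decode to false values.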
> > > > decode_utf8() doesn't return "false" when run on a non-UTF-8 string; it
> > > > just returns the non-UTF-8 string. To see this bug in action, use convmv from
> > > > http://j3e.de/linux/convmv/ and convert a filename from latin1 to utf8. It will
> > > > tell you that the file is already UTF-8 encoded, because convmv checks the
> > > > return value of decode_utf8() to see whether a file is already UTF-8 encoded.
> > >
> > > I don't see any indication in the Encode doc that decode_utf8 would
> > > ever return false on error. To use it to check for valid utf8, I
> > > think you'd need to specify the CHECK parameter as FB_CROAK and wrap
> > > the call in an eval {}; see:
> > > http://perldoc.perl.org/Encode.html#Handling-Malformed-Data
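A minimal sketch of the FB_CROAK-plus-eval check, assuming a reasonably recent Encode. `is_valid_utf8` and the sample filenames are hypothetical, chosen for illustration; they are not part of Encode or convmv. LEAVE_SRC is added because Encode may modify the argument in place when a CHECK value is given.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, else 0.
# FB_CROAK makes decode_utf8 die on malformed input; the eval turns that
# die into a false value. LEAVE_SRC keeps Encode from consuming $bytes.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $ok = eval { decode_utf8($bytes, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 1 : 0;
}

# Illustrative filenames: the first is UTF-8 encoded, the second latin1.
for my $name ("caf\xc3\xa9.txt", "caf\xe9.txt") {
    print is_valid_utf8($name) ? "already UTF-8\n" : "not UTF-8\n";
}
```

The eval will also swallow any unrelated error raised inside it, which is usually acceptable for a yes/no validity check.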
> > >
> > > Perhaps you should use utf8::decode() instead?
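A sketch of the utf8::decode() alternative. `looks_like_utf8` is a hypothetical helper name. utf8::decode() converts its argument in place and returns false if the string is not valid UTF-8, so the sketch decodes a copy to leave the caller's data untouched. It checks against Perl's own, more permissive internal UTF-8 rules rather than strict UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $bytes decodes as UTF-8, else 0.
# utf8::decode works in place, so operate on a copy.
sub looks_like_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;
    return utf8::decode($copy) ? 1 : 0;
}

print looks_like_utf8("caf\xc3\xa9") ? "UTF-8\n" : "not UTF-8\n";
print looks_like_utf8("caf\xe9 x")   ? "UTF-8\n" : "not UTF-8\n";
```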