
This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 28822
Status: resolved
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: jsadusk [...] gridapp.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 2.23
Fixed in: (no value)



Subject: Encode::Guess dies if UTF-16 and iso-8859-1 are both specified
I attempted to use Encode::Guess::guess_encoding to find the encoding on files in this set: iso-8859-1, ascii, utf8, utf16

If I try this on a file that is actually iso-8859-1, it will die with this error:
Error finding encoding: UTF-16:Unrecognised BOM 436f at /app/clarity/perl/lib/5.8.7/i686-linux-thread-multi/Encode/Guess.pm line 135.

This file doesn't have a BOM, and 436f is "Co", the first two characters of the file. iso-8859-1 shouldn't require a BOM, so this shouldn't be an issue. It appears that the problem is when it is trying different decoders, it doesn't catch errors thrown. UTF-16 ends up first in the list of things to try, and dies when it can't find a BOM. What appears to fix the problem is just wrapping the decode line in an eval block. This patch seems to fix the problem entirely.

diff -r Encode-2.23/lib/Encode/Guess.pm Encode-2.23-js/lib/Encode/Guess.pm
139c139,142
< $try{$k}->decode( $scratch, FB_QUIET );
---
> eval {
> $try{$k}->decode( $scratch, FB_QUIET );
> };
>
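The behaviour described in the report can be approximated from the caller's side without patching Guess.pm. The following is only a sketch, not Encode::Guess's own algorithm: the helper name `candidates_that_decode` and the sample input "Comment\n" are hypothetical, and it simply tries each candidate with a strict decode inside `eval`, treating a decoder that dies as "no match".

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper (not part of Encode): return the names of the candidate
# encodings that can strictly decode $octets. A decoder that dies, such as
# UTF-16 on input without a BOM, is treated as "no match" instead of aborting.
sub candidates_that_decode {
    my ($octets, @names) = @_;
    my @ok;
    for my $name (@names) {
        my $enc = find_encoding($name) or next;    # skip unknown encoding names
        my $scratch = $octets;    # decode with a CHECK value may modify its input
        my $decoded = eval { $enc->decode($scratch, Encode::FB_CROAK()) };
        push @ok, $enc->name if defined $decoded;
    }
    return @ok;
}

# Sample input is an assumption: plain ASCII bytes starting with "Co", no BOM.
my @ok = candidates_that_decode("Comment\n", qw(ascii utf8 UTF-16 iso-8859-1));
print join(', ', @ok), "\n";
```

With this input the UTF-16 decoder is skipped rather than fatal, but more than one candidate still decodes cleanly, so catching the error alone does not make the guess unique.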
That is not a bug. Guessing ISO-8859-1 is practically impossible. See "CAVEATS" of perldoc Encode::Guess.

Dan the Maintainer Thereof

On Mon Aug 13 18:35:54 2007, jsadusk wrote:
> I attempted to use Encode::Guess::guess_encoding to find the encoding on
> files in this set: iso-8859-1, ascii, utf8, utf16
>
> If I try this on a file that is actually iso-8859-1, it will die with
> this error:
> Error finding encoding: UTF-16:Unrecognised BOM 436f at
> /app/clarity/perl/lib/5.8.7/i686-linux-thread-multi/Encode/Guess.pm line
> 135.
>
> This file doesn't have a BOM, and 436f is "Co", the first two characters
> of the file. iso-8859-1 shouldn't require a BOM, so this shouldn't be
> an issue. It appears that the problem is when it is trying different
> decoders, it doesn't catch errors thrown. UTF-16 ends up first in the
> list of things to try, and dies when it can't find a BOM. What appears
> to fix the problem is just wrapping the decode line in an eval block.
> This patch seems to fix the problem entirely.
>
> diff -r Encode-2.23/lib/Encode/Guess.pm Encode-2.23-js/lib/Encode/Guess.pm
> 139c139,142
> < $try{$k}->decode( $scratch, FB_QUIET );
> ---
> > eval {
> > $try{$k}->decode( $scratch, FB_QUIET );
> > };
> >
Subject: Re: [rt.cpan.org #28822] Resolved: Encode::Guess dies if UTF-16 and iso-8859-1 are both specified
Date: Wed, 07 May 2008 15:50:54 -0400
To: bug-Encode [...] rt.cpan.org
From: Joseph Sadusk <jsadusk [...] gridapp.com>
The bug isn't the inability to guess ISO-8859-1; the bug is UTF-16 dying if there isn't a BOM, preventing any subsequent encoding guesses from being tried.

Joe Sadusk

On Wed, 2008-05-07 at 15:48 -0400, DANKOGAI via RT wrote:
> <URL: http://rt.cpan.org/Ticket/Display.html?id=28822 >
>
> According to our records, your request has been resolved. If you have any
> further questions or concerns, please respond to this message.
Does not matter. Encode::Guess depends on the fact that some encodings, such as EUC-*, have byte sequences that never occur (for instance, UTF-8 never contains \xFF). That is not the case for UTF-16 and ISO-8859-1. Consider

my $unknown = "\xFE\xFFabc\n";

This $unknown is valid in UTF-16 AND ISO-8859-1. The Encode::Guess pod explicitly documents NOT to use Encode::Guess with such encodings.

Dan the Maintainer Thereof

On Wed May 07 15:51:21 2008, jsadusk wrote:
> The bug isn't the inability to guess ISO-8859-1; the bug is UTF-16
> dying if there isn't a BOM, preventing any subsequent encoding guesses
> from being tried.
>
> Joe Sadusk
>
> On Wed, 2008-05-07 at 15:48 -0400, DANKOGAI via RT wrote:
> > <URL: http://rt.cpan.org/Ticket/Display.html?id=28822 >
> >
> > According to our records, your request has been resolved. If you have any
> > further questions or concerns, please respond to this message.
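
The ambiguity in the maintainer's `$unknown` example can be checked directly. This is a small sketch using only `Encode::decode` with its default CHECK behaviour; the variable names other than `$unknown` are illustrative. It shows that the same six octets decode without error both as UTF-16 (where \xFE\xFF is read as a big-endian BOM) and as ISO-8859-1 (where every octet is a character).

```perl
use strict;
use warnings;
use Encode qw(decode);

# The octet string from the reply above.
my $unknown = "\xFE\xFFabc\n";

# As UTF-16: \xFE\xFF is consumed as a big-endian BOM, and the remaining
# four octets form two 16-bit code units.
my $as_utf16 = decode('UTF-16', $unknown);

# As ISO-8859-1: each octet maps to exactly one character.
my $as_latin1 = decode('ISO-8859-1', $unknown);

printf "UTF-16: %d characters, ISO-8859-1: %d characters\n",
    length($as_utf16), length($as_latin1);
```

Both calls succeed, so nothing in the octets themselves tells a guesser which of the two interpretations was intended.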