Subject: Encode::Guess dies if UTF-16 and iso-8859-1 are both specified
I tried to use Encode::Guess::guess_encoding to detect the encoding of
files, with this set of suspect encodings: iso-8859-1, ascii, utf8, utf16
When I run it on a file that is actually iso-8859-1, it dies with
this error:
Error finding encoding: UTF-16:Unrecognised BOM 436f at
/app/clarity/perl/lib/5.8.7/i686-linux-thread-multi/Encode/Guess.pm line
135.
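A minimal sketch of the kind of call that triggers this. The sample
bytes and the describe_guess helper are made up for illustration and are
not taken from the real file or from Encode itself. The eval is only
there so the script can report a die instead of exiting. Whether the
call actually dies is likely to depend on the installed Encode version.

```perl
use strict;
use warnings;
use Encode::Guess;

# Hypothetical helper (not part of Encode): run guess_encoding and
# describe what happened as a single line of text.
sub describe_guess {
    my ($octets, @suspects) = @_;

    # guess_encoding returns an encoding object on success and a plain
    # error string on failure; eval catches the case where it dies.
    my $decoder = eval { guess_encoding($octets, @suspects) };

    if (!defined $decoder) {
        (my $err = $@) =~ s/\s+\z//;    # trim the trailing newline
        return "died: $err";
    }
    return "failed: $decoder" unless ref $decoder;
    return "guessed: " . $decoder->name;
}

# Illustrative stand-in for the real file: Latin-1 text with no BOM,
# starting with "Co" (bytes 0x43 0x6f) and containing one high byte.
my $data = "Content: caf\xe9 au lait\n";

# Suspect list from the report, using the canonical name UTF-16.
print describe_guess($data, qw(iso-8859-1 ascii utf8 UTF-16)), "\n";
```

Dropping UTF-16 from the suspect list should make the same input come
back as iso-8859-1, which is a quick way to confirm that the UTF-16
decoder is the one raising the error.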
The file doesn't have a BOM; 436f is "Co", the first two characters
of the file. iso-8859-1 doesn't require a BOM, so the missing BOM
shouldn't be a problem. The cause appears to be that when guess_encoding
tries each candidate decoder in turn, it doesn't catch the errors they
throw. UTF-16 ends up first in the list of things to try, and its
decoder dies when it can't find a BOM. Wrapping the decode line in an
eval block seems to fix the problem entirely, as in this patch:
diff -r Encode-2.23/lib/Encode/Guess.pm Encode-2.23-js/lib/Encode/Guess.pm
139c139,142
< $try{$k}->decode( $scratch, FB_QUIET );
---
> eval {
> $try{$k}->decode( $scratch, FB_QUIET );
> };
>