Subject: Encode::Unicode croaks with malformed data
I am using Encode-2.01 as bundled with perl 5.8.5.
Encode::Unicode croaks when a conversion fails because of malformed Unicode data (e.g. an invalid surrogate or a missing BOM), even if the CHECK argument is not Encode::FB_CROAK. I would expect Encode::Unicode to replace invalid characters with the `substitution character' instead of croaking.
% perl -MEncode -e '$a = "\xfe\xff\xd8\xd9\xda\xdb\0\n"; Encode::from_to($a, "utf16", "shift_jis", 0); print ("$a");'
UTF-16:Malformed LO surrogate d8d9 at /usr/lib/perl5/5.8.5/cygwin-thread-multi-64int/Encode.pm line 184.
% perl -MEncode -e '$a = "BOM missing"; Encode::from_to($a, "utf16", "shift_jis", 0); print ("$a");'
UTF-16:Unrecognised BOM 424f at /usr/lib/perl5/5.8.5/cygwin-thread-multi-64int/Encode.pm line 184.
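Until the CHECK argument is honoured, one possible stopgap is to trap the croak with eval so that malformed input does not kill the program. This is only a minimal sketch: safe_from_to is a hypothetical helper name (not part of Encode), and it works on a copy of the data and returns undef on failure rather than substituting characters.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of Encode: convert $octets from one
# encoding to another and return the result, or undef if the
# conversion croaked or otherwise failed.
sub safe_from_to {
    my ($octets, $from, $to) = @_;
    local $@;
    # from_to() converts its first argument in place and returns the
    # new length; eval yields undef if it croaks.
    my $len = eval { Encode::from_to($octets, $from, $to, 0) };
    return defined $len ? $octets : undef;
}
```

A caller would then check the result with defined() and fall back to its own handling, for example skipping the record or logging the raw bytes.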
% perl -v
This is perl, v5.8.5 built for cygwin-thread-multi-64int
Copyright 1987-2004, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'. If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.
% uname -a
CYGWIN_NT-5.1 blue-water 1.5.11(0.116/4/2) 2004-09-04 23:17 i686 unknown unknown Cygwin