On Tue Mar 24 12:00:26 2009, JQUELIN wrote:
> attached file contains "foo<null>bar" where <null> is the null byte
> (ctrl+v ctrl+0 in vim, or ctrl+q in emacs)
>
> this file is detected as UTF-16LE by Encode::Guess, as demonstrated by
> this snippet:
>
> $ perl -MEncode::Guess -E '$a=qx{cat null}; say guess_encoding($a,"ascii")->name;'
> UTF-16LE
>
> and of course, using this detected encoding to decode the file yields
> very strange results:
> $ perl -MEncode -E '$a=qx{cat null}; $b=decode("UTF-16LE",$a); say $b'
> Wide character in print at -e line 1.
> 潦o慢ੲ
>
> happens with Encode 2.32, providing Encode::Guess 2.03
No, that's not a bug. That's what UTF-(16|32)(LE|BE) is all about: null bytes are perfectly
normal in these encodings. For example, in UTF-16LE the byte pair \x20\x00 is VALID and
means \x{0020}.
Dan the Maintainer Thereof
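To see why the report's output looks the way it does, here is a minimal sketch (in Python rather than Perl, purely for illustration) of how UTF-16LE turns each pair of bytes into one 16-bit code unit, low byte first. It assumes the attachment holds "foo", a null byte, "bar" and a trailing newline (8 bytes), which is consistent with the four characters printed above. The helper name `utf16le_code_units` is hypothetical and is not part of Encode.

```python
def utf16le_code_units(data):
    """Pair bytes little-endian: (low byte, high byte) -> one 16-bit code unit."""
    if len(data) % 2:
        raise ValueError("UTF-16 needs an even number of bytes")
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]


# Assumed file contents: "foo", a null byte, "bar", and a trailing newline.
data = b"foo\x00bar\n"

units = utf16le_code_units(data)
print([f"U+{u:04X}" for u in units])
# ['U+6F66', 'U+006F', 'U+6162', 'U+0A72']

# None of these units is a surrogate, so each one is a single character,
# and the result matches Python's built-in UTF-16LE codec.
assert "".join(map(chr, units)) == data.decode("utf-16-le")

# The example from the reply: the byte pair 20 00 is U+0020, a space.
assert b"\x20\x00".decode("utf-16-le") == "\u0020"
```

The null byte simply becomes the high half of the code unit for "o" (U+006F). Every other byte pair is also a legal UTF-16LE code unit, so the decode succeeds without complaint.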