
This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 44523
Status: resolved
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: jquelin [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: files containing NULL byte reported as UTF-LE by Encode::Guess
Attached file contains "foo<null>bar" where <null> is the null byte (ctrl+v ctrl+0 in vim, or ctrl+q in emacs).

This file is detected as UTF-16LE by Encode::Guess, as demonstrated by snippet:

    $ perl -MEncode::Guess -E '$a=qx{cat null}; say guess_encoding($a,"ascii")->name;'
    UTF-16LE

And of course, using this detected encoding to decode the file yields very strange results:

    $ perl -MEncode -E '$a=qx{cat null}; $b=decode("UTF-16LE",$a); say $b'
    Wide character in print at -e line 1.
    潦o慢ੲ

Happens with Encode 2.32, providing Encode::Guess 2.03.
Attachment: null (application/octet-stream, 8 bytes)

On Tue Mar 24 12:00:26 2009, JQUELIN wrote:
> attached file contains "foo<null>bar" where <null> is the null byte
> (ctrl+v ctrl+0 in vim, or ctrl+q in emacs)
>
> this file is detected as UTF-16LE by Encode::Guess, as demonstrated by
> snippet:
>
> $ perl -MEncode::Guess -E '$a=qx{cat null}; say guess_encoding($a,"ascii")->name;'
> UTF-16LE
>
> and of course, using this detected encoding to decode the file yields
> very strange results:
> $ perl -MEncode -E '$a=qx{cat null}; $b=decode("UTF-16LE",$a); say $b'
> Wide character in print at -e line 1.
> 潦o慢ੲ
>
> happens with Encode 2.32, providing Encode::Guess 2.03
No, that's not a bug. That's what UTF-(16|32)(LE|BE) is all about; i.e., \x20\x00 is VALID and it means \x{0020}.

Dan the Maintainer Thereof
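To illustrate the point about validity, here is a minimal sketch that splits the reported bytes into 16-bit little-endian code units. It assumes the 8-byte attachment is "foo", a NUL byte, "bar", and a trailing newline, which matches the reported size and output but is not confirmed by the ticket. Every resulting code unit falls outside the surrogate range, so the byte string is well-formed UTF-16LE even though it was never meant as such.

```perl
use strict;
use warnings;

# Assumed file content: "foo", a NUL byte, "bar", and a trailing newline (8 bytes).
my $octets = "foo\0bar\n";

# "v*" unpacks unsigned 16-bit little-endian integers, i.e. UTF-16LE code units.
my @units = unpack 'v*', $octets;

# Code units in 0xD800..0xDFFF are surrogates and need a partner to be valid;
# any other code unit is a complete character on its own.
my @surrogates = grep { $_ >= 0xD800 && $_ <= 0xDFFF } @units;

printf "U+%04X\n", $_ for @units;
print "surrogates found: ", scalar(@surrogates), "\n";
```

The four code units correspond to the four characters in the "very strange" output above: each pair of bytes is read as one character.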
I understand that the sequence is valid UTF-16. What I'm objecting to is that it's not the best guess in this case...

What should I do to get a correct guess?
On Tue Mar 24 13:28:34 2009, JQUELIN wrote:
> i understand that the sequence is valid utf-16. what i'm objecting is
> that it's not the best guess in this case...
>
> what should i do to have a correct guess?
Of course it is not the best. After all, it is guessing, and so long as the data appears valid, it returns the only valid guess. Read perldoc Encode::Guess one more time.

Dan the Encode Maintainer
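One possible way to get the preferred answer, sketched here as an assumption and not as the maintainer's recommendation, is to skip the guessing heuristics and try strict decoding against an ordered list of candidates. The helper name first_strict_match is hypothetical; decode, FB_CROAK, and LEAVE_SRC are regular parts of Encode.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the first encoding in @candidates under which
# $octets decodes without error, or nothing if none of them fits.
sub first_strict_match {
    my ($octets, @candidates) = @_;
    for my $name (@candidates) {
        # FB_CROAK makes decode() die on malformed input;
        # LEAVE_SRC keeps $octets unmodified for the next attempt.
        my $ok = eval {
            decode($name, $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
            1;
        };
        return $name if $ok;
    }
    return;
}

# Assumed content of the attached file (see the original report).
my $octets = "foo\0bar\n";
my $best = first_strict_match($octets, 'ascii', 'UTF-8', 'UTF-16LE');
print defined $best ? $best : 'no match', "\n";
```

With 'ascii' listed first, the sample bytes match it, because NUL is a valid ASCII character. The trade-off is that the caller's ordering decides ties: genuine UTF-16LE text consisting only of ASCII-range characters would also pass the 'ascii' check, so this only suits cases where the likely encodings are known in advance.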