Subject: If no BOM is found, the routine dies.
Date: Fri, 11 Sep 2015 22:41:42 +0200
To: bug-Encode [...] rt.cpan.org
From: Damian Lukowski <damian.lukowski [...] credativ.de>
Hello,
the Encode::Unicode documentation states the following:
> When BE or LE is omitted during decode(), it checks if BOM is at the beginning of the string; if one is found, the endianness is set to what the BOM says. If no BOM is found, the routine dies.
What is the justification for dying? Both the Unicode Standard, Version 8.0,
and RFC 2781 specify how to interpret UTF-16 that has no BOM.
Unicode Standard excerpt:
> The UTF-16 encoding scheme may or may not begin with a BOM. However,
> when there is no BOM, and in the absence of a higher-level protocol, the byte
> order of the UTF-16 encoding scheme is big-endian.
RFC2781:
> If the first two octets of the text is not 0xFE followed by
> 0xFF, and is not 0xFF followed by 0xFE, then the text SHOULD be
> interpreted as being big-endian.
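To make the requested behaviour concrete, here is a minimal sketch of the fallback that both excerpts describe. It is written in Python purely to show the logic and is not Encode code; the function name decode_utf16_rfc2781 is hypothetical. It assumes the input is a complete byte string and that no higher-level protocol specifies the byte order.

```python
def decode_utf16_rfc2781(data: bytes) -> str:
    """Decode UTF-16 bytes, falling back to big-endian when no BOM is present.

    Hypothetical helper for illustration only; it is not part of any library.
    """
    # 0xFE 0xFF at the start marks big-endian; strip the BOM before decoding.
    if data[:2] == b"\xfe\xff":
        return data[2:].decode("utf-16-be")
    # 0xFF 0xFE at the start marks little-endian; strip the BOM before decoding.
    if data[:2] == b"\xff\xfe":
        return data[2:].decode("utf-16-le")
    # No BOM: per the excerpts above, interpret the text as big-endian
    # instead of raising an error.
    return data.decode("utf-16-be")
```

With this rule, the BOM-less bytes 0x00 0x68 0x00 0x69 decode to "hi" rather than triggering a failure.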
Regards
Damian