Subject: Encode Should Accept Noncharacters as Unicode
Date: Sat, 19 Jul 2014 22:03:59 -0700
To: bug-Encode [...] rt.cpan.org
From: "David E. Wheeler" <dwheeler [...] cpan.org>

I think this is a bug:
perl -MEncode -E 'say Encode::decode("UTF-8", "\xEF\xBF\xBF", Encode::FB_CROAK)'
utf8 "\xFFFF" does not map to Unicode at /usr/local/lib/perl5/site_perl/5.20.0/darwin-thread-multi-2level/Encode.pm line 175.
\xFFFF (U+FFFF, encoded as the bytes \xEF\xBF\xBF above) is, in fact, valid UTF-8. It is one of the 66 “Noncharacters”, and according to [Corrigendum 9](http://www.unicode.org/versions/corrigendum9.html), noncharacters are permitted in interchange, so they are allowed to appear in UTF-8 strings and a UTF-8 decoder should not reject them.
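For illustration, here is a minimal Perl sketch, written independently of Encode, of what that claim amounts to. It spells out the noncharacter set (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes) and uses a hand-rolled encoder to show that U+FFFF is an ordinary, well-formed three-byte sequence. The helpers `is_noncharacter` and `utf8_bytes` are hypothetical names for this example, not part of Encode's API.

```perl
use strict;
use warnings;

# Hypothetical helper: true for the 66 Unicode noncharacters,
# i.e. U+FDD0..U+FDEF and U+xxFFFE / U+xxFFFF in each of the 17 planes.
sub is_noncharacter {
    my ($cp) = @_;
    return 0 if $cp < 0 || $cp > 0x10FFFF;
    return 1 if $cp >= 0xFDD0 && $cp <= 0xFDEF;
    return ($cp & 0xFFFE) == 0xFFFE ? 1 : 0;
}

# Hypothetical helper: encode one Unicode scalar value as a UTF-8 byte string.
# Surrogates and values above U+10FFFF are rejected; noncharacters are not.
sub utf8_bytes {
    my ($cp) = @_;
    die sprintf("U+%04X is not a Unicode scalar value\n", $cp)
        if $cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF);
    return chr($cp) if $cp < 0x80;
    return pack 'C2', 0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F)
        if $cp < 0x800;
    return pack 'C3', 0xE0 | ($cp >> 12), 0x80 | (($cp >> 6) & 0x3F),
        0x80 | ($cp & 0x3F)
        if $cp < 0x10000;
    return pack 'C4', 0xF0 | ($cp >> 18), 0x80 | (($cp >> 12) & 0x3F),
        0x80 | (($cp >> 6) & 0x3F), 0x80 | ($cp & 0x3F);
}

# Print the byte sequence and noncharacter status for a few code points.
for my $cp (0xFFFD, 0xFFFF, 0xFDD0, 0x10FFFF) {
    my $hex = join ' ', map { sprintf '%02X', ord } split //, utf8_bytes($cp);
    printf "U+%04X -> %s (noncharacter: %s)\n",
        $cp, $hex, is_noncharacter($cp) ? 'yes' : 'no';
}
```

Running it shows U+FFFF encoding to EF BF BF, exactly the input in the one-liner above. Only surrogates and out-of-range values are refused, which is the distinction the report argues a strict UTF-8 decoder should draw.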
Related discussions:
http://grokbase.com/t/perl/perl5-porters/147gfvrd2n/encode-vs-json
https://rt.perl.org/Public/Bug/Display.html?id=121937
Thanks,
David