Subject: | UTF-8 decodes illegal (non)character U+FFFE |
No input should cause the UTF-8 decoder to produce illegal characters;
any such character should be replaced with U+FFFD.
The attached script produces the following output and warning:
fffe
Unicode character 0xfffe is illegal at utf8-nonchar.pl line 11.
It should instead produce:
fffd
and no warning.
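As a rough illustration of the requested behaviour, the replacement could be done as a post-processing step on an already decoded string. This is only a sketch: replace_nonchars is a hypothetical helper, not part of Encode, and it assumes a Perl recent enough that handling noncharacters in a string does not itself warn.

```perl
use strict;
use warnings;

# Hypothetical helper (not an Encode function): return a copy of $str
# in which every Unicode noncharacter is replaced by U+FFFD.
# Noncharacters are U+FDD0..U+FDEF plus the last two code points of
# each plane (U+FFFE, U+FFFF, U+1FFFE, ... U+10FFFF).
sub replace_nonchars {
    my ($str) = @_;
    return join '', map {
        my $cp = ord;
        my $is_nonchar = ($cp >= 0xFDD0 && $cp <= 0xFDEF)
            || (($cp & 0xFFFE) == 0xFFFE && $cp <= 0x10FFFF);
        $is_nonchar ? "\x{FFFD}" : $_;
    } split //, $str;
}

# Same shape as the report's test string, built from code points
# so the sketch does not depend on the decoder's behaviour.
my $clean = replace_nonchars("aaa" . chr(0xFFFE) . "bbb");
printf "%x\n", ord(substr($clean, 3, 1));   # prints fffd
```

The fix requested in this report is for the decoder to do this itself, so that callers never see U+FFFE in decoded text.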
Subject: | utf8-nonchar.pl |
use Encode;
use strict;
use warnings;
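my $text = "aaa\xef\xbf\xbebbb";                 # \xef\xbf\xbe is the UTF-8 encoding of U+FFFE
my $utf = Encode::decode('UTF-8', $text, 0);     # CHECK = 0 (the default)
printf "%x\n", ord(substr($utf, 3, 1));          # code point of the fourth decoded character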
$utf =~ /\b(?:https?|ftp)/o;