decode_entities() fails to decode a NULL byte, even though
encode_entities() supports encoding one:
$ perl -MHTML::Entities -e '
$s=" < \x00 > ";
$e=encode_entities($s);
$d=decode_entities($e);
printf "e=[%s] d=[%s]\n", $e, $d;
'
e=[ &lt; &#0; &gt; ] d=[ < &#0; > ]
I'm not sure whether this is intended behavior. If it is, what is the
reason, and why isn't it documented? Here is a patch that fixes the issue:
--- util.c.orig 2010-11-14 02:45:47.000000000 +0200
+++ util.c 2010-11-14 02:45:51.000000000 +0200
@@ -126,7 +126,7 @@
ok = 1;
}
}
- if (num && ok) {
+ if (ok) {
#ifdef UNICODE_HTML_PARSER
if (!SvUTF8(sv) && num <= 255) {
buf[0] = (char) num;
--- t/uentities.t.orig 2010-11-14 02:47:33.000000000 +0200
+++ t/uentities.t 2010-11-14 02:47:58.000000000 +0200
@@ -30,10 +30,10 @@
is(decode_entities("�"), "�");
is(decode_entities("�"), "�");
-is(decode_entities("�"), "�");
-is(decode_entities("�"), "�");
-is(decode_entities("�"), "�");
-is(decode_entities("�"), "�");
+is(decode_entities("�"), "\x0");
+is(decode_entities("�"), "\x0");
+is(decode_entities("�"), "\x0");
+is(decode_entities("�"), "\x0");
is(decode_entities("&#ååå࿿"), "&#��x{FFF}");
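
To show what the one-line change in util.c does, here is a minimal standalone sketch. It is not the real HTML::Parser code: the function name parse_decimal_entity and the reject_zero flag are made up for illustration, it handles only decimal references, and it ignores overflow. With reject_zero set it behaves like the old `if (num && ok)` test, so a reference whose value is zero is treated as undecodable. With reject_zero cleared it behaves like the patched `if (ok)` test.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, NOT the actual util.c code.
 * Parses a decimal numeric character reference that starts at s ("&#123...").
 * Returns 1 and stores the code point in *out when the reference is accepted,
 * or 0 when the text should be left undecoded.
 */
int parse_decimal_entity(const char *s, int reject_zero, unsigned long *out)
{
    unsigned long num = 0;
    int ok = 0;

    assert(s != NULL && out != NULL);

    /* A numeric reference must begin with "&#". */
    if (s[0] != '&' || s[1] != '#')
        return 0;

    /* Accumulate digits; ok becomes true once at least one digit is seen. */
    for (s += 2; isdigit((unsigned char)*s); s++) {
        num = num * 10 + (unsigned long)(*s - '0');
        ok = 1;
    }

    /*
     * reject_zero != 0 mimics the old condition `num && ok`:
     *   a value of zero is rejected even though digits were parsed.
     * reject_zero == 0 mimics the patched condition `ok`:
     *   any successfully parsed value, including zero, is accepted.
     */
    if (!ok || (reject_zero && num == 0))
        return 0;

    *out = num;
    return 1;
}
```

Calling parse_decimal_entity("&#0;", 1, &v) returns 0 and leaves v untouched, which matches the current behavior where the reference stays in the output as-is. The same call with reject_zero set to 0 returns 1 and stores 0 in v.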