
This queue is for tickets about the HTML-Parser CPAN distribution.

Report information
The Basics
Id: 62973
Status: rejected
Priority: 0/
Queue: HTML-Parser

People
Owner: Nobody in particular
Requestors: POWERMAN [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 3.68
Fixed in: (no value)



decode_entities() fails to decode a NULL byte, while encode_entities() supports encoding one:

    $ perl -MHTML::Entities -e '
        $s=" < \x00 > ";
        $e=encode_entities($s);
        $d=decode_entities($e);
        printf "e=[%s] d=[%s]\n", $e, $d;
    '
    e=[ &lt; &#0; &gt; ] d=[ < &#0; > ]

I'm not sure whether this is intended behavior. If it is, what is the reason, and why isn't it documented?

Here is a patch which fixes this issue:

--- util.c.orig 2010-11-14 02:45:47.000000000 +0200
+++ util.c 2010-11-14 02:45:51.000000000 +0200
@@ -126,7 +126,7 @@
                     ok = 1;
                 }
             }
-            if (num && ok) {
+            if (ok) {
 #ifdef UNICODE_HTML_PARSER
                 if (!SvUTF8(sv) && num <= 255) {
                     buf[0] = (char) num;
--- t/uentities.t.orig 2010-11-14 02:47:33.000000000 +0200
+++ t/uentities.t 2010-11-14 02:47:58.000000000 +0200
@@ -30,10 +30,10 @@
 is(decode_entities("&#x110000"), "&#x110000");
 is(decode_entities("&#XFFFFFFFF"), "&#XFFFFFFFF");
 
-is(decode_entities("&#0"), "&#0");
-is(decode_entities("&#0;"), "&#0;");
-is(decode_entities("&#x0"), "&#x0");
-is(decode_entities("&#X0;"), "&#X0;");
+is(decode_entities("&#0"), "\x0");
+is(decode_entities("&#0;"), "\x0");
+is(decode_entities("&#x0"), "\x0");
+is(decode_entities("&#X0;"), "\x0");
 
 is(decode_entities("&#&aring&#229&#229;&#xFFF"), "&#ååå\x{FFF}");

It is intentional.  There is no way to represent &#0; in XML (or HTML).
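
For readers following the patch, the effect of the `num && ok` guard can be sketched in isolation. This is only an illustration, not the actual util.c code: `parse_numeric_ref` and `should_decode` are hypothetical names, and the range check is a simplification of whatever validation the real decoder performs.

```c
/* Hypothetical helper, not part of HTML-Parser: parses a numeric character
 * reference such as "&#65;" or "&#x41" at the start of s.
 * Returns 1 and stores the value in *num on success, 0 otherwise. */
static int parse_numeric_ref(const char *s, unsigned long *num)
{
    unsigned base = 10;
    unsigned long n = 0;
    int digits = 0;

    if (s[0] != '&' || s[1] != '#')
        return 0;
    s += 2;
    if (*s == 'x' || *s == 'X') {
        base = 16;
        s++;
    }
    for (;; s++, digits++) {
        unsigned char c = (unsigned char)*s;
        unsigned d;

        if (c >= '0' && c <= '9')
            d = c - '0';
        else if (base == 16 && c >= 'a' && c <= 'f')
            d = c - 'a' + 10;
        else if (base == 16 && c >= 'A' && c <= 'F')
            d = c - 'A' + 10;
        else
            break;                  /* first non-digit ends the number */
        n = n * base + d;
        if (n > 0x10FFFFUL)
            return 0;               /* assumed limit: beyond Unicode range */
    }
    if (digits == 0)
        return 0;                   /* "&#" or "&#x" with no digits */
    *num = n;
    return 1;
}

/* Mirrors the guard quoted in the patch: a reference that parses
 * correctly but has the value zero is still left undecoded. */
static int should_decode(const char *s)
{
    unsigned long num = 0;
    int ok = parse_numeric_ref(s, &num);

    return ok && num;
}
```

Under these assumptions, `should_decode("&#0;")` and `should_decode("&#x0")` return 0 while `should_decode("&#65;")` returns 1, which matches the expectations in the unpatched t/uentities.t. The proposed patch amounts to returning `ok` alone.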