
This queue is for tickets about the HTML-Parser CPAN distribution.

Report information
The Basics
Id: 20864
Status: resolved
Priority: 0/
Queue: HTML-Parser

People
Owner: Nobody in particular
Requestors: scop [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 3.53
Fixed in: (no value)



Subject: Whitespace issue with external id and internal subset
Cf. http://www.w3.org/mid/eavf10.3tg.1%40mail.christoph.schneegans.de

Christoph Schneegans reported that the W3C Markup Validator fails to find a doctype in a declaration where an internal subset immediately follows an external ID, i.e. with no whitespace between them:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"[]>

The validator uses HTML::Parser to find the doctype, but the parser never reports a declaration event for the input above. The issue appears to be triggered by the lack of whitespace between the closing " and the [. With the following input, the declaration event fires as expected:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" []>

Per the XML recommendation, whitespace is not required between the external ID and the internal subset; see http://www.w3.org/TR/REC-xml/#NT-doctypedecl
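For reference, the production linked above reads doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>', so the whitespace before [ is the optional S?. The C sketch below checks the part of a declaration that follows the Name against that rule. doctype_tail_ok and its helpers are hypothetical (they are not part of HTML::Parser or the validator), and the internal subset is simplified to "anything up to the first ]".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* XML S production: space, tab, CR or LF. */
static bool is_s(char c) {
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Skip zero or more S characters; return how many were skipped. */
static size_t skip_s(const char **p) {
    size_t n = 0;
    while (is_s(**p)) {
        (*p)++;
        n++;
    }
    return n;
}

/* Consume a "..." or '...' literal; false if absent or unterminated.
 * PubidLiteral character restrictions are not checked. */
static bool literal(const char **p) {
    assert(p != NULL && *p != NULL);
    char q = **p;
    if (q != '"' && q != '\'')
        return false;
    const char *end = strchr(*p + 1, q);
    if (end == NULL)
        return false;
    *p = end + 1;
    return true;
}

/* Hypothetical check of what follows the Name in a doctypedecl:
 *   (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
 * ExternalID ::= 'SYSTEM' S SystemLiteral
 *              | 'PUBLIC' S PubidLiteral S SystemLiteral
 * Simplification: intSubset is skipped up to the first ']'. */
static bool doctype_tail_ok(const char *p) {
    const char *start = p;
    if (skip_s(&p) > 0 &&
        (strncmp(p, "PUBLIC", 6) == 0 || strncmp(p, "SYSTEM", 6) == 0)) {
        bool is_public = (p[0] == 'P');
        p += 6;
        if (skip_s(&p) == 0 || !literal(&p))      /* mandatory S */
            return false;
        if (is_public && (skip_s(&p) == 0 || !literal(&p)))
            return false;                         /* mandatory S */
    } else {
        p = start;                                /* no ExternalID */
    }
    skip_s(&p);                                   /* S? is optional */
    if (*p == '[') {
        const char *close = strchr(p, ']');
        if (close == NULL)
            return false;
        p = close + 1;
        skip_s(&p);                               /* trailing S? */
    }
    return p[0] == '>' && p[1] == '\0';
}
```

Under this rule both declarations quoted in the report are accepted, while a PUBLIC external ID with no space between its two literals is still rejected, because that S is mandatory.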
Fixed in my sources now. To appear in HTML-Parser-3.56. The applied patch is:

    --- hparser.c   10 Jul 2006 09:00:47 -0000   2.133
    +++ hparser.c   12 Jan 2007 10:53:09 -0000
    @@ -1159,8 +1159,7 @@ parse_decl(PSTATE* p_state, char *beg, c
         /* first word available */
         PUSH_TOKEN(decl_id, s);
    -    while (s < end && isHSPACE(*s)) {
    -        s++;
    +    while (1) {
             while (s < end && isHSPACE(*s)) s++;
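The patch changes the shape of the token loop in parse_decl: the old loop body ran only while the current character was whitespace, so a token such as [ directly following a literal ended the scan, whereas the new while (1) loop skips optional whitespace at the top of each iteration. Below is a simplified C model of the two shapes; tokens_old, tokens_new, read_token and is_hspace are hypothetical names, and this is not the actual hparser.c code.

```c
#include <assert.h>
#include <string.h>

/* Assumed stand-in for hparser.c's isHSPACE(). */
static int is_hspace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Read one declaration token starting at s: a quoted literal, a
 * [...] subset (simplified: up to the first ']'), or a bare word.
 * Returns the position after the token, or NULL if unterminated. */
static const char *read_token(const char *s) {
    assert(s != NULL && *s != '\0' && *s != '>');
    if (*s == '"' || *s == '\'') {
        const char *q = strchr(s + 1, *s);
        return q ? q + 1 : NULL;
    }
    if (*s == '[') {
        const char *q = strchr(s, ']');
        return q ? q + 1 : NULL;
    }
    while (*s && !is_hspace(*s) && *s != '>' &&
           *s != '"' && *s != '\'' && *s != '[')
        s++;
    return s;
}

/* Old shape: the body runs only while the current character is
 * whitespace, so a token glued to the previous one ends the scan.
 * Returns the token count, or -1 if the scan does not end at '>'. */
static int tokens_old(const char *s) {
    int n = 0;
    while (is_hspace(*s)) {
        while (is_hspace(*s))
            s++;
        if (*s == '>' || *s == '\0')
            break;
        s = read_token(s);
        if (s == NULL)
            return -1;
        n++;
    }
    return *s == '>' ? n : -1;
}

/* New shape: loop unconditionally and treat whitespace before each
 * token as optional. Same return convention as tokens_old. */
static int tokens_new(const char *s) {
    int n = 0;
    while (1) {
        while (is_hspace(*s))
            s++;
        if (*s == '>')
            return n;
        if (*s == '\0')
            return -1;
        s = read_token(s);
        if (s == NULL)
            return -1;
        n++;
    }
}
```

Fed the text after <!DOCTYPE from the report, with no space before [, tokens_old stops at [ and returns -1, which mirrors the missing declaration event, while tokens_new returns 5 (html, PUBLIC, the two literals, and []). With a space before [, both return 5.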