
This queue is for tickets about the HTML-Parser CPAN distribution.

Report information
The Basics
Id: 88188
Status: rejected
Priority: 0
Queue: HTML-Parser

People
Owner: Nobody in particular
Requestors: sderose [...] me.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Inconsistent keys in the entity2char hash
Date: Tue, 27 Aug 2013 12:50:16 -0400
To: bug-HTML-Parser [...] rt.cpan.org
From: Steven DeRose <sderose [...] me.com>
In HTML::Entities: The keys in %entity2char are inconsistent about whether they include a trailing semicolon. They should be consistent, and should *not* have a semicolon.

Characters <= 255 do *not* have a semicolon (this is correct from an HTML/XML/SGML point of view).

Characters > 255 *do* have a semicolon (these entries are put in the hash by a separate block of code).

Note: The POD documents the availability of the hash, and it's a likely thing for people to want to use. However, the doc does not say whether to access the hash via the pure name, or by the entire entity reference string (such as "&aelig;"). It should, but fixing the code does not, in this case, have any effect on the current doc.

This is intentional.  In the documentation for HTML::Entities you find this passage:

           The keys in %entity2char are the entity names to be expanded and
           their values are what they should expand into.  The values do not
           have to be single character strings.  If a key has ";" as suffix,
           then occurrences in $string are only expanded if properly
           terminated with ";".  Entities without ";" will be expanded
           regardless of how they are terminated for compatibility with how
           common browsers treat entities in the Latin-1 range.
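
The behavior described in that passage can be checked with a short script. This is only a sketch, assuming HTML::Entities is installed and that "aelig" and "alpha;" are stored under the key forms reported in this ticket. The lookup_entity() helper is hypothetical and not part of the module; it shows one way a caller could fetch a value by name without caring which key form is used.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities %entity2char);

# Key forms as reported in this ticket: Latin-1 names are stored bare,
# names outside Latin-1 are stored with a trailing ";".
my $has_bare_aelig = exists $entity2char{'aelig'}  ? 1 : 0;
my $has_semi_alpha = exists $entity2char{'alpha;'} ? 1 : 0;
my $has_bare_alpha = exists $entity2char{'alpha'}  ? 1 : 0;

# A bare key is expanded even when the reference lacks the ";".
my $latin1_unterminated = decode_entities("AT&amp T");      # "AT& T"

# A key ending in ";" is expanded only when the reference is terminated.
my $greek_unterminated = decode_entities("&alpha beta");    # left unchanged
my $greek_terminated   = decode_entities("&alpha; beta");   # U+03B1, then " beta"

# Hypothetical helper (not provided by HTML::Entities): accept "name",
# "name;" or "&name;" and try both key forms.
sub lookup_entity {
    my ($name) = @_;
    $name =~ s/^&//;
    $name =~ s/;\z//;
    return $entity2char{$name} // $entity2char{"$name;"};
}

printf "aelig bare key: %d, alpha; key: %d, alpha bare key: %d\n",
    $has_bare_aelig, $has_semi_alpha, $has_bare_alpha;
printf "unterminated &amp   -> %s\n", $latin1_unterminated;
printf "unterminated &alpha -> %s\n", $greek_unterminated;
printf "terminated &alpha;  -> first code point U+%04X\n",
    ord($greek_terminated);
```

With a helper along these lines, code that reads %entity2char directly does not need to know whether a given name falls in the Latin-1 range.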