Subject: | unescapeHTML method falsely recognizes certain text as entities |
CGI.pm's unescapeHTML method, which is meant to translate certain SGML
entities into the characters they represent, treats any text between an
ampersand and the next semicolon as an entity, even text that contains
whitespace. When that text is not a known entity, the ampersand and
semicolon are stripped from the output.
For instance, the string:
Dewey, Cheatham & Howe are liars; they stole my life savings!
would be improperly unescaped to read:
Dewey, Cheatham Howe are liars they stole my life savings!
The fix is a small change to the regex: instead of a minimal match of
any characters between the ampersand and the semicolon, it should make a
minimal match of non-whitespace characters only. A patch with this fix
is included.
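As a rough illustration of the difference, here is a stripped-down
sketch that runs both patterns over the sample sentence. It is not the
code from CGI.pm: the helper name unescape_with, the short entity list,
and the fall-through that keeps unrecognized text are assumptions made
for this example, modeled on the behavior described above.

```perl
use strict;
use warnings;

# Hypothetical helper: applies an entity-matching pattern to a string.
# Only a handful of named entities are handled; unrecognized captures
# are kept as-is (assumed fall-through), so the '&' and ';' disappear.
sub unescape_with {
    my ($pattern, $string) = @_;
    $string =~ s{$pattern}{
        my $name = $1;
        $name =~ /^amp$/i  ? '&' :
        $name =~ /^quot$/i ? '"' :
        $name =~ /^lt$/i   ? '<' :
        $name =~ /^gt$/i   ? '>' :
        $name
    }ge;
    return $string;
}

my $old = qr/&(.*?);/;    # minimal match of anything, whitespace included
my $new = qr/&(\S*?);/;   # minimal match of non-whitespace only

my $input = 'Dewey, Cheatham & Howe are liars; they stole my life savings!';

print "old: ", unescape_with($old, $input), "\n";
print "new: ", unescape_with($new, $input), "\n";
print "amp: ", unescape_with($new, 'Tom &amp; Jerry'), "\n";
```

With the first pattern the ampersand and semicolon vanish from the
sentence; with the second the sentence comes back unchanged, while a
real entity such as &amp; is still translated.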
-pete gamache
tack-weld my last name to google's mail server.
Subject: | cgi.patch |
Common subdirectories: CGI.pm-3.42/CGI and CGI.pm-3.42-patched/CGI
diff -c CGI.pm-3.42/CGI.pm CGI.pm-3.42-patched/CGI.pm
*** CGI.pm-3.42/CGI.pm Mon Sep 8 10:13:23 2008
--- CGI.pm-3.42-patched/CGI.pm Tue Sep 9 11:10:51 2008
***************
*** 2235,2241 ****
my $latin = defined $self->{'.charset'} ? $self->{'.charset'} =~ /^(ISO-8859-1|WINDOWS-1252)$/i
: 1;
# thanks to Randal Schwartz for the correct solution to this one
! $string=~ s[&(.*?);]{
local $_ = $1;
/^amp$/i ? "&" :
/^quot$/i ? '"' :
--- 2235,2241 ----
my $latin = defined $self->{'.charset'} ? $self->{'.charset'} =~ /^(ISO-8859-1|WINDOWS-1252)$/i
: 1;
# thanks to Randal Schwartz for the correct solution to this one
! $string=~ s[&(\S*?);]{
local $_ = $1;
/^amp$/i ? "&" :
/^quot$/i ? '"' :
Common subdirectories: CGI.pm-3.42/examples and CGI.pm-3.42-patched/examples
Common subdirectories: CGI.pm-3.42/t and CGI.pm-3.42-patched/t