Subject: Backslash not escaped in the URI escaping (pstrutils.inc urlencode_pstring)
In pstrutils.inc, urlencode_pstring has this comment:
* # do the translation (RFC 2396 ^uric)
* s!([^a-zA-Z0-9_.\-])!sprintf('%%%02X', $_)
The code then has this test:
if ((curchar>='a' && curchar<='z') ||
(curchar>='A' && curchar<='Z') ||
(curchar>='0' && curchar<='9') ||
curchar=='_' || curchar=='.' || curchar=='\\' || curchar=='-'
)
*(buf+offset)=curchar;
...
The backslash ('\\') is not in that character class in the regex. There, the backslash only (and unnecessarily) escapes the dash ("-") following it. The C test treats it as a literal character instead, so backslashes pass through unencoded.
FWIW, I found this while comparing URI::Escape, URI::Escape::XS, HTML::Template's ESCAPE="url", and HTML::Template::Pro's ESCAPE="url".
* URI::Escape::uri_escape : does not handle multi-byte characters (because it's incorrectly treating them as characters instead of bytes).
* URI::Escape::uri_escape_utf8 : incorrectly handles single high-byte characters (because it utf8::upgrade's the string, so it will require utf8::downgrade on the other side, and it cannot handle arbitrary data, like JPEG byte strings).
* URI::Escape::XS::uri_escape : does NOT encode [~!*'()], while all other methods do. Most of those were removed from the unreserved character set with RFC 3986.
* HTML::Template : makes the same mistakes as URI::Escape::uri_escape, but silently drops multibyte chars from its output ("test\xE5test" becomes "testtest").
* HTML::Template::Pro : handles everything correctly, except the backslash.
The fix is to remove the "|| curchar=='\\'" from the if statement.
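For illustration, here is a minimal standalone sketch of the test with the backslash comparison removed. The names is_unescaped and urlencode_sketch are hypothetical and are not the actual pstrutils.inc code. The sketch only covers the character class from the comment and ignores anything else urlencode_pstring may do.

```c
#include <stddef.h>

/* Hypothetical helper (not from pstrutils.inc): nonzero if the byte may be
 * copied through unescaped. Matches [a-zA-Z0-9_.-] from the comment; the
 * backslash is deliberately absent. */
static int is_unescaped(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') ||
           c == '_' || c == '.' || c == '-';
}

/* Hypothetical encoder, for demonstration only. Escapes src[0..len) into dst
 * as a NUL-terminated string. Returns the number of bytes written (excluding
 * the NUL), or (size_t)-1 if dst is too small. */
static size_t urlencode_sketch(const unsigned char *src, size_t len,
                               char *dst, size_t dstsize)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t out = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = src[i];
        size_t need = is_unescaped(c) ? 1 : 3;

        /* Always leave room for the trailing NUL. */
        if (out + need + 1 > dstsize)
            return (size_t)-1;

        if (need == 1) {
            dst[out++] = (char)c;
        } else {
            dst[out++] = '%';
            dst[out++] = hex[c >> 4];
            dst[out++] = hex[c & 0x0F];
        }
    }

    if (out + 1 > dstsize)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

With this test, the three-byte input a\b comes out as a%5Cb, and the high byte in "test\xE5test" is encoded as %E5 rather than dropped.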