Mark,
> A suggestion for how unescapeHTML should work differently would be
> welcome. Looking at other modules that do HTML escaping/unescaping
> could give you inspiration.
HTML::Entities, probably the most widely used module for
escaping/unescaping, uses a list of named entities and their
corresponding code points. Unrecognized entities are left alone.
As I understand the function unescapeHTML, it knows two named entities,
which it handles as intended, and numeric entities, which it replaces
with their corresponding chr(). IMO a useful fix would be to keep this
behavior but leave everything else alone, without stripping the "&" and
";" as it currently does in these cases.
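To make the proposal concrete, here is a rough sketch of the behavior I have in mind. It is written in Python only to keep it short and self-contained. It is not CGI.pm's code: the function name, the regex, and the table of known names are made up for illustration, and the table is not meant to match the list CGI.pm actually recognizes.

```python
import re

# Illustrative table only; an assumption, not the list CGI.pm actually uses.
KNOWN_ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}

# Matches "&name;", "&#123;" and "&#x7b;" style sequences.
_ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def unescape_html(text):
    """Decode known named and numeric entities; leave anything else untouched."""

    def replace(match):
        body = match.group(1)
        if body.startswith("#"):
            digits = body[1:]
            try:
                if digits[0] in "xX":
                    code = int(digits[1:], 16)
                else:
                    code = int(digits)
                return chr(code)
            except (ValueError, OverflowError):
                # Not a valid code point: keep the original text as is.
                return match.group(0)
        # Unknown names fall through unchanged, including the "&" and ";".
        return KNOWN_ENTITIES.get(body, match.group(0))

    return _ENTITY_RE.sub(replace, text)
```

With this approach, a query string such as "?a=1&b;c=2" passes through unchanged, because "b" is not a known entity name, while "&amp;" still becomes "&".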
However, and I am aware that this might open a can of worms, I
was surprised that ->redirect() alters the supplied URL at all. This was
not the behavior I expected. I do not know why this is done; maybe
for RFC compliance or some other reason?
I am not asking for this behavior to be changed, because that would break
backwards compatibility, but I think that
a) it should be documented and
b) there should be a way to turn it off and use the URL "as is" - maybe
with a flag in the method call, or a different method.
This applies to both the HTML and the entity escaping.
Even if the reason is RFC compliance, so many companies
use and require messy query strings that strictly following the
RFC means trouble. For instance, I ran into this problem with URLs from
a big German affiliate agency. I could work around it in my case, but
that was pure chance.
Karlheinz