Subject: Percent character not escaped
Hi Gisle,
I was expecting a "%" character not followed by /[0-9a-fA-F]{2}/ to be
percent-encoded, per RFC 2396, section 2.4.2:
    Because the percent "%" character always has the reserved purpose of
    being the escape indicator, it must be escaped as "%25" in order to
    be used as data within a URI.
However, the percent "%" character passes through URI unescaped:
    use URI;
    use URI::Escape;

    my $unescaped    = 'http://example.org/10%_of_nothing';
    my $expected_uri = 'http://example.org/10%25_of_nothing';

    my @uris = (
        URI->new($unescaped),
        URI->new_abs('10%_of_nothing', 'http://example.org/'),
        URI->new_abs(uri_escape('10%_of_nothing'), 'http://example.org/'),
    );

    for (@uris) {
        my $canonical_uri = $_->canonical;
        print $canonical_uri,
            ($canonical_uri eq $expected_uri ? ' is ' : " isn't "),
            $expected_uri, "\n";
    }

    # http://example.org/10%_of_nothing isn't http://example.org/10%25_of_nothing
    # http://example.org/10%_of_nothing isn't http://example.org/10%25_of_nothing
    # http://example.org/10%25_of_nothing is http://example.org/10%25_of_nothing
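
To be concrete about the behaviour I expected, here is a rough sketch in plain Perl. It is only an illustration of my reading of the RFC text quoted above, not something URI does or documents, and escape_stray_percent is a hypothetical helper name of my own:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of URI or URI::Escape): encode any "%"
# that is not followed by two hex digits, leaving existing %XX escapes alone.
sub escape_stray_percent {
    my ($str) = @_;
    $str =~ s/%(?![0-9A-Fa-f]{2})/%25/g;
    return $str;
}

print escape_stray_percent('http://example.org/10%_of_nothing'), "\n";
# http://example.org/10%25_of_nothing
print escape_stray_percent('http://example.org/10%25_of_nothing'), "\n";
# http://example.org/10%25_of_nothing
```

The negative lookahead means an already-escaped string passes through unchanged, so applying the helper twice should give the same result as applying it once.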
Curiously, the HTTP URI pattern from Regexp::Common also accepts the
unescaped string:

    use Regexp::Common qw( URI ); # gives $RE{URI}{HTTP}

    print $unescaped,
        ($RE{URI}{HTTP}->matches($unescaped) ? ' matches ' : " doesn't match "),
        "Regexp::Common::URI::http\n";

    # http://example.org/10%_of_nothing matches Regexp::Common::URI::http
Are these both bugs, or have I misinterpreted the spec?