Bug #75026 for URI: Percent character not escaped

Wed Feb 15 09:17:15 2012 bits [...] itools.com - Ticket created

Subject:

Percent character not escaped

Hi Gisle,

I was expecting a "%" character not followed by /[0-9a-fA-F]{2}/ to be percent-encoded, per RFC 2396 2.4.2:

    Because the percent "%" character always has the reserved purpose of
    being the escape indicator, it must be escaped as "%25" in order to be
    used as data within a URI.

However, the percent "%" character passes through URI unescaped:

    use URI;
    use URI::Escape;

    my $unescaped    = 'http://example.org/10%_of_nothing';
    my $expected_uri = 'http://example.org/10%25_of_nothing';

    my @uris = (
        URI->new($unescaped),
        URI->new_abs('10%_of_nothing', 'http://example.org/'),
        URI->new_abs(uri_escape('10%_of_nothing'), 'http://example.org/')
    );

    for (@uris) {
        my $canonical_uri = $_->canonical;
        print $canonical_uri,
            ($canonical_uri eq $expected_uri ? ' is ' : " isn't "),
            $expected_uri, "\n";
    }

    # http://example.org/10%_of_nothing isn't http://example.org/10%25_of_nothing
    # http://example.org/10%_of_nothing isn't http://example.org/10%25_of_nothing
    # http://example.org/10%25_of_nothing is http://example.org/10%25_of_nothing

Curiously,

    use Regexp::Common qw( URI ); # gives $RE{URI}{HTTP}

    print $unescaped,
        ($RE{URI}{HTTP}->matches($unescaped) ? ' matches ' : " doesn't match "),
        "Regexp::Common::URI::http\n";

    # http://example.org/10%_of_nothing matches Regexp::Common::URI::http

Are these both bugs, or have I misinterpreted the spec?
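One way to get the behaviour the RFC passage describes without relying on URI to do it is to escape stray percent signs before building the URI object. The following is a minimal sketch in plain Perl. `escape_stray_percent` is a hypothetical helper name, not part of URI or URI::Escape.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of URI): percent-encode every "%" that is
# not already the start of a %XX escape, per RFC 2396 section 2.4.2.
sub escape_stray_percent {
    my ($str) = @_;
    # Negative lookahead: match "%" only when the next two characters are
    # not both hex digits, so existing escapes such as %25 are left alone.
    $str =~ s/%(?![0-9A-Fa-f]{2})/%25/g;
    return $str;
}

print escape_stray_percent('http://example.org/10%_of_nothing'), "\n";
# prints http://example.org/10%25_of_nothing
```

Because valid `%XX` sequences are skipped, running the helper on an already-escaped string changes nothing. With the URI module installed, the result can then be passed to `URI->new(...)->canonical` as in the report.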

Wed Feb 15 09:20:42 2012 bits [...] itools.com - Correspondence added

From:

bits [...] itools.com

The RT web interface seems to mangle the report; the downloaded version looks OK.

Sat Apr 21 16:36:07 2012 bits [...] itools.com - Correspondence added

From:

bits [...] itools.com

Gisle,

Could you please weigh in on whether URI->canonical() should process strings containing percent characters not followed by /[0-9a-fA-F]{2}/ by escaping them to %25, so that it forms a valid RFC 2396 URI? If canonical() isn't meant to accept unescaped strings and produce a conformant URI, perhaps the docs could state that such strings should first be parsed into their components, each component appropriately uri_escaped, and the result reassembled into a URI string before it is passed to canonical(). Thanks for shedding some light on this.
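The parse, escape, reassemble approach suggested above could look roughly like the following sketch for the path component. It assumes the path arrives raw (never escaped) and contains only ASCII or byte data. `escape_segment` and `escape_path` are hypothetical helpers, and the character class is an assumption meant to approximate the default unsafe set of `URI::Escape::uri_escape`.

```perl
use strict;
use warnings;

# Hypothetical helper: escape one raw path segment. Every character outside
# the RFC 3986 unreserved set (A-Z a-z 0-9 - . _ ~) is percent-encoded.
# Assumes byte/ASCII input; wide characters are not handled here.
sub escape_segment {
    my ($segment) = @_;
    $segment =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $segment;
}

# Hypothetical helper: split a raw path on "/", escape each segment and
# reassemble it. The -1 limit keeps leading and trailing empty segments.
sub escape_path {
    my ($path) = @_;
    return join '/', map { escape_segment($_) } split m{/}, $path, -1;
}

my $url = 'http://example.org' . escape_path('/10%_of_nothing');
print "$url\n";    # prints http://example.org/10%25_of_nothing
```

Because every `%` is encoded, this approach double-escapes input that already contains valid `%XX` sequences (`%25` would become `%2525`), so it only fits components that are known to be unescaped.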

Sun May 13 07:54:03 2012 GAAS [...] cpan.org - Correspondence added

It's deliberate that URI does not modify a % followed by something that isn't a two-digit hex number. As I remember it, this was based on some wording somewhere (in the old days) saying that a % not followed by a hex number was reserved for future extensions. By passing these sequences through unchanged, URI would stay compatible with that potential future.

I'm not able to locate this wording anywhere now.

This means that I'm fine with changing URI's behaviour in this regard.

Sun May 13 07:54:05 2012 The RT System itself - Status changed from 'new' to 'open'