This queue is for tickets about the URI CPAN distribution.

Report information
The Basics
Id: 75026
Status: open
Priority: 0/
Queue: URI

People
Owner: Nobody in particular
Requestors: bits [...] itools.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 1.59
Fixed in: (no value)



Subject: Percent character not escaped
Hi Gisle,

I was expecting a "%" character not followed by /[0-9a-fA-F]{2}/ to be percent-encoded, per RFC 2396 2.4.2:

    Because the percent "%" character always has the reserved purpose of
    being the escape indicator, it must be escaped as "%25" in order to
    be used as data within a URI.

However, the percent "%" character passes through URI unescaped:

    use URI;
    use URI::Escape;

    my $unescaped    = 'http://example.org/10%_of_nothing';
    my $expected_uri = 'http://example.org/10%25_of_nothing';

    my @uris = (
        URI->new($unescaped),
        URI->new_abs('10%_of_nothing', 'http://example.org/'),
        URI->new_abs(uri_escape('10%_of_nothing'), 'http://example.org/')
    );

    for (@uris) {
        my $canonical_uri = $_->canonical;
        print $canonical_uri,
            ($canonical_uri eq $expected_uri ? ' is ' : " isn't "),
            $expected_uri, "\n";
    }

    # http://example.org/10%_of_nothing isn't http://example.org/10%25_of_nothing
    # http://example.org/10%_of_nothing isn't http://example.org/10%25_of_nothing
    # http://example.org/10%25_of_nothing is http://example.org/10%25_of_nothing

Curiously,

    use Regexp::Common qw( URI ); # gives $RE{URI}{HTTP}

    print $unescaped,
        ($RE{URI}{HTTP}->matches($unescaped) ? ' matches ' : " doesn't match "),
        "Regexp::Common::URI::http\n";

    # http://example.org/10%_of_nothing matches Regexp::Common::URI::http

Are these both bugs or have I misinterpreted the spec?
From: bits [...] itools.com
The RT web interface seems to mangle the report; the download looks OK.
From: bits [...] itools.com
Gisle, could you please weigh in on whether URI->canonical() should handle strings containing percent characters not followed by /[0-9a-fA-F]{2}/ by escaping them to %25, so as to form a valid RFC 2396 URI? If canonical() isn't meant to accept unescaped strings and produce a conformant URI, perhaps the docs could indicate that strings should be parsed into their components, appropriately uri_escaped, and re-assembled into a URI string before being passed to canonical()? Thanks for shedding some light on this.
It's deliberate that URI does not modify a % followed by something that isn't a hex number. As I remember it, this was based on some wording somewhere (in the old days) that said that the sequence of % not followed by a hex number was reserved for future extensions. By passing these sequences through unchanged, URI would be compatible with this potential future.

I'm not able to locate this wording anywhere now.

This means that I'm fine with changing URI's behaviour in this regard.
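
For illustration, the behaviour requested in this ticket can be approximated today by pre-processing the string before handing it to URI. The following is only a minimal sketch in core Perl: the helper name escape_stray_percent is hypothetical and not part of URI or URI::Escape, and this is not necessarily how URI itself would implement the change. It rewrites every "%" that is not followed by two hex digits to "%25" and leaves existing escapes alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the URI distribution): escape every "%"
# that does not start a percent-encoded triplet, i.e. is not followed by
# two hex digits. Existing escapes such as "%25" or "%2F" are left alone.
sub escape_stray_percent {
    my ($str) = @_;
    $str =~ s/%(?![0-9A-Fa-f]{2})/%25/g;
    return $str;
}

for my $in (
    'http://example.org/10%_of_nothing',      # stray "%"
    'http://example.org/10%25_of_nothing',    # already escaped
    'http://example.org/100%',                # "%" at end of string
) {
    print escape_stray_percent($in), "\n";
}
# http://example.org/10%25_of_nothing
# http://example.org/10%25_of_nothing
# http://example.org/100%25
```

Because the substitution only touches "%" characters, it can be applied to a whole URI string without disturbing delimiters such as "/" or "?". It cannot tell whether a sequence like "%20" was meant as literal data, so input that mixes escaped and unescaped text still needs to be escaped component by component.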