Bug #66298 for HTTP-Parser: $request->uri is wrong for paths that start with //

Tue Mar 01 14:56:12 2011 frodwith [...] gmail.com - Ticket created

Subject: $request->uri is wrong for paths that start with //

If the Request-URI starts with two slashes (e.g. GET //foo/bar), the returned request's uri object is misconfigured (thinking that this is a scheme-relative uri) and the first segment after the double slash (foo in this case) is interpreted as the host. The attached patch fixes the issue.
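To illustrate the misparse described above, here is a minimal sketch in plain Perl. It does not call the URI module. It assumes URI splits a reference roughly the way the generic regex in RFC 3986, Appendix B does, and split_ref is a hypothetical helper introduced only for this example.

```perl
use strict;
use warnings;

# Generic URI-splitting regex from RFC 3986, Appendix B.
# Capture 4 is the authority, capture 5 is the path.
my $URI_RE = qr{^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?};

# Hypothetical helper (not part of HTTP::Parser or URI): split a URI
# reference into its authority and path components.
sub split_ref {
    my ($ref) = @_;
    my @part = $ref =~ $URI_RE;    # list of captures $1..$9
    return { authority => $part[3], path => $part[4] };
}

for my $request_uri ('/foo/bar', '//foo/bar') {
    my $got = split_ref($request_uri);
    printf "%-10s authority=%s path=%s\n",
        $request_uri,
        defined $got->{authority} ? "'$got->{authority}'" : 'undef',
        "'$got->{path}'";
}
```

Under that assumption, /foo/bar has no authority and keeps its whole path, while //foo/bar is read as authority foo with path /bar, which matches the behaviour reported here.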

Subject: abs_path_slashes.patch

diff -ur old/Parser.pm new/Parser.pm
--- old/Parser.pm 2010-06-15 13:57:43.000000000 -0500
+++ new/Parser.pm 2011-03-01 13:53:02.000000000 -0600
@@ -260,6 +260,12 @@
       unless $http and $http =~ /^HTTP\/(\d+)\.(\d+)$/i;
     ($major,$minor) = ($1,$2);
     die 'HTTP requests not allowed' unless $self->{request};
+
+    # If the Request-URI is an abs_path, we need to tell URI that we don't
+    # know the scheme, otherwise it will misinterpret paths that start with
+    # // as being scheme-relative uris, and will interpret the first
+    # component after // as the host (see rfc 2616)
+    $uri = "//$uri" if $uri =~ m(^/);
     $obj = $self->{obj} = HTTP::Request->new($method, URI->new($uri));
   }
diff -ur old/t/1.t new/t/1.t
--- old/t/1.t 2007-02-24 08:54:48.000000000 -0600
+++ new/t/1.t 2011-03-01 13:34:34.000000000 -0600
@@ -4,7 +4,7 @@
 #########################
 use strict;
-use Test::More tests => 21;
+use Test::More tests => 22;
 # <1>
 BEGIN { use_ok('HTTP::Parser') };
@@ -77,3 +77,7 @@
   is($res->content, "Some content!\x0d\x0a", 'content is correct');
 }
+# <1>
+$parser = HTTP::Parser->new(request => 1);
+$parser->add("GET //foo///bar/baz HTTP/1.1\x0d\x0a\x0d\x0a");
+is $parser->request->uri->path, '//foo///bar/baz';
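A rough sketch of why prefixing // helps: with the extra slashes the authority is explicitly empty, so everything after it stays in the path. The example below stands in for URI->new with the generic splitting regex from RFC 3986, Appendix B (an assumption about how URI behaves, not its actual code), and path_of_request_uri is a hypothetical helper.

```perl
use strict;
use warnings;

# Generic URI-splitting regex from RFC 3986, Appendix B (capture 5 = path).
my $URI_RE = qr{^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?};

# Hypothetical helper: apply the patched normalisation, then return the
# path component that a generic URI parser would see.
sub path_of_request_uri {
    my ($uri) = @_;
    $uri = "//$uri" if $uri =~ m(^/);    # the line added by the patch
    my @part = $uri =~ $URI_RE;
    return $part[4];
}

# An abs_path starting with // keeps all of its segments.
print path_of_request_uri('//foo///bar/baz'), "\n";

# An ordinary abs_path is unaffected by the extra prefix.
print path_of_request_uri('/foo/bar'), "\n";

# An absolute URI does not start with /, so it is left alone.
print path_of_request_uri('http://example.com/x'), "\n";
```

The prefix is only added when the Request-URI begins with a slash, so absolute URIs and the * form pass through unchanged.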

Tue Mar 01 15:23:53 2011 david [...] edeca.net - Correspondence added

On Tue Mar 01 14:56:12 2011, frodwith@gmail.com wrote:

> If the Request-URI starts with two slashes (e.g. GET //foo/bar), the
> returned request's uri object is misconfigured (thinking that this is a
> scheme-relative uri) and the first segment after the double slash (foo
> in this case) is interpreted as the host. The attached patch fixes the
> issue.

Is this actually a valid request URI? It is a perfectly valid URI for use in HTML pages etc (scheme relative, as you say) but I am unsure if it is actually a valid HTTP request. I see no reason not to implement your patch, but am interested whether there is a definitive answer to this question first.

Tue Mar 01 15:23:53 2011 The RT System itself - Status changed from 'new' to 'open'

Tue Mar 01 16:59:57 2011 frodwith [...] gmail.com - Correspondence added

From: frodwith [...] gmail.com

Yes indeed, unless my BNF has completely failed me. I went digging in RFCs, and it seems that any one or zero path segments (which can be empty) separated by slashes makes up the opaque part of the uri. So, ///////////// is a valid path string. Go figure, eh?
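The grammar argument can be checked mechanically. The sketch below assumes the RFCs in question are RFC 2616 and the RFC 2396 path grammar it refers to, where abs_path is a slash followed by one or more slash-separated segments and a segment may be empty. The regexes are a hand transcription of that BNF, and is_abs_path is a hypothetical helper.

```perl
use strict;
use warnings;

# pchar and segment transcribed from RFC 2396, section 3.3 (assumed to be
# the grammar under discussion):
#   abs_path      = "/" path_segments
#   path_segments = segment *( "/" segment )
#   segment       = *pchar *( ";" param )
my $pchar    = qr{(?:[A-Za-z0-9_.!~*'():\@&=+\$,-]|%[0-9A-Fa-f]{2})};
my $segment  = qr{$pchar*(?:;$pchar*)*};
my $abs_path = qr{\A/$segment(?:/$segment)*\z};

# Hypothetical helper: 1 if the string is a syntactically valid abs_path,
# 0 otherwise.
sub is_abs_path {
    my ($path) = @_;
    return $path =~ $abs_path ? 1 : 0;
}

for my $path ('/////////////', '//foo///bar/baz', 'foo/bar', '/foo bar') {
    printf "%-16s %s\n", $path, is_abs_path($path) ? 'valid' : 'invalid';
}
```

Because every segment may be empty, a run of slashes satisfies the grammar, while a string without a leading slash or with an unescaped space does not.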

Sun Mar 06 15:47:17 2011 david [...] edeca.net - Correspondence added 15 min

On Tue Mar 01 16:59:57 2011, frodwith@gmail.com wrote:

> Yes indeed, unless my BNF has completely failed me. I went digging in
> RFCs, and it seems that any one or zero path segments (which can be
> empty) separated by slashes makes up the opaque part of the uri. So,
> ///////////// is a valid path string. Go figure, eh?

Your patch has been integrated, the version bumped to 0.06, and the release pushed to PAUSE. It should be on CPAN soon. Commit: https://github.com/edeca/HTTP-Parser/commit/b779df1a7b87e867d2e19b40a85fe40c6955de8f Thanks very much for the report and patch.

Sun Mar 06 15:47:17 2011 david [...] edeca.net - Status changed from 'open' to 'resolved'

Sun Mar 06 15:47:18 2011 david [...] edeca.net - Given to edeca

Mon Mar 07 04:13:55 2011 david [...] edeca.net - Severity Normal added

Mon Mar 07 04:13:55 2011 david [...] edeca.net - Broken in 0.01 added

Mon Mar 07 04:13:55 2011 david [...] edeca.net - Broken in 0.02 added

Mon Mar 07 04:13:55 2011 david [...] edeca.net - Broken in 0.03 added

Mon Mar 07 04:13:55 2011 david [...] edeca.net - Broken in 0.04 added

Mon Mar 07 04:13:55 2011 david [...] edeca.net - Broken in 0.05 added

Mon Mar 07 04:13:55 2011 david [...] edeca.net - Fixed in 0.06 added