This queue is for tickets about the HTTP-Parser CPAN distribution.

Report information
The Basics
Id: 66298
Status: resolved
Worked: 15 min
Priority: 0/
Queue: HTTP-Parser

People
Owner: david [...] edeca.net
Requestors: frodwith [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in:
  • 0.01
  • 0.02
  • 0.03
  • 0.04
  • 0.05
Fixed in: 0.06



Subject: $request->uri is wrong for paths that start with //
If the Request-URI starts with two slashes (e.g. GET //foo/bar), the returned request's uri object is misconfigured (thinking that this is a scheme-relative uri) and the first segment after the double slash (foo in this case) is interpreted as the host. The attached patch fixes the issue.
Subject: abs_path_slashes.patch
diff -ur old/Parser.pm new/Parser.pm
--- old/Parser.pm 2010-06-15 13:57:43.000000000 -0500
+++ new/Parser.pm 2011-03-01 13:53:02.000000000 -0600
@@ -260,6 +260,12 @@
       unless $http and $http =~ /^HTTP\/(\d+)\.(\d+)$/i;
     ($major,$minor) = ($1,$2);
     die 'HTTP requests not allowed' unless $self->{request};
+
+    # If the Request-URI is an abs_path, we need to tell URI that we don't
+    # know the scheme, otherwise it will misinterpret paths that start with
+    # // as being scheme-relative uris, and will interpret the first
+    # component after // as the host (see rfc 2616)
+    $uri = "//$uri" if $uri =~ m(^/);
     $obj = $self->{obj} = HTTP::Request->new($method, URI->new($uri));
   }
diff -ur old/t/1.t new/t/1.t
--- old/t/1.t 2007-02-24 08:54:48.000000000 -0600
+++ new/t/1.t 2011-03-01 13:34:34.000000000 -0600
@@ -4,7 +4,7 @@
 #########################
 use strict;
-use Test::More tests => 21;
+use Test::More tests => 22;
 # <1>
 BEGIN { use_ok('HTTP::Parser') };
@@ -77,3 +77,7 @@
 is($res->content, "Some content!\x0d\x0a", 'content is correct');
 }
+# <1>
+$parser = HTTP::Parser->new(request => 1);
+$parser->add("GET //foo///bar/baz HTTP/1.1\x0d\x0a\x0d\x0a");
+is $parser->request->uri->path, '//foo///bar/baz';
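For readers who want to see the behaviour outside of HTTP::Parser, the following is a minimal sketch using only the URI module. It assumes URI is installed and that a scheme-less string is parsed with the generic URI syntax; the variable names are illustrative and are not taken from Parser.pm.

```perl
use strict;
use warnings;
use URI;

# Request-URI as it would appear on the request line "GET //foo/bar HTTP/1.1".
my $request_uri = '//foo/bar';

# Parsed as-is: the leading "//" is read as the start of an authority,
# so "foo" ends up as the authority and only "/bar" is left as the path.
my $naive = URI->new($request_uri);
print "naive authority: ", $naive->authority, "\n";   # foo
print "naive path:      ", $naive->path,      "\n";   # /bar

# Same idea as the patch: prepend "//" so the authority is explicitly empty
# and the whole original Request-URI is kept as the path.
my $fixed = URI->new("//$request_uri");
print "fixed path:      ", $fixed->path, "\n";        # //foo/bar
```

One side effect to keep in mind: with this approach the string held by the URI object carries the extra leading slashes, so code that stringifies the object rather than calling `path` may see a different value than the one sent on the wire.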
On Tue Mar 01 14:56:12 2011, frodwith@gmail.com wrote:
> If the Request-URI starts with two slashes (e.g. GET //foo/bar), the
> returned request's uri object
> is misconfigured (thinking that this is a scheme-relative uri) and the
> first segment after the
> double slash (foo in this case) is interpreted as the host. The
> attached patch fixes the issue.
Is this actually a valid request URI? It is a perfectly valid URI for use in HTML pages etc (scheme relative, as you say) but I am unsure if it is actually a valid HTTP request. I see no reason not to implement your patch, but am interested whether there is a definitive answer to this question first.
From: frodwith [...] gmail.com
Yes indeed, unless my BNF has completely failed me. I went digging in RFCs, and it seems that any one or zero path segments (which can be empty) separated by slashes makes up the opaque part of the uri. So, ///////////// is a valid path string. Go figure, eh?
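The grammar being described can be approximated with a regular expression. The sketch below is a simplified reading of the abs_path rule (a "/" followed by slash-separated segments, each of which may be empty); the character class for pchar and the names `$pchar`, `$segment` and `$abs_path` are assumptions made for illustration, not a complete transcription of the RFC.

```perl
use strict;
use warnings;

# Simplified pchar: unreserved characters, some allowed punctuation, or a %XX escape.
my $pchar    = qr{(?:[A-Za-z0-9\-_.!~*'():\@&=+\$,]|%[0-9A-Fa-f]{2})};
# A segment is zero or more pchars, optionally followed by ";param" parts.
my $segment  = qr{$pchar*(?:;$pchar*)*};
# abs_path = "/" segment *( "/" segment ); empty segments are allowed.
my $abs_path = qr{\A/$segment(?:/$segment)*\z};

for my $path ('/foo/bar', '//foo///bar/baz', '/////////////', 'foo/bar') {
    printf "%-18s %s\n", $path, ($path =~ $abs_path ? 'valid' : 'invalid');
}
```

Because a segment may be empty, any run of slashes matches, which is consistent with the observation above that ///////////// is a valid path string.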
On Tue Mar 01 16:59:57 2011, frodwith@gmail.com wrote:
> Yes indeed, unless my BNF has completely failed me. I went digging in
> RFCs, and it seems that
> any one or zero path segments (which can be empty) separated by
> slashes makes up the opaque
> part of the uri. So, ///////////// is a valid path string. Go figure,
> eh?
Your patch integrated, version bumped to 0.06 and pushed to PAUSE. Should be on CPAN soon. https://github.com/edeca/HTTP-Parser/commit/b779df1a7b87e867d2e19b40a85fe40c6955de8f Thanks very much for the report & patch.