Skip Menu |

This queue is for tickets about the URI CPAN distribution.

Report information
The Basics
Id: 62708
Status: open
Priority: 0/
Queue: URI

People
Owner: Nobody in particular
Requestors: john [...] rimmkaufman.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Patch for URI to handle urls
Date: Thu, 4 Nov 2010 16:45:22 -0400
To: <bug-URI [...] rt.cpan.org>
From: "John Miller" <john [...] rimmkaufman.com>
Version: 1.38 Perl Version: 5.8.8 x86_64-linux OS Version: Gentoo Linux. Kernel 2.6.28-hardened The _query module does not handle http query strings that are not <name>=<value> pairs. Code to reproduce: #!/usr/bin/perl use strict; use warnings FATAL => 'all'; use URI; use URI::QueryParam; my $u = URI->new('http://sample.com/test.html?foo'); $u->query_param_append('bar', 1); print $u->as_string, "\n"; $u = URI->new('http://sample.com/test.html?foo&bar=1'); $u->query_form($u->query_form); print $u->as_string, "\n"; Expected output: http://sample.com/test.html?foo&bar=1 http://sample.com/test.html?foo&bar=1 Actual output: http://sample.com/test.html?bar=1 http://sample.com/test.html?foo=&bar=1 Note that in the first string, the foo part of the query portion is dropped. In the second, an = is appended to the element. With the attached patch, the output is: http://sample.com/test.html?foo&bar=1 http://sample.com/test.html?foo&bar=1 As expected. There is a behavior change. Previously, parameters sent into query_form with value undef would be set to blank. Now they will be included with no "=" or value. To set a value to blank, use an empty string -- ''.

Message body is not shown because sender requested not to inline it.

I'm not convinced this is a bug. query_form() parses query strings according to the application/x-www-form-urlencoded format, which as best as I can tell is defined here: http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1 '?foo' doesn't use that format -- it's more properly parsed using query_keywords. '?foo&bar=1' is undefined behavior, since 'foo' is not a proper name/value pair. In both cases, it's reasonable for query_form to ignore invalid data. It's also reasonable to try to get as much data as possible by pulling out 'bar=1', and possibly by treating the bare 'foo' as 'foo='. It does not seem reasonable to me to pretend that '?foo&bar=1' is distinct from both '?bar=1' and '?foo=&bar=1' -- the spec makes absolutely no provision for this kind of pseudo-undef value.
Since this patch turned so controversal I've now reverted it and uploaded URI-1.58 without it.
Subject: RE: [rt.cpan.org #62708] Patch for URI to handle urls
Date: Sat, 22 Jan 2011 19:28:22 -0500
To: <bug-URI [...] rt.cpan.org>
From: "John Miller" <john [...] rimmkaufman.com>
The query string "foo=&bar=1" is not the same as "foo&bar=1". There is no requirement in RFC1738 or RFC3986 for the query portion to have any particular format. As mentioned in RFC3986, they 'are often used to carry identifying information in the form of "key=value" pairs' but there is no requirement to do so. More importantly, many web applications do not follow the key=value form. So it would be nice for this module to be able to handle those more general query strings. If the query_form function is intended to parse only application/x-www-form-urlencoded query strings, it should refuse to parse (i.e. die on) ones that do not match that format. But it should not alter them to a form that is not usable. This patch allows the module to reversibly parse and reconstruct such general query strings. An alternate would be to add a new function which is able to parse general query strings rather than just application/x-www-form-urlencoded strings or + separated keywords. This seemed easier. Thanks for your consideration. Show quoted text
-----Original Message----- From: Hans Dieter Pearcey via RT [mailto:bug-URI@rt.cpan.org] Sent: Saturday, January 22, 2011 4:13 PM To: john@rimmkaufman.com Subject: [rt.cpan.org #62708] Patch for URI to handle urls <URL: https://rt.cpan.org/Ticket/Display.html?id=62708 > I'm not convinced this is a bug. query_form() parses query strings according to the application/x-www-form-urlencoded format, which as best as I can tell is defined here: http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1 '?foo' doesn't use that format -- it's more properly parsed using query_keywords. '?foo&bar=1' is undefined behavior, since 'foo' is not a proper name/value pair. In both cases, it's reasonable for query_form to ignore invalid data. It's also reasonable to try to get as much data as possible by pulling out 'bar=1', and possibly by treating the bare 'foo' as 'foo='. It does not seem reasonable to me to pretend that '?foo&bar=1' is distinct from both '?bar=1' and '?foo=&bar=1' -- the spec makes absolutely no provision for this kind of pseudo-undef value.
On Sat Jan 22 20:28:35 2011, john@rimmkaufman.com wrote: Show quoted text
> The query string "foo=&bar=1" is not the same as "foo&bar=1".
Yes, I agree. The former is properly-formatted application/x-www-form-urlencoded data, the latter is not. However, if you're going to try a quirksmode-esque interpretation of "foo&bar=1" as x-www-form-urlencoded, "foo=&bar=1" is the closest thing you're going to get. If you treat them differently, you've just made a Perl-specific extension to the specification. This is usually bad. Show quoted text
> There is no requirement in RFC1738 or RFC3986 for the query portion to
have Show quoted text
> any particular format. As mentioned in RFC3986, they 'are often used to > carry identifying information in the form of "key=value" pairs' but
there is Show quoted text
> no requirement to do so. More importantly, many web applications do not > follow the key=value form. So it would be nice for this module to be
able to Show quoted text
> handle those more general query strings.
Have you possibly overlooked $uri->query, which gets and sets the entire query string? If not, then I'm at a loss to understand what you mean by "those more general query strings". If you don't want to treat the query string as a single opaque string, then you need to know how to parse it, which means having a specification. As far as I can tell, you want to change URI to use sort of a "x-www-form-urlencoded plus some extra stuff"; you're free to do that in your own code, but I can't see any reason why it should be added to the URI module. Show quoted text
> If the query_form function is intended to parse only application/x- > www-form-urlencoded query strings, it should refuse to parse (i.e.
die on) Show quoted text
> ones that do not match that format. But it should not alter them to a
form Show quoted text
> that is not usable.
Its documentation makes it clear that it is only intended for application/x-www-form-urlencoded query strings. If the documentation claims to do X when given valid input Y, and you give it Z instead, the primary fault is yours when the code doesn't do what you wanted to do to. I'm sympathetic to the argument that it should have originally died if asked to parse a query string that had invalid x-www-form-urlencoded data, but I think that ship has sailed a long time ago, and it would be a major backwards compatibility breakage to start throwing such an exception now.
Subject: RE: [rt.cpan.org #62708] Patch for URI to handle urls
Date: Wed, 26 Jan 2011 11:07:11 -0500
To: <bug-URI [...] rt.cpan.org>
From: "John Miller" <john [...] rimmkaufman.com>
Show quoted text
> Have you possibly overlooked $uri->query, which gets and sets the entire query string?
Yeah I did. The module handles the uris we deal with 95% of the time. A few of our clients choose to use non-standard query string formatting and I didn't want to have to write a custom parser for the 5%. This tweak allowed it to pass those 5% through without breaking them. One other suggestion for your consideration: The bug as reported is not only in query_form. It is also in the way URI::QueryParam::query_param_append/as_string alters the uri to a form that the server may not accept. To be nitpicky, neither query_param_append nor as_string are explicitly documented as only accepting application/x-www-form-urlencoded data (although query_param is noted as an alternate interface to query_form so it might be implied). Is there another way to change that behavior so it does not drop the "invalid" portion or otherwise alter it destructively? If not, I am happy to continue to run with the patch locally since it suits our needs. I have no objection to closing the ticket as wontfix or notabug as you see fit. You might explicitly note in the docs that the behavior on non-app/x-www-form-urlencoded data is undefined and may alter it in ways that render it unusable. Possibly with a link to the definition. While I knew they were non-traditional, I had no idea this spec was that restrictive (which may be an RTFM moment for me, but sometimes I get busy). I can't call them non-standard because we are seeing more of them as time passes.
The query_form() is already trying to be more practical than strict application/x-www-form- urlencoded would prescribe by supporting ";" as alternative delimiter. I think it would be good to support value-less entries without '=' because I know my favourite web framework accept that and I use that myself. If there was a backwards compatible way to do it I would go for it.