This queue is for tickets about the URI CPAN distribution.

Report information
The Basics
Id: 53681
Status: open
Priority: 0/
Queue: URI

People
Owner: Nobody in particular
Requestors: hideki.yamamura [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: A bad utf8-related trap on query_form()
Date: Thu, 14 Jan 2010 21:33:20 +0900
To: bug-URI [...] rt.cpan.org
From: Hideki Yamamura <hideki.yamamura [...] gmail.com>
Thanks for the useful module. It is very helpful in our projects.

I found a trap around query_form() that many people in Japan have fallen into. Please refer to the attached test script.

--
Hideki YAMAMURA <hideki.yamamura@gmail.com>

Message body is not shown because sender requested not to inline it.

Yupp. It's a trap, but I don't know any good way to fix it without breaking something.

The gist of the problem is that I allowed URI to end up making a semantic distinction between strings with and without the (internal) UTF8 flag set. Demonstration:

    $ perl -MURI -le '@a = ("f\xE5r") x 2; utf8::upgrade($a[0]); print URI->new($_) for @a'
    f%C3%A5r
    f%E5r

If you combine strings where some are upgraded and some are not, then we get a silent upgrade of those that are not. The problem is that it's really hard to keep track of when you get one or the other (and programmers should really not have to care).

In your example, for instance, it's surprising that this fails:

    {
        my $uri = URI->new('http:');
        $uri->query_form(key => encode_utf8("mooi\x{20AC}e"));
        is $uri->query, "key=mooi%E2%82%ACe";
    }

The reason is that "key" is passed as an upgraded string even though it's plain ASCII. The "=>" operator insists on upgrading the left operand when the utf8 pragma is in effect. If you quote it as "key" it's not upgraded and the test will pass. This is not sane behaviour.

On Sun Mar 14 17:10:23 2010, GAAS wrote:
> Yupp. It's a trap, but I don't know any good way to fix it without
> breaking something.
>
> The gist of the problem is that I allowed URI to end up making a
> semantic distinction between strings with and without the (internal)
> UTF8 flag set. Demonstration:
>
> $ perl -MURI -le '@a = ("f\xE5r") x 2; utf8::upgrade($a[0]); print URI->new($_) for @a'
> f%C3%A5r
> f%E5r
>
> If you combine strings where some are upgraded and some are not, then
> we get a silent

The problem is that the URI module combines the strings, not the user, so this problem should be fixed in URI.

===
use strict;
use warnings;
use Data::Dumper;
use URI;

my $content = { username => "\xC2", password => "\xC2" };
utf8::upgrade($content->{username});

my $temp_uri = URI->new('http:');
$temp_uri->query_form($content);
print Dumper $temp_uri->query;
===
$VAR1 = 'password=%C3%82&username=%C3%82';

The BUGS section of the documentation (https://metacpan.org/pod/URI#BUGS) does describe the Unicode Bug, and that's OK. But it does not describe the fact that URI combines strings somewhere internally. That's probably a bug and should be fixed (not sure about backcompat).

> upgrade of those that are not. The problem is that it's really hard
> to keep track of when you get one or the other (and programmers
> should really not have to care).
>
> In your example for instance it's surprising that this fails:
>
> {
>     my $uri = URI->new('http:');
>     $uri->query_form(key => encode_utf8("mooi\x{20AC}e"));
>     is $uri->query, "key=mooi%E2%82%ACe";
> }
>
> The reason is that "key" is passed as an upgraded string even if it's
> plain ASCII. The "=>" operator insists on upgrading the left operand
> when the utf8 pragma is in effect. If you quote

Only in Perl 5.10 and earlier.

> it as "key" it's not upgraded and the test will pass. This is not
> sane behaviour.
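
Until this is resolved in URI itself, one possible caller-side workaround is sketched below. It assumes that every key and value is meant as a character string that should appear as UTF-8 in the query; under that assumption, encoding everything explicitly with Encode::encode_utf8 before calling query_form() means URI only ever sees byte strings without the UTF8 flag, so no silent upgrade can occur. The %form hash and the variable names are illustrative only, and this is not a fix for the module.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use URI;

# Same data as the example in this ticket: one value upgraded, one not.
my %form = (username => "\xC2", password => "\xC2");
utf8::upgrade($form{username});

# Encode every key and value to UTF-8 bytes up front, so that all strings
# handed to query_form() are flag-less byte strings. Keys are sorted only
# to make the output order predictable.
my @pairs = map { (encode_utf8($_), encode_utf8($form{$_})) } sort keys %form;

my $uri = URI->new('http:');
$uri->query_form(\@pairs);    # query_form() also accepts an array reference
print $uri->query, "\n";
```

If the values are instead meant as raw bytes, calling utf8::downgrade() on each string before passing it in would be the analogous step; note that utf8::downgrade() dies by default if the string contains characters above 0xFF.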