On Sun Mar 14 17:10:23 2010, GAAS wrote:
> Yupp. It's a trap, but I don't know any good way to fix it without
> breaking something.
>
> The gist of the problem is that I allowed URI to end up making a
> semantic distinction between
> strings with and without the (internal) UTF8 string set.
> Demonstration:
>
> $ perl -MURI -le '@a = ("f\xE5r") x 2; utf8::upgrade($a[0]); print
> URI->new($_) for @a'
> f%C3%A5r
> f%E5r
>
> If you combine strings where some are upgraded and some are not, then
> we get a silent
The problem is that the URI module combines the strings, not the user, so this should be fixed in URI.
===
use strict;
use warnings;
use Data::Dumper;
use URI;
my $content = { username => "\xC2", password => "\xC2" };
utf8::upgrade($content->{username});
my $temp_uri = URI->new('http:');
$temp_uri->query_form($content);
print Dumper $temp_uri->query ;
===
$VAR1 = 'password=%C3%82&username=%C3%82';
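The output above is consistent with the two pairs being joined into one string before the non-ASCII characters are escaped. Here is a minimal sketch of that mechanism using only core Perl. The helper escape_internal_bytes is hypothetical and is not URI's code; it only assumes that escaping looks at the UTF-8 bytes of a string whenever the internal UTF8 flag is set.

```perl
use strict;
use warnings;

# Two strings holding the same single character, U+00C2.
my $upgraded = "\xC2";
my $plain    = "\xC2";
utf8::upgrade($upgraded);    # sets the internal UTF8 flag; the value is unchanged

# Joining needs one common representation, so the whole result is upgraded,
# including the part that came from the non-upgraded string.
my $joined = join '&', "username=$upgraded", "password=$plain";

# Hypothetical helper (an assumption, not URI's implementation):
# percent-encode whatever bytes the string yields once flagged strings
# have been encoded as UTF-8.
sub escape_internal_bytes {
    my ($str) = @_;
    my $bytes = $str;
    utf8::encode($bytes) if utf8::is_utf8($bytes);
    $bytes =~ s/([^\x21-\x7E])/sprintf '%%%02X', ord $1/ge;
    return $bytes;
}

print escape_internal_bytes($plain),    "\n";    # %C2
print escape_internal_bytes($upgraded), "\n";    # %C3%82
print escape_internal_bytes($joined),   "\n";    # username=%C3%82&password=%C3%82
```

Escaped on its own, the non-upgraded password would come out as %C2; it only turns into %C3%82 because it was joined with an upgraded string first.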
The BUGS section of the documentation
https://metacpan.org/pod/URI#BUGS
does describe the Unicode Bug, and that's OK.
But it does not say that URI combines strings somewhere internally. That's probably a bug and should be fixed (not sure about backward compatibility).
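Until that is settled, a possible caller-side workaround is to make sure no string reaches query_form with the UTF8 flag set. This is only a sketch: as_byte_pairs is a hypothetical helper, and it assumes the values are meant as raw bytes, so it dies on any character above 0xFF. If UTF-8 output is wanted instead, encoding every key and value with Encode would be the analogous step.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: return a copy of the key/value list in which every
# string has been downgraded, so mixed upgraded and non-upgraded input ends
# up in a single representation.
sub as_byte_pairs {
    my @pairs = @_;                    # copy; the caller's strings stay untouched
    utf8::downgrade($_) for @pairs;    # dies on characters above 0xFF
    return \@pairs;
}

my $content = { username => "\xC2", password => "\xC2" };
utf8::upgrade($content->{username});

my $temp_uri = URI->new('http:');
# An array reference keeps the pairs in the order given here.
$temp_uri->query_form(as_byte_pairs(
    username => $content->{username},
    password => $content->{password},
));
print $temp_uri->query, "\n";
```

With both values downgraded, username and password should be escaped the same way regardless of which one was upgraded by the caller.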
> upgrade of those that are not. The problem is that it's really hard
> to keep track of when you
> get one or the other (and programmers should really not have to care).
>
> In your example for instance it's surprising that this fails:
>
> {
> my $uri = URI->new('http:');
> $uri->query_form(key => encode_utf8("mooi\x{20AC}e"));
> is $uri->query, "key=mooi%E2%82%ACe";
> }
>
> The reason is that "key" is passed as an upgraded string even if it's
> plain ASCII. The "=>"
> operator insists on upgrading the left operand when the utf8 pragma is
> in effect. If you quote
That only happens in perl 5.10 and earlier.
> it as "key" it's not upgraded and the test will pass. This is not
> sane behaviour.
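To check how a given perl treats the left operand of "=>" under the utf8 pragma, something like this can be run. It is a small sketch using only core functions, and the result for the bareword depends on the perl version, so no output is shown.

```perl
use strict;
use warnings;
use utf8;

my @bare   = (key => 1);      # bareword on the left of "=>"
my @quoted = ("key" => 1);    # explicitly quoted

printf "perl %vd\n", $^V;
printf "bareword upgraded: %s\n", utf8::is_utf8($bare[0])   ? "yes" : "no";
printf "quoted upgraded:   %s\n", utf8::is_utf8($quoted[0]) ? "yes" : "no";
```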