Skip Menu |

This queue is for tickets about the URI CPAN distribution.

Report information
The Basics
Id: 96941
Status: resolved
Priority: 0/
Queue: URI

People
Owner: ether [...] cpan.org
Requestors: ether [...] cpan.org
Cc: IKEGAMI [...] cpan.org
AdminCc:

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: 1.64



Subject: Usage of \C now gives deprecation warning in 5.21.2 and higher
URI::Escape uses \C to parse the string to be escaped and should be fixed. Given how this breaks UTF-8 string parsing, I'm wondering if its usage has always been buggy; perhaps we need more tests for UTF-8.
On Fri Jul 04 14:16:04 2014, ETHER wrote: Show quoted text
> URI::Escape uses \C to parse the string to be escaped and should be > fixed. > > Given how this breaks UTF-8 string parsing, I'm wondering if its usage > has always been buggy; perhaps we need more tests for UTF-8.
\C is only used by an undocumented function that's not used by the module. The function is indeed buggy as it can encode identical strings differently. $ perl -MURI::Escape -wE'$_="\xE9"; utf8::downgrade($_); say URI::Escape::escape_char($_);' %E9 $ perl -MURI::Escape -wE'$_="\xE9"; utf8::upgrade($_); say URI::Escape::escape_char($_);' %C3%A9 The following version of escape_char behaves identically, but doesn't use \C: # XXX escape_char is buggy as it assigns meaning to the string's storage format. sub escape_char { if (utf8::is_utf8($_[0])) { my $s = $_[0]; utf8::encode($s); unshift(@_, $s); } return join '', @URI::Escape::escapes{$_[0] =~ /(.)/g}; }
On Sat Jul 05 16:16:19 2014, ikegami wrote: Show quoted text
> \C is only used by an undocumented function that's not used by the > module. > > The function is indeed buggy as it can encode identical strings > differently.
It is used by URI->new, though. URI->new is buggy :( $ perl -MURI -wE'$_="http://foo/\xE9"; utf8::upgrade($_); say URI->new($_);' http://foo/%C3%A9 $ perl -MURI -wE'$_="http://foo/\xE9"; utf8::downgrade($_); say URI->new($_);' http://foo/%E9
On Sat Jul 05 16:16:19 2014, ikegami wrote: Show quoted text
> The following version of escape_char behaves identically, but doesn't > use \C: > > # XXX escape_char is buggy as it assigns meaning to the string's > storage format. > sub escape_char { > if (utf8::is_utf8($_[0])) { > my $s = $_[0]; > utf8::encode($s); > unshift(@_, $s); > } > > return join '', @URI::Escape::escapes{$_[0] =~ /(.)/g}; > }
That should be /(.)/sg.
Thanks, 1.62 released!
Unfortunately this fix causes other tests to fail on perls earlier than 5.12.1: https://rt.cpan.org/Ticket/Display.html?id=97177 I'm not sure if this is a resolvable situation, but regardless, the way in which strings are parsed in this module need to be reconsidered with respect to the utf8 flags on the string.
CC: IKEGAMI [...] cpan.org
Subject: Re: [rt.cpan.org #96941] Usage of \C now gives deprecation warning in 5.21.2 and higher
Date: Sun, 13 Jul 2014 00:30:44 -0400
To: bug-URI [...] rt.cpan.org
From: Eric Brine <ikegami [...] adaelis.com>
I'll have a look at it tomorrow. On Sun, Jul 13, 2014 at 12:05 AM, Karen Etheridge via RT < bug-URI@rt.cpan.org> wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=96941 > > > Unfortunately this fix causes other tests to fail on perls earlier than > 5.12.1: https://rt.cpan.org/Ticket/Display.html?id=97177 > > I'm not sure if this is a resolvable situation, but regardless, the way in > which strings are parsed in this module need to be reconsidered with > respect to the utf8 flags on the string. >
CC: ether [...] cpan.org, IKEGAMI [...] cpan.org
Subject: Re: [rt.cpan.org #96941] Usage of \C now gives deprecation warning in 5.21.2 and higher
Date: Sun, 13 Jul 2014 03:10:24 -0400
To: bug-URI [...] rt.cpan.org
From: Eric Brine <ikegami [...] adaelis.com>
On Sun, Jul 13, 2014 at 12:05 AM, Karen Etheridge via RT < bug-URI@rt.cpan.org> wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=96941 > > > Unfortunately this fix causes other tests to fail on perls earlier than > 5.12.1: https://rt.cpan.org/Ticket/Display.html?id=97177
Your encountering a bug in utf8::is_utf8 that occurs when escape_char($1) is performed. Fix follows: (Tested with 5.10.1) # XXX FIXME escape_char is buggy as it assigns meaning to the string's storage format. sub escape_char { # Old versions of utf8::is_utf8() didn't properly handle magical vars (e.g. $1). # The following forces a fetch to occur beforehand. my $dummy = substr($_[0], 0, 0); if (utf8::is_utf8($_[0])) { my $s = shift; utf8::encode($s); unshift(@_, $s); } return join '', @URI::Escape::escapes{$_[0] =~ /(.)/sg}; } Note that split // is faster than /./sg, so you might want to use: (Tested with 5.10.1) return join '', @URI::Escape::escapes{split //, $_[0]}; One of your test files uses subtest() from Test::More. Note that subtest() wasn't in Test::More 0.92, the Test::More that comes with 5.10.1. You might want to bump the prereq version of Test::More to 0.94. - Eric
CC: bug-URI [...] rt.cpan.org, IKEGAMI [...] cpan.org
Subject: Re: [rt.cpan.org #96941] Usage of \C now gives deprecation warning in 5.21.2 and higher
Date: Sun, 13 Jul 2014 14:18:00 -0700
To: Eric Brine <ikegami [...] adaelis.com>
From: Karen Etheridge <ether [...] cpan.org>
On Sun, Jul 13, 2014 at 03:10:24AM -0400, Eric Brine wrote: Show quoted text
> Your encountering a bug in utf8::is_utf8 that occurs when escape_char($1) > is performed. Fix follows: (Tested with 5.10.1)
All released in 1.64. thanks!
CC: bug-URI [...] rt.cpan.org, IKEGAMI [...] cpan.org
Subject: Re: [rt.cpan.org #96941] Usage of \C now gives deprecation warning in 5.21.2 and higher
Date: Sun, 13 Jul 2014 14:19:06 -0700
To: Eric Brine <ikegami [...] adaelis.com>
From: Karen Etheridge <ether [...] cpan.org>
(resent, due to mailer issue) On Sun, Jul 13, 2014 at 03:10:24AM -0400, Eric Brine wrote: Show quoted text
> Your encountering a bug in utf8::is_utf8 that occurs when escape_char($1) > is performed. Fix follows: (Tested with 5.10.1) > # Old versions of utf8::is_utf8() didn't properly handle magical vars (e.g. $1). > # The following forces a fetch to occur beforehand. > my $dummy = substr($_[0], 0, 0);
Interesting; can you explain what's going on here and why adding this line helps? Show quoted text
> if (utf8::is_utf8($_[0])) { > my $s = shift;
This change looks like we were mistakenly adding an extra argument to @_ previously... Show quoted text
> Note that split // is faster than /./sg, so you might want to use: (Tested > with 5.10.1)
I was considering that earlier (wondering why we were even matching on \C or . in the first place), but on its own it didn't help anything. But, agreed that it's faster. Show quoted text
> One of your test files uses subtest() from Test::More. Note that subtest() > wasn't in Test::More 0.92, the Test::More that comes with 5.10.1. You might > want to bump the prereq version of Test::More to 0.94.
Thanks, well spotted!