
This queue is for tickets about the URI CPAN distribution.

Report information
The Basics
Id: 70161
Status: open
Priority: 0/
Queue: URI

People
Owner: Nobody in particular
Requestors: ruz [...] bestpractical.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: URI parsing may corrupt data if argument is UTF-8 string
Date: Tue, 9 Aug 2011 14:57:01 +0400
To: bug-URI [...] rt.cpan.org
From: Ruslan Zakirov <ruz [...] bestpractical.com>
Hello Gisle,

Do you consider the following a bug, or a thing requiring an explanation in the docs?

    use Encode;
    use URI;
    use Devel::Peek;

    my $uri = URI->new(decode_utf8 '?Query=%C3%A4%C3%B6%C3%BC');
    Dump( ($uri->query_form('Query'))[1] );

If you drop me ideas on how you want this addressed, then I can write a patch.

--
Best regards, Ruslan.
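Below is a minimal core-Perl sketch of the effect being reported, without URI itself. `percent_decode` is a hypothetical helper, not URI's code. `utf8::upgrade` stands in for the `decode_utf8` call in the report. The sketch suggests that the decoded value carries the same code points either way but inherits the UTF8 flag from its input. `utf8::is_utf8` reports the same flag that `Dump` prints.

    use strict;
    use warnings;

    # Hypothetical percent-decoder for illustration only; URI's own code differs.
    sub percent_decode {
        my ($s) = @_;
        $s =~ s/%([0-9A-Fa-f]{2})/chr(hex $1)/ge;
        return $s;
    }

    my $bytes_in   = '%C3%A4%C3%B6%C3%BC';   # plain byte string, flag off
    my $flagged_in = $bytes_in;
    utf8::upgrade($flagged_in);              # same characters, flag on

    my $from_bytes   = percent_decode($bytes_in);
    my $from_flagged = percent_decode($flagged_in);

    # Same code points in both results; only the internal flag differs.
    for my $v ($from_bytes, $from_flagged) {
        printf "%-18s UTF8 flag: %s\n",
            join(' ', map { sprintf '%02X', ord } split //, $v),
            utf8::is_utf8($v) ? 'on' : 'off';
    }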
It might be considered an issue that the internal UTF8 flag set on the string that initialized the URI gets propagated to the values returned by query_form(). In an ideal world this should not change the semantics of the return value, but currently it does cause problems: for instance, decode_utf8() will not decode such values. I fixed that issue in <https://github.com/gisle/uri/commit/8803283ed9d1b67c7f58d2b5d507ede2602c477a>. After this patch your query_form() call will return a byte string.

In general it's more problematic that the UTF8 flag determines how chars in the 128 .. 255 range are percent-encoded by URI. I don't really have a good (and backwards-compatible) plan for addressing this.
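The second point is that the flag decides how characters in the 128 .. 255 range get percent-encoded. It can be illustrated with a hypothetical escaper that, like the behaviour described, encodes a flagged string as UTF-8 before escaping but escapes a flag-off string byte by byte. `escape_by_flag` is an illustration under that assumption, not URI's actual implementation.

    use strict;
    use warnings;

    # Hypothetical escaper that mimics flag-dependent behaviour, for illustration only.
    sub escape_by_flag {
        my ($s) = @_;
        utf8::encode($s) if utf8::is_utf8($s);   # flag on: escape the UTF-8 bytes
        $s =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
        return $s;
    }

    my $latin1   = "\xE4";        # U+00E4, flag off
    my $upgraded = $latin1;
    utf8::upgrade($upgraded);     # still U+00E4, now with the flag on

    # Same character, two different escapings.
    print escape_by_flag($latin1), "\n";     # %E4
    print escape_by_flag($upgraded), "\n";   # %C3%A4

Because `$latin1 eq $upgraded` holds, two strings that Perl treats as equal end up as different URIs. That is why the flag-dependent encoding is hard to change without breaking existing callers.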
Subject: Re: [rt.cpan.org #70161] URI parsing may corrupt data if argument is UTF-8 string
Date: Mon, 15 Aug 2011 00:10:30 +0400
To: bug-URI [...] rt.cpan.org
From: Ruslan Zakirov <ruz [...] bestpractical.com>
On Sun, Aug 14, 2011 at 9:56 PM, Gisle_Aas via RT <bug-URI@rt.cpan.org> wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=70161 >
>
> It might be considered an issue that the internal UTF8-flag set on the string that initialized the URI gets propagated to the values returned
> by query_form().  In an ideal world this should not change the semantics of the return value; but currently this has issues.  For instance
> decode_utf8() will not decode such values.
>
> I fixed that issue in <https://github.com/gisle/uri/commit/8803283ed9d1b67c7f58d2b5d507ede2602c477a>.  After this patch your
> query_form() call will return a byte string.
I expected a different reaction. Thanks for implementing this change. Bytes are good in this case: escaped data may be in any encoding.
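Escaped data may be in any encoding, so a byte string is the safer return value: only the caller knows which charset the percent-escapes were produced in. The minimal sketch below uses Encode::decode and assumes the unescaped bytes of the query value are already in hand. `$raw` is a hard-coded stand-in for them.

    use strict;
    use warnings;
    use Encode qw(decode);

    # Stand-in for the unescaped bytes of %C3%A4%C3%B6%C3%BC
    my $raw = "\xC3\xA4\xC3\xB6\xC3\xBC";

    # The same bytes give different text depending on the charset the caller assumes.
    my $as_utf8   = decode('UTF-8',      $raw);   # 3 characters: U+00E4 U+00F6 U+00FC
    my $as_latin1 = decode('ISO-8859-1', $raw);   # 6 characters, one per byte

    printf "UTF-8:      %d chars\n", length $as_utf8;
    printf "ISO-8859-1: %d chars\n", length $as_latin1;

If the value had come back flagged instead, the caller could no longer make this choice cleanly. That is the concern behind preferring bytes here.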
> In general it's more problematic that the UTF8 flag determine how chars in the 128 .. 255 range are percent encoded by URI.  Don't really
> have a good (and backwards-compatible) plan for addressing this.
Understood.

--
Best regards, Ruslan.