
This queue is for tickets about the URI CPAN distribution.

Report information
The Basics
Id: 26520
Status: resolved
Priority: 0/
Queue: URI

People
Owner: Nobody in particular
Requestors: ku0522a [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: URI::_query::query does not escape UTF-8 characters correctly
URI::_query::query replaces UTF-8 multibyte characters with an empty string. This happens because %URI::Escape::escapes has no entry for characters with ord($1) > 255, so the lookup returns an empty string. To avoid this, add "use bytes" so that $q is treated as a sequence of bytes rather than characters.
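The failure described above can be reproduced in a minimal sketch. It assumes a lookup table with one entry per byte value (0..255), as the report describes. The names `%escapes` and `naive_escape` are hypothetical stand-ins and are not the actual URI internals.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the escape table: one entry per byte value 0..255.
my %escapes = map { chr($_) => sprintf("%%%02X", $_) } 0 .. 255;

# Simplified escaping step: look up every character outside a small safe set.
# A character above codepoint 255 has no table entry, so the lookup yields
# undef, which interpolates as an empty string.
sub naive_escape {
    my ($text) = @_;
    no warnings 'uninitialized';   # silence the undef lookup for this demo
    $text =~ s/([^A-Za-z0-9\-_.~])/$escapes{$1}/g;
    return $text;
}

# U+3042 (HIRAGANA LETTER A) is silently dropped from the result.
print naive_escape("a\x{3042}b"), "\n";   # prints "ab"
```

The character is lost without any error, which matches the symptom in the report.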
Subject: patch
Attachment: patch (application/octet-stream, 372b)

From: SREZIC [...] cpan.org
On Fri Apr 20 04:37:36 2007, KUMA wrote:
> URI::_query::query replaces UTF-8 multibyte characters with empty
> string. because that $URI::Escape::escapes does not contain characters
> which is ord($1) > 255 and returns empty string.
>
> to avoid this, add "use bytes" and force to treat $q as ascii characters.
I don't think this is the right approach. I rather think that there should be a warning if characters above codepoint 255 are used, and that it's up to the user to do the encoding beforehand. There's no standard saying that high-codepoint characters should be encoded as UTF-8.

This is similar to perl's IO handling: if the output stream has no encoding attached, then there will be a "wide character" warning. The user is responsible for encoding the data beforehand or for marking the encoding of the output stream.

Regards,
Slaven
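As an illustration of encoding beforehand, here is a minimal sketch that uses only the core Encode module. The helper name `escape_with_encoding` and the choice of safe characters are assumptions made for this example. It is not how URI itself escapes query strings.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: the caller names the encoding explicitly, the text is
# turned into bytes, and every byte outside a small safe set is percent-encoded.
sub escape_with_encoding {
    my ($text, $encoding) = @_;
    my $bytes = encode($encoding, $text);   # characters -> bytes
    $bytes =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    return $bytes;
}

# The same character escapes differently depending on the chosen encoding,
# which is why the choice is left to the caller here.
print escape_with_encoding("\x{e9}",   "iso-8859-1"), "\n";   # %E9
print escape_with_encoding("\x{e9}",   "UTF-8"),      "\n";   # %C3%A9
print escape_with_encoding("\x{3042}", "UTF-8"),      "\n";   # %E3%81%82
```

Because the input is already bytes by the time it is escaped, no character above codepoint 255 ever reaches the escaping step.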
I basically agree that we should not really assume UTF-8 for Unicode, but people just seem to expect that. For now I've applied Gerard's patch from RT#15294, and then we'll see if that works as people expect.