Bug #26520 for URI: URI::_query::query does not escape UTF-8 characters correctly

Fri Apr 20 04:37:36 2007 ku0522a [...] gmail.com - Ticket created

Subject:

URI::_query::query does not escape UTF-8 characters correctly

URI::_query::query replaces UTF-8 multibyte characters with an empty string. This happens because $URI::Escape::escapes has no entry for characters where ord($1) > 255, so the lookup returns an empty string. To avoid this, add "use bytes" to force $q to be treated as a sequence of bytes.

Subject:

patch

Download patch
application/octet-stream 372b


Thu May 10 07:38:48 2007 SREZIC [...] cpan.org - Correspondence added

From:

SREZIC [...] cpan.org

On Fri Apr 20 04:37:36 2007, KUMA wrote:

> URI::_query::query replaces UTF-8 multibyte characters with empty
> string. because that $URI::Escape::escapes does not contain characters
> which is ord($1) > 255 and returns empty string.
>
> to avoid this, add "use bytes" and force to treat $q as ascii characters.

I don't think this is the right approach. I rather think that there should be a warning if characters above code point 255 are used, and that it's up to the user to do the encoding beforehand. There's no standard saying that high-codepoint characters should be encoded as UTF-8.

This is similar to perl's IO handling: if the output stream has no encoding attached, then there will be a "wide character" warning. The user is responsible for encoding the data beforehand or for marking the encoding of the output stream.

Regards,
Slaven
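To illustrate the two points discussed so far, here is a minimal sketch using only core modules. It does not call URI itself: %escapes and escape_with_table are hypothetical stand-ins that mimic a 256-entry escape table as described in the report, and the unreserved character set is an assumption chosen for the example. It shows a character above code point 255 being dropped, and the same string surviving when the caller encodes it to UTF-8 bytes first.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for the escape table described in the ticket:
# one entry per character with ord() 0..255, nothing above that.
my %escapes = map { chr($_) => sprintf("%%%02X", $_) } 0 .. 255;

# Escape everything outside a small unreserved set via the table.
# A character with no table entry becomes an empty string, which is
# the failure mode reported here.
sub escape_with_table {
    my ($str) = @_;
    $str =~ s/([^A-Za-z0-9\-_.~])/exists $escapes{$1} ? $escapes{$1} : ''/ge;
    return $str;
}

# U+00E9 (e with acute) fits in the table; U+3042 (HIRAGANA A) does not.
my $text = "caf\x{e9} \x{3042}";

my $lossy = escape_with_table($text);                   # U+3042 is dropped
my $safe  = escape_with_table(encode('UTF-8', $text));  # caller encodes first

print "$lossy\n$safe\n";
```

Under these assumptions the first line printed is caf%E9%20 (the hiragana character is gone) and the second is caf%C3%A9%20%E3%81%82. Note that the two results also disagree on U+00E9, which is why the choice of encoding has to be made explicitly somewhere, either by the caller or by the library.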

Thu May 10 07:38:52 2007 The RT System itself - Status changed from 'new' to 'open'

Wed Apr 02 18:17:19 2008 GAAS [...] cpan.org - Correspondence added

I basically agree that we should not really assume UTF-8 for unicode, but people just seem to expect that. For now I've applied Gerard's patch from RT#15294 and then we'll see if that works as people expect.

Wed Apr 02 18:17:21 2008 GAAS [...] cpan.org - Status changed from 'open' to 'resolved'