Request/Response:
========
GET /misc/mytest?domains=%E4%EE%EC%E5%ED HTTP/1.1
Host: www1.reg.ru
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:36.0) Gecko/20100101 Firefox/36.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: [CUT]
Connection: keep-alive
Cache-Control: max-age=0
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 24 Mar 2015 16:19:45 GMT
Content-Type: text/html; charset=WINDOWS-1251
Transfer-Encoding: chunked
Connection: keep-alive
Content-Language: ru
Set-Cookie: [CUT]
X-Catalyst: 5.90083
x-ua-compatible: IE=edge,chrome=IE8
Content-Encoding: gzip
========
Action code:
========
sub mytest : Local Args(0) {
    # getcontvars_noses is a site-specific helper; $p holds the request parameters
    my ($self, $c, $r, $p) = getcontvars_noses @_;
    $c->res->content_type( "text/plain" );
    $c->res->body( $p->{domains} );
}
========
Console:
========
[error] Caught exception in engine "Wide character in syswrite at /usr/local/share/perl/5.14.2/Starman/Server.pm line 547."
========
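A minimal sketch of the class of failure above, assuming the octets from the report: decode_utf8 turns bytes that are not valid UTF-8 into wide (above-0xFF) replacement characters, and syswrite croaks the moment it is handed such a string. This is a standalone reproduction, not Starman's actual code path.

```perl
#!/usr/bin/env perl
# Reproduce "Wide character in syswrite": decode_utf8 on non-UTF-8 octets
# yields U+FFFD replacement characters, which syswrite refuses to write.
use strict;
use warnings;
use Encode qw(decode_utf8);

my $octets = "\xE4\xEE\xEC\xE5\xED";   # WINDOWS-1251 bytes, not valid UTF-8
my $wide   = decode_utf8($octets);     # malformed input -> U+FFFD characters

open my $fh, '>', '/dev/null' or die $!;
my $ok = eval { syswrite($fh, $wide); 1 };
print $ok ? "wrote ok\n" : "error: $@";
```

With a byte string ($octets itself) the syswrite succeeds; only the decoded, wide string triggers the fatal error.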
> one thing I wonder exactly what you think this should do?
Well, with encoding => undef it should do nothing with charsets, i.e. return the octets as-is. That
probably means $self->unescape_uri($_) instead of decode_utf8($self->unescape_uri($_)).
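A minimal sketch of that suggested dispatch, under stated assumptions: decode_query_value is a hypothetical name, not a Catalyst API, and a hand-rolled percent-decoder stands in for Catalyst's internal unescape_uri. Decode only when an encoding is configured; with encoding => undef, pass the octets through untouched.

```perl
#!/usr/bin/env perl
# Hypothetical sketch: decode query values only when an encoding is set.
use strict;
use warnings;
use Encode qw(decode_utf8);

sub decode_query_value {
    my ($value, $encoding) = @_;
    return $value unless defined $value;
    (my $octets = $value) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;  # percent-decode
    return $octets unless defined $encoding;   # encoding => undef: octets as-is
    return decode_utf8($octets);               # otherwise: Perl character string
}

# CP1251 octets survive untouched when no encoding is configured
my $raw = decode_query_value('%E4%EE%EC%E5%ED', undef);
printf "%d bytes: %s\n", length($raw),
    join ' ', map { sprintf '%02X', ord } split //, $raw;   # 5 bytes: E4 EE EC E5 ED
```

The same helper still gives wide characters for sites that do configure an encoding, e.g. decode_query_value('%D0%B0', 'UTF-8') returns the single character U+0430.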
> I was taking the assumption that people would want Catalyst to convert the encoded characters to local unicode wide characters but maybe that is not an ideal assumption?
Yes, right. With encoding NOT undef, Catalyst should convert binary data to perl strings (unicode wide characters).
But when encoding IS undef, it should pass binary data through as-is. That is exactly what it already does with output data.
So with encoding undef input processing should be consistent with output processing.
We currently work with textual data in WINDOWS-1251. That's the pre-Unicode approach. We're migrating
to Unicode, but we're just not there yet.
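To illustrate why a hard-coded decode_utf8 breaks such a site (a sketch using core Encode, with the octets from the report): the same five bytes decode cleanly as cp1251, yielding the Russian word "домен" ("domain"), but are malformed input under UTF-8 and come back as replacement characters.

```perl
#!/usr/bin/env perl
# The same five octets: a real word under cp1251, garbage under UTF-8.
use strict;
use warnings;
use Encode qw(decode decode_utf8);

my $octets = "\xE4\xEE\xEC\xE5\xED";      # query-string bytes from the report

my $cp1251  = decode('cp1251', $octets);  # "домен" ("domain"), 5 characters
my $mangled = decode_utf8($octets);       # U+FFFD replacement characters

printf "cp1251: %d chars, valid word\n", length $cp1251;
printf "utf-8 : %d chars, %s\n", length $mangled,
    $mangled =~ /\x{FFFD}/ ? 'contains U+FFFD' : 'clean';
```

There is no way to recover the original bytes once they have been replaced with U+FFFD, which is why the application needs to see the raw octets.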
On Tue Mar 24 16:59:39 2015, JJNAPIORK wrote:
> Hey
>
> one thing I wonder exactly what you think this should do? I was
> taking the assumption that people would want Catalyst to convert the
> encoded characters to local unicode wide characters but maybe that is
> not an ideal assumption?
>
> talk to me about the use case and what you'd ideally see here
>
> On Tue Mar 24 06:56:19 2015, vsespb wrote:
> > Hello.
> >
> > Here is the diff:
> >
> >
> > https://metacpan.org/diff/file?target=JJNAPIORK/Catalyst-Runtime-5.90083/&source=JJNAPIORK/Catalyst-Runtime-5.90082/
> >
> > There is the line:
> >
> > ===
> > map { defined $_ ? decode_utf8($self->unescape_uri($_)) : $_ }
> > ===
> >
> > It decodes the data to a Unicode string, assuming it is UTF-8, and it
> > ignores the "encoding" option (even encoding => undef).
> >
> > Before 5.90083 the logic was the same:
> >
> > ===
> > map { decode_utf8($self->unescape_uri($_)) }
> > ===
> >
> > However, there was
> > ===
> > - if(my $query_obj = $env->{'plack.request.query'}) {
> > - $c->request->query_parameters(
> > - $c->request->_use_hash_multivalue ?
> > - $query_obj->clone :
> > - $query_obj->as_hashref_mixed);
> > - return;
> > - }
> > -
> > ===
> > before decoding.
> >
> > So, our site didn't reach this decoding, since
> > $env->{'plack.request.query'} was true.
> >
> > Use case:
> >
> > Site runs under encoding => undef (previously without encoding at
> > all).
> > Web page encoding is WINDOWS-1251, so all incoming data, including
> > query string is WINDOWS-1251 as well.
> >
> > Example of URL:
> > http://example.com/test?domains=%E4%EE%EC%E5%ED
> >
> > %E4%EE%EC%E5%ED is a Russian word in WINDOWS-1251 encoding.
> >
> > So in 5.90082 octets are passed as-is to the application. After
> > 5.90083 they are decoded to a Unicode string consisting of
> > not-a-characters.