Skip Menu |

This queue is for tickets about the Catalyst-Runtime CPAN distribution.

Report information
The Basics
Id: 94957
Status: resolved
Priority: 0/
Queue: Catalyst-Runtime

People
Owner: Nobody in particular
Requestors: TEEJAY [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in:
  • 5.90040
  • 5.90041
  • 5.90042
  • 5.90049_001
  • 5.90049_002
  • 5.90049_003
  • 5.90049_004
  • 5.90049_005
  • 5.90049_006
  • 5.90050
  • 5.90051
  • 5.90052
  • 5.90053
  • 5.90059_001
  • 5.90059_002
  • 5.90059_003
  • 5.90059_004
  • 5.90059_005
  • 5.90059_006
  • 5.90060
  • 5.90061
  • 5.90062
Fixed in: (no value)



Subject: unicode double encoding caused by Plugin:Unicode::Encoding
Plugin::Unicode::Encoding doesn't check for already encoded data before encoding it again : from line 70: # Encode expects plain scalars (IV, NV or PV) and segfaults on ref's $c->response->body( $c->encoding->encode( $body, $CHECK ) ) if ref(\$body) eq 'SCALAR'; Naively only checks it's a scalar, not that it isn't already encoded, which will be the case if the body is JSON with unicode Fix is : # Encode expects plain scalars (IV, NV or PV) and segfaults on ref's if (ref(\$body) eq 'SCALAR' && utf8::is_utf8($body)){ $c->response->body( $c->encoding->encode( $body, $CHECK ) ); } In our case we have code in a controller which does : $c->res->body(to_json($json, { utf8 => 1, pretty => defined $c->stash->{pretty_json}?$c->stash->{pretty_json}:1, allow_blessed => 1, convert_blessed => 1, })); This will always double encoded utf8 output without the patch shown
Show quoted text
> Fix is : > # Encode expects plain scalars (IV, NV or PV) and segfaults on ref's > if (ref(\$body) eq > 'SCALAR' && utf8::is_utf8($body)){ > $c->response->body( $c-
> >encoding->encode( $body, $CHECK ) );
> }
That fix is courtesy of gbjk (Gareth Kirwan) who spotted and patched it at our work, I'm just upstreaming it while he's busy
That fix is wrong. One should not use utf8::is_utf8 to detect if data is encoded or no, it would be a bug ( http://perldoc.perl.org/perlunifaq.html#How-can-I-determine-if-a-string-is-a-text-string-or-a-binary-string? ). Actually it's impossible to detect if data is encoded or no. On Wed Apr 23 14:38:17 2014, TEEJAY wrote: Show quoted text
> Plugin::Unicode::Encoding doesn't check for already encoded data > before encoding it again : > > from line 70: > # Encode expects plain scalars (IV, NV or PV) and segfaults on > ref's > $c->response->body( $c->encoding->encode( $body, $CHECK ) ) > if ref(\$body) eq 'SCALAR'; > > Naively only checks it's a scalar, not that it isn't already encoded, > which will be the case if the body is JSON with unicode > > Fix is : > # Encode expects plain scalars (IV, NV or PV) and segfaults on ref's > if (ref(\$body) eq > 'SCALAR' && utf8::is_utf8($body)){ > $c->response->body( $c-
> >encoding->encode( $body, $CHECK ) );
> } > > In our case we have code in a controller which does : > > $c->res->body(to_json($json, { > utf8 => 1, > pretty => defined $c->stash->{pretty_json}?$c->stash-
> >{pretty_json}:1,
> allow_blessed => 1, > convert_blessed => 1, > })); > > This will always double encoded utf8 output without the patch shown
I see fix applied already in Catalyst-Plugin-Unicode I propose reverting it, as it introduce a bug. Latin1 data in utf8::downgraded form will be detected as encoded data. On Thu Jul 17 13:21:17 2014, vsespb wrote: Show quoted text
> That fix is wrong. One should not use utf8::is_utf8 to detect if data > is encoded or no, it would be a bug ( > http://perldoc.perl.org/perlunifaq.html#How-can-I-determine-if-a- > string-is-a-text-string-or-a-binary-string? ). Actually it's > impossible to detect if data is encoded or no. > > On Wed Apr 23 14:38:17 2014, TEEJAY wrote:
> > Plugin::Unicode::Encoding doesn't check for already encoded data > > before encoding it again : > > > > from line 70: > > # Encode expects plain scalars (IV, NV or PV) and segfaults on > > ref's > > $c->response->body( $c->encoding->encode( $body, $CHECK ) ) > > if ref(\$body) eq 'SCALAR'; > > > > Naively only checks it's a scalar, not that it isn't already encoded, > > which will be the case if the body is JSON with unicode > > > > Fix is : > > # Encode expects plain scalars (IV, NV or PV) and segfaults on > > ref's > > if (ref(\$body) eq > > 'SCALAR' && utf8::is_utf8($body)){ > > $c->response->body( $c-
> > > encoding->encode( $body, $CHECK ) );
> > } > > > > In our case we have code in a controller which does : > > > > $c->res->body(to_json($json, { > > utf8 => 1, > > pretty => defined $c->stash->{pretty_json}?$c->stash-
> > > {pretty_json}:1,
> > allow_blessed => 1, > > convert_blessed => 1, > > })); > > > > This will always double encoded utf8 output without the patch shown
On 2014-07-17 02:36:36, vsespb wrote: Show quoted text
> I see fix applied already in Catalyst-Plugin-Unicode > I propose reverting it, as it introduce a bug. > Latin1 data in utf8::downgraded form will be detected as encoded data.
It looks like there are several places, all over the Catalyst-Runtime codebase, that perform is_utf8 checks :( As Victor says, this does NOT do what the author intended, and must be fixed.
On Thu Jul 17 11:40:54 2014, ETHER wrote: Show quoted text
> On 2014-07-17 02:36:36, vsespb wrote:
> > I see fix applied already in Catalyst-Plugin-Unicode > > I propose reverting it, as it introduce a bug. > > Latin1 data in utf8::downgraded form will be detected as encoded > > data.
> > It looks like there are several places, all over the Catalyst-Runtime > codebase, that perform is_utf8 checks :( As Victor says, this does > NOT do what the author intended, and must be fixed.
The current 5.90079__8 dev releases removes all that stuff and I think tries to clarify and improve how encoding is done. Also we are going to make utf8 encoding default going forward so I hope that shakes out any issues. I;m going to close this, but if you find issue with the latest set of dev releases please shout out.