Skip Menu |

Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the Sereal-Decoder CPAN distribution.

Report information
The Basics
Id: 81299
Status: resolved
Priority: 0/
Queue: Sereal-Decoder

People
Owner: Nobody in particular
Requestors: DAMS [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: (no value)
Fixed in: (no value)



Subject: decoder fails to un-snappy a utf8 string
Hi, I'm sending to the Sereal decoder a valid encoded string, however its utf8 flag is on. The snappy decompression fails. I'm not sure it's a bug : maybe I'm not supposed to send the decoder such string. Note : it fails at snappy uncompression. Context : I'm encoding a structure, storing it in redis, and reading it back. Redis happens to give me a perl string with utf8 internal flag turned on. Sereal decoding then fails. Workaround : detect if the string has the internal utf8 flag on. If yes, encode the perl string into latin1. It then works fine. So I have a workaround for this, but I thought maybe Sereal decodeer (orthe snappy uncompressor) could check the utf8 flag and do appropriate stuff if needed attached a test case.
Subject: test_sereal.pl
#!/usr/bin/env perl use Sereal::Encoder; use Sereal::Decoder; my $encoder = Sereal::Encoder->new({ snappy => 1, snappy_threshold => 0 }); my $decoder = Sereal::Decoder->new(); my $s1 = { foo => 'bar', f111 => 'bar', f1111 => 'bar', f11111 => 'bar', f111111 => 'bar', f1111111 => 'bar', f11111111 => 'bar', f111111111 => 'bar', f1111111111 => 'bar', f11111111111 => 'bar', f111111111111 => 'bar', f1111111111111 => 'bar', f11111111111111 => 'bar', f111111111111111 => 'bar', f1111111111111111 => 'bar', f11111111111111111 => 'bar', }; my $s2 = $encoder->encode($s1); use Encode; $s2 = encode("utf8", $s2); $decoder->decode($s2);
I actually forgot the error message :) Sereal: Error in srl_decoder.c line 265: Snappy decompression of Sereal packet payload failed with error -5! at test_sereal.pl line ...
On Tue Nov 20 09:35:11 2012, DAMS wrote: Show quoted text
> I actually forgot the error message :) > > Sereal: Error in srl_decoder.c line 265: Snappy decompression of Sereal > packet payload failed with error -5! at test_sereal.pl line ... >
Well, the call to encode("utf8", $s2) is going to change the contents of the encoded byte string, so it's not surprising to see it fail. The UTF8 flag by itself does not matter : if you replace that line by Encode::_utf8_on($s2); then the decoder succeeds. (at least on my machine) So I'd say it's not a bug.
On Tue Nov 20 09:49:58 2012, RGARCIA wrote: Show quoted text
> On Tue Nov 20 09:35:11 2012, DAMS wrote:
> > I actually forgot the error message :) > > > > Sereal: Error in srl_decoder.c line 265: Snappy decompression of
> Sereal
> > packet payload failed with error -5! at test_sereal.pl line ... > >
> > Well, the call to encode("utf8", $s2) is going to change the contents > of the encoded byte string, > so it's not surprising to see it fail.
Yes, but from the point of view of the user, it is then possible to have 2 strings : str1 and str2, where str1 eq str2, one being internally stored as utf8 and the other one as latin1. So these 2 strings are "equal" in the Perl sense, but one of them won't decode properly. I don't think it's a good behavior to oblige the user to care about the way its strings are internally stored, to be able to use decode. downgrading the strings may be a solution. But I'll let you decide on that :) Show quoted text
> > The UTF8 flag by itself does not matter : if you replace that line by > Encode::_utf8_on($s2); > then the decoder succeeds. (at least on my machine) > > So I'd say it's not a bug.
On Tue Nov 20 10:21:51 2012, DAMS wrote: Show quoted text
> On Tue Nov 20 09:49:58 2012, RGARCIA wrote:
> > On Tue Nov 20 09:35:11 2012, DAMS wrote:
> > > I actually forgot the error message :) > > > > > > Sereal: Error in srl_decoder.c line 265: Snappy decompression of
> > Sereal
> > > packet payload failed with error -5! at test_sereal.pl line ... > > >
> > > > Well, the call to encode("utf8", $s2) is going to change the contents > > of the encoded byte string, > > so it's not surprising to see it fail.
> > Yes, but from the point of view of the user, it is then possible to have > 2 strings : str1 and str2, where str1 eq str2, one being internally > stored as utf8 and the other one as latin1. > > So these 2 strings are "equal" in the Perl sense, but one of them won't > decode properly. I don't think it's a good behavior to oblige the user > to care about the way its strings are internally stored, to be able to > use decode. > > downgrading the strings may be a solution.
In your example $s2 after encoding does not get the internal UTF8 flag on. However I agree that it's sensible to downgrade a UTF8 string before feeding it to the sereal decoder. What about the patch below ? apparently it solves your problem (if the $s2 string gets its utf8 flag on.) --- a/Perl/Decoder/srl_decoder.c +++ b/Perl/Decoder/srl_decoder.c @@ -221,6 +221,9 @@ SV * srl_decode_into(pTHX_ srl_decoder_t *dec, SV *src, SV* into, UV start_offset) { assert(dec != NULL); + if (SvUTF8(src)) { + sv_utf8_downgrade(src, 0); + } srl_begin_decoding(aTHX_ dec, src, start_offset); srl_read_header(aTHX_ dec); if (SRL_DEC_HAVE_OPTION(dec, SRL_F_DECODER_DECOMPRESS_SNAPPY)) {
Subject: Re: [rt.cpan.org #81299] decoder fails to un-snappy a utf8 string
Date: Tue, 20 Nov 2012 17:14:49 +0100
To: bug-Sereal-Decoder [...] rt.cpan.org
From: demerphq <demerphq [...] gmail.com>
On 20 November 2012 17:08, Rafaël Garcia-Suarez via RT <bug-Sereal-Decoder@rt.cpan.org> wrote: Show quoted text
> Queue: Sereal-Decoder > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=81299 > > > On Tue Nov 20 10:21:51 2012, DAMS wrote:
>> On Tue Nov 20 09:49:58 2012, RGARCIA wrote:
>> > On Tue Nov 20 09:35:11 2012, DAMS wrote:
>> > > I actually forgot the error message :) >> > > >> > > Sereal: Error in srl_decoder.c line 265: Snappy decompression of
>> > Sereal
>> > > packet payload failed with error -5! at test_sereal.pl line ... >> > >
>> > >> > Well, the call to encode("utf8", $s2) is going to change the contents >> > of the encoded byte string, >> > so it's not surprising to see it fail.
>> >> Yes, but from the point of view of the user, it is then possible to have >> 2 strings : str1 and str2, where str1 eq str2, one being internally >> stored as utf8 and the other one as latin1. >> >> So these 2 strings are "equal" in the Perl sense, but one of them won't >> decode properly. I don't think it's a good behavior to oblige the user >> to care about the way its strings are internally stored, to be able to >> use decode. >> >> downgrading the strings may be a solution.
> > In your example $s2 after encoding does not get the internal UTF8 flag on. > However I agree that it's sensible to downgrade a UTF8 string before feeding it to the sereal decoder. > > What about the patch below ? apparently it solves your problem (if the $s2 string gets its utf8 flag on.) > > --- a/Perl/Decoder/srl_decoder.c > +++ b/Perl/Decoder/srl_decoder.c > @@ -221,6 +221,9 @@ SV * > srl_decode_into(pTHX_ srl_decoder_t *dec, SV *src, SV* into, UV start_offset) > { > assert(dec != NULL); > + if (SvUTF8(src)) { > + sv_utf8_downgrade(src, 0); > + } > srl_begin_decoding(aTHX_ dec, src, start_offset); > srl_read_header(aTHX_ dec); > if (SRL_DEC_HAVE_OPTION(dec, SRL_F_DECODER_DECOMPRESS_SNAPPY)) {
What happens if the downgrade fails? Does it die? Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
On Tue Nov 20 11:15:02 2012, demerphq@gmail.com wrote: Show quoted text
> > --- a/Perl/Decoder/srl_decoder.c > > +++ b/Perl/Decoder/srl_decoder.c > > @@ -221,6 +221,9 @@ SV * > > srl_decode_into(pTHX_ srl_decoder_t *dec, SV *src, SV* into, UV
> start_offset)
> > { > > assert(dec != NULL); > > + if (SvUTF8(src)) { > > + sv_utf8_downgrade(src, 0); > > + } > > srl_begin_decoding(aTHX_ dec, src, start_offset); > > srl_read_header(aTHX_ dec); > > if (SRL_DEC_HAVE_OPTION(dec, SRL_F_DECODER_DECOMPRESS_SNAPPY))
> { > > What happens if the downgrade fails? Does it die?
Yes (the 2nd parameter to sv_utf8_downgrade, "fail ok", is false). Needs testing though :)
Subject: Re: [rt.cpan.org #81299] decoder fails to un-snappy a utf8 string
Date: Tue, 20 Nov 2012 18:52:36 +0100
To: bug-Sereal-Decoder [...] rt.cpan.org
From: Steffen Mueller <smueller [...] cpan.org>
On 11/20/2012 05:29 PM, Rafaël Garcia-Suarez via RT wrote: Show quoted text
> Queue: Sereal-Decoder > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=81299 > > > On Tue Nov 20 11:15:02 2012, demerphq@gmail.com wrote:
>>> --- a/Perl/Decoder/srl_decoder.c >>> +++ b/Perl/Decoder/srl_decoder.c >>> @@ -221,6 +221,9 @@ SV * >>> srl_decode_into(pTHX_ srl_decoder_t *dec, SV *src, SV* into, UV
>> start_offset)
>>> { >>> assert(dec != NULL); >>> + if (SvUTF8(src)) { >>> + sv_utf8_downgrade(src, 0); >>> + } >>> srl_begin_decoding(aTHX_ dec, src, start_offset); >>> srl_read_header(aTHX_ dec); >>> if (SRL_DEC_HAVE_OPTION(dec, SRL_F_DECODER_DECOMPRESS_SNAPPY))
>> { >> >> What happens if the downgrade fails? Does it die?
> > Yes (the 2nd parameter to sv_utf8_downgrade, "fail ok", is false). > Needs testing though :)
Could you (or Damien) do the testing before we apply the patch, please? I have an allergic reaction when exposed to encoding issues that ruins my entire day every time. --Steffen
Subject: Re: [rt.cpan.org #81299] decoder fails to un-snappy a utf8 string
Date: Tue, 20 Nov 2012 20:17:59 +0100
To: bug-Sereal-Decoder [...] rt.cpan.org
From: demerphq <demerphq [...] gmail.com>
On 20 November 2012 18:52, Steffen Mueller via RT <bug-Sereal-Decoder@rt.cpan.org> wrote: Show quoted text
> Queue: Sereal-Decoder > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=81299 > > > On 11/20/2012 05:29 PM, Rafaël Garcia-Suarez via RT wrote:
>> Queue: Sereal-Decoder >> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=81299 > >> >> On Tue Nov 20 11:15:02 2012, demerphq@gmail.com wrote:
>>>> --- a/Perl/Decoder/srl_decoder.c >>>> +++ b/Perl/Decoder/srl_decoder.c >>>> @@ -221,6 +221,9 @@ SV * >>>> srl_decode_into(pTHX_ srl_decoder_t *dec, SV *src, SV* into, UV
>>> start_offset)
>>>> { >>>> assert(dec != NULL); >>>> + if (SvUTF8(src)) { >>>> + sv_utf8_downgrade(src, 0); >>>> + } >>>> srl_begin_decoding(aTHX_ dec, src, start_offset); >>>> srl_read_header(aTHX_ dec); >>>> if (SRL_DEC_HAVE_OPTION(dec, SRL_F_DECODER_DECOMPRESS_SNAPPY))
>>> { >>> >>> What happens if the downgrade fails? Does it die?
>> >> Yes (the 2nd parameter to sv_utf8_downgrade, "fail ok", is false). >> Needs testing though :)
> > Could you (or Damien) do the testing before we apply the patch, please? > I have an allergic reaction when exposed to encoding issues that ruins > my entire day every time.
Ah the old unicode rash. :-) Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
Sure, i'll do extensive tests today and let you know. Le Mar 20 Nov 2012 12:52:52, SMUELLER a écrit : Show quoted text
> On 11/20/2012 05:29 PM, Rafaël Garcia-Suarez via RT wrote:
> > Queue: Sereal-Decoder > > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=81299 > > > > > On Tue Nov 20 11:15:02 2012, demerphq@gmail.com wrote:
> >>> --- a/Perl/Decoder/srl_decoder.c > >>> +++ b/Perl/Decoder/srl_decoder.c > >>> @@ -221,6 +221,9 @@ SV * > >>> srl_decode_into(pTHX_ srl_decoder_t *dec, SV *src, SV* into, UV
> >> start_offset)
> >>> { > >>> assert(dec != NULL); > >>> + if (SvUTF8(src)) { > >>> + sv_utf8_downgrade(src, 0); > >>> + } > >>> srl_begin_decoding(aTHX_ dec, src, start_offset); > >>> srl_read_header(aTHX_ dec); > >>> if (SRL_DEC_HAVE_OPTION(dec,
SRL_F_DECODER_DECOMPRESS_SNAPPY)) Show quoted text
> >> { > >> > >> What happens if the downgrade fails? Does it die?
> > > > Yes (the 2nd parameter to sv_utf8_downgrade, "fail ok", is false). > > Needs testing though :)
> > Could you (or Damien) do the testing before we apply the patch, please? > I have an allergic reaction when exposed to encoding issues that ruins > my entire day every time. > > --Steffen
I pushed a fix to the master on github https://github.com/Sereal/Sereal/commit/36daac22081d332cdada37b0ccda2329055e71ee That contains Dams' original test case plus another mode of failure I found.
I've tested on my side, and the patches solve the issues. Thanks a lot! I could set this bug as resolved but I don't know what your RT process is, maybe you want to wait for a release to happen before. So I switched it to patched :)
Fixed in Sereal::Decoder 0.19.