Skip Menu |

Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the Sereal-Decoder CPAN distribution.

Report information
The Basics
Id: 93892
Status: open
Priority: 0/
Queue: Sereal-Decoder

People
Owner: Nobody in particular
Requestors: zefram [...] fysh.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: downgrade breaks unwritable strings
Date: Sat, 15 Mar 2014 21:28:14 +0000
To: bug-Sereal-Decoder [...] rt.cpan.org
From: Zefram <zefram [...] fysh.org>
Sereal::Decoder understandably downgrades its input, if given an upgraded string. It does so by mutating the input scalar in place, which is generally rude (it's not exactly input if it gets modified), and breaks some specific kinds of scalar. The side effect on the input parameter can be observed thus: $ perl -MDevel::Peek -MSereal::Encoder -MSereal::Decoder -lwe '$a=Sereal::Encoder->new->encode({}); utf8::upgrade($a); Dump $a; Sereal::Decoder->new->decode($a); Dump $a' SV = PV(0x1d80e40) at 0x1d9e4e8 REFCNT = 1 FLAGS = (POK,pPOK,UTF8) PV = 0x1d978d0 "=srl\2\0P"\0 [UTF8 "=srl\x{2}\x{0}P"] CUR = 7 LEN = 16 SV = PV(0x1d80e40) at 0x1d9e4e8 REFCNT = 1 FLAGS = (POK,pPOK) PV = 0x1d978d0 "=srl\2\0P"\0 CUR = 7 LEN = 16 In this basic case it leaves the scalar still basically correct, and the problem is merely that it's surprising to have this semi-visible side effect. It's more of a problem if the input scalar is SvREADONLY. In that case, it's arguably not correct to downgrade the scalar in place, and particularly to downgrade within the scalar's PV buffer. Something might rely on the SvREADONLY preventing the buffer content being modified. In the extreme case, the input scalar might not own the memory to which its PV slot points. (Normally the PV points to a separately-allocated buffer that can be reallocated, and will be freed when the SV dies, but if SvLEN == 0 then the PV does not point to such a buffer.) It could point into a memory-mapped file, for example, and downgrading within that memory, which sv_utf8_downgrade() would do, would be entirely incorrect. Instead of downgrading an upgraded input in place, S:D should make a mortal downgraded copy. It must at least do this when the input scalar is SvREADONLY, but I think it should do this regardless of the read-only flag. -zefram
Subject: Re: [rt.cpan.org #93892] downgrade breaks unwritable strings
Date: Sun, 16 Mar 2014 11:38:39 +0100
To: bug-Sereal-Decoder [...] rt.cpan.org
From: demerphq <demerphq [...] gmail.com>
On 15 March 2014 22:28, Zefram via RT <bug-Sereal-Decoder@rt.cpan.org> wrote: Show quoted text
> Sat Mar 15 17:28:26 2014: Request 93892 was acted upon. > Transaction: Ticket created by zefram@fysh.org > Queue: Sereal-Decoder > Subject: downgrade breaks unwritable strings > Broken in: (no value) > Severity: (no value) > Owner: Nobody > Requestors: zefram@fysh.org > Status: new > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=93892 > > > > Sereal::Decoder understandably downgrades its input, if given an > upgraded string. It does so by mutating the input scalar in place, > which is generally rude (it's not exactly input if it gets modified), > and breaks some specific kinds of scalar. The side effect on the input > parameter can be observed thus: > > $ perl -MDevel::Peek -MSereal::Encoder -MSereal::Decoder -lwe '$a=Sereal::Encoder->new->encode({}); utf8::upgrade($a); Dump $a; Sereal::Decoder->new->decode($a); Dump $a' > SV = PV(0x1d80e40) at 0x1d9e4e8 > REFCNT = 1 > FLAGS = (POK,pPOK,UTF8) > PV = 0x1d978d0 "=srl\2\0P"\0 [UTF8 "=srl\x{2}\x{0}P"] > CUR = 7 > LEN = 16 > SV = PV(0x1d80e40) at 0x1d9e4e8 > REFCNT = 1 > FLAGS = (POK,pPOK) > PV = 0x1d978d0 "=srl\2\0P"\0 > CUR = 7 > LEN = 16 > > In this basic case it leaves the scalar still basically correct, and > the problem is merely that it's surprising to have this semi-visible > side effect. It's more of a problem if the input scalar is SvREADONLY. > In that case, it's arguably not correct to downgrade the scalar in place, > and particularly to downgrade within the scalar's PV buffer. Something > might rely on the SvREADONLY preventing the buffer content being modified. > > In the extreme case, the input scalar might not own the memory to which > its PV slot points. (Normally the PV points to a separately-allocated > buffer that can be reallocated, and will be freed when the SV dies, but > if SvLEN == 0 then the PV does not point to such a buffer.) It could > point into a memory-mapped file, for example, and downgrading within that > memory, which sv_utf8_downgrade() would do, would be entirely incorrect. > > Instead of downgrading an upgraded input in place, S:D should make > a mortal downgraded copy. It must at least do this when the input > scalar is SvREADONLY, but I think it should do this regardless of the > read-only flag.
I just pushed this to the custom_op branch: commit a605c1d019a5cdd28a93c01c0f3d9f3d935b07f7 Author: Yves Orton <yves.orton@booking.com> Date: Sun Mar 16 11:36:20 2014 +0100 [rt.cpan.org #93892] downgrade breaks unwritable strings Arguably we should not downgrade the original string. This also removes code duplication by moving this logic into srl_begin_decoding() right before we copy the the string buffer into our private struct. I wonder if we should warn when this happens. This change means that in the case of a large buffer holding many sereal packets which is in utf8 will be decoded for each buffer. I guess people "should just not do that" but I wonder if we should warn if we do... Please review, it does the sv_mortalcopy always when it must upgrade. I am inclined to make it warn when this happens. What do you think? Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
Subject: Re: [rt.cpan.org #93892] downgrade breaks unwritable strings
Date: Sun, 16 Mar 2014 11:38:39 +0100
To: bug-Sereal-Decoder [...] rt.cpan.org
From: demerphq <demerphq [...] gmail.com>
On 15 March 2014 22:28, Zefram via RT <bug-Sereal-Decoder@rt.cpan.org> wrote: Show quoted text
> Sat Mar 15 17:28:26 2014: Request 93892 was acted upon. > Transaction: Ticket created by zefram@fysh.org > Queue: Sereal-Decoder > Subject: downgrade breaks unwritable strings > Broken in: (no value) > Severity: (no value) > Owner: Nobody > Requestors: zefram@fysh.org > Status: new > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=93892 > > > > Sereal::Decoder understandably downgrades its input, if given an > upgraded string. It does so by mutating the input scalar in place, > which is generally rude (it's not exactly input if it gets modified), > and breaks some specific kinds of scalar. The side effect on the input > parameter can be observed thus: > > $ perl -MDevel::Peek -MSereal::Encoder -MSereal::Decoder -lwe '$a=Sereal::Encoder->new->encode({}); utf8::upgrade($a); Dump $a; Sereal::Decoder->new->decode($a); Dump $a' > SV = PV(0x1d80e40) at 0x1d9e4e8 > REFCNT = 1 > FLAGS = (POK,pPOK,UTF8) > PV = 0x1d978d0 "=srl\2\0P"\0 [UTF8 "=srl\x{2}\x{0}P"] > CUR = 7 > LEN = 16 > SV = PV(0x1d80e40) at 0x1d9e4e8 > REFCNT = 1 > FLAGS = (POK,pPOK) > PV = 0x1d978d0 "=srl\2\0P"\0 > CUR = 7 > LEN = 16 > > In this basic case it leaves the scalar still basically correct, and > the problem is merely that it's surprising to have this semi-visible > side effect. It's more of a problem if the input scalar is SvREADONLY. > In that case, it's arguably not correct to downgrade the scalar in place, > and particularly to downgrade within the scalar's PV buffer. Something > might rely on the SvREADONLY preventing the buffer content being modified. > > In the extreme case, the input scalar might not own the memory to which > its PV slot points. (Normally the PV points to a separately-allocated > buffer that can be reallocated, and will be freed when the SV dies, but > if SvLEN == 0 then the PV does not point to such a buffer.) It could > point into a memory-mapped file, for example, and downgrading within that > memory, which sv_utf8_downgrade() would do, would be entirely incorrect. > > Instead of downgrading an upgraded input in place, S:D should make > a mortal downgraded copy. It must at least do this when the input > scalar is SvREADONLY, but I think it should do this regardless of the > read-only flag.
I just pushed this to the custom_op branch: commit a605c1d019a5cdd28a93c01c0f3d9f3d935b07f7 Author: Yves Orton <yves.orton@booking.com> Date: Sun Mar 16 11:36:20 2014 +0100 [rt.cpan.org #93892] downgrade breaks unwritable strings Arguably we should not downgrade the original string. This also removes code duplication by moving this logic into srl_begin_decoding() right before we copy the the string buffer into our private struct. I wonder if we should warn when this happens. This change means that in the case of a large buffer holding many sereal packets which is in utf8 will be decoded for each buffer. I guess people "should just not do that" but I wonder if we should warn if we do... Please review, it does the sv_mortalcopy always when it must upgrade. I am inclined to make it warn when this happens. What do you think? Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
Subject: Re: [rt.cpan.org #93892] downgrade breaks unwritable strings
Date: Mon, 17 Mar 2014 11:01:05 +0000
To: demerphq via RT <bug-Sereal-Decoder [...] rt.cpan.org>
From: Zefram <zefram [...] fysh.org>
demerphq via RT wrote: Show quoted text
>Please review, it does the sv_mortalcopy always when it must upgrade.
The sv_chop logic at the bottom of srl_decode_into_internal is now broken in the upgraded case. With the chop option on, you should probably still downgrade the input in place. (Can do it by making "src=sv_mortalcopy(src)" conditional on the option.) It's OK to do that in this case, as the option means you're explicitly meant to modify the input. The other possibility is to determine the upgraded length of what you've consumed and chop that much, which seems more trouble than it's worth. Otherwise OK. Show quoted text
>I am inclined to make it warn when this happens. What do you think?
It should not warn. There is nothing incorrect about an octet string being represented in upgraded form. -zefram