Subject: downgrade breaks unwritable strings
Date: Sat, 15 Mar 2014 21:28:14 +0000
To: bug-Sereal-Decoder [...] rt.cpan.org
From: Zefram <zefram [...] fysh.org>

Sereal::Decoder understandably downgrades its input, if given an
upgraded string. It does so by mutating the input scalar in place,
which is generally rude (it's not exactly input if it gets modified),
and breaks some specific kinds of scalar. The side effect on the input
parameter can be observed thus:

$ perl -MDevel::Peek -MSereal::Encoder -MSereal::Decoder -lwe '$a=Sereal::Encoder->new->encode({}); utf8::upgrade($a); Dump $a; Sereal::Decoder->new->decode($a); Dump $a'
SV = PV(0x1d80e40) at 0x1d9e4e8
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x1d978d0 "=srl\2\0P"\0 [UTF8 "=srl\x{2}\x{0}P"]
  CUR = 7
  LEN = 16
SV = PV(0x1d80e40) at 0x1d9e4e8
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x1d978d0 "=srl\2\0P"\0
  CUR = 7
  LEN = 16

In this simple case the scalar still holds the same string afterwards,
and the problem is merely that the semi-visible side effect is
surprising. It's more of a problem if the input scalar is SvREADONLY.
In that case, it's arguably not correct to downgrade the scalar in place,
and particularly not to downgrade within the scalar's PV buffer. Something
might rely on SvREADONLY preventing the buffer content from being modified.

In the extreme case, the input scalar might not own the memory to which
its PV slot points. (Normally the PV points to a separately-allocated
buffer that can be reallocated, and will be freed when the SV dies, but
if SvLEN == 0 then the PV does not point to such a buffer.) It could
point into a memory-mapped file, for example, and downgrading within that
memory, which sv_utf8_downgrade() would do, would be entirely incorrect.

Instead of downgrading an upgraded input in place, Sereal::Decoder
should make a mortal downgraded copy and decode from that. It must at
least do this when the input scalar is SvREADONLY, but I think it should
do this regardless of the read-only flag.
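
Until the decoder does this itself, a caller can get the same effect from
Perl space by downgrading a private copy. The following is only a sketch of
that idea: downgraded_copy() is a hypothetical helper, not part of
Sereal::Decoder, and the input string is just the byte sequence from the
Dump output above. Inside the XS code the analogous step would presumably
be sv_mortalcopy() followed by sv_utf8_downgrade() on the copy, but that is
my assumption about how a fix might look, not what the module does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Sereal::Decoder): return a downgraded
# copy of the argument without touching the caller's scalar.
sub downgraded_copy {
    my $copy = $_[0];        # fresh scalar; it does not alias the caller's SV
    utf8::downgrade($copy);  # dies if the string has characters above 0xFF
    return $copy;
}

# Same bytes as the encoded empty hash shown in the Dump output.
my $input = "=srl\x{2}\x{0}P";
utf8::upgrade($input);

my $bytes = downgraded_copy($input);

# The original keeps its UTF8 flag; only the copy was downgraded.
print utf8::is_utf8($input) ? "input still upgraded\n" : "input was downgraded\n";
print utf8::is_utf8($bytes) ? "copy still upgraded\n"  : "copy downgraded\n";
```

The caller would then pass $bytes rather than $input to decode(), so
whatever the decoder does to its argument happens to a scalar that nothing
else depends on.
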
-zefram