Subject: | recursive state snafu |
Date: | Thu, 6 Mar 2014 13:09:51 +0000 |
To: | bug-Sereal-Decoder [...] rt.cpan.org |
From: | Zefram <zefram [...] fysh.org> |
Sereal::Decoder->decode isn't reentrant, for calls on a single decoder
object. This can be seen by recursively decoding using the same decoder
object from a THAW callback, with data involving a backreference:
use Sereal::Encoder;
use Sereal::Decoder;
my $dec = Sereal::Decoder->new;
package Foo {
sub FREEZE { Sereal::Encoder->new->encode({%{$_[0]}}) }
sub THAW { bless($dec->decode($_[2]), $_[0]) }
}
my $z = [ bless({a=>1},"Foo") ];
push @$z, $z;
my $a = Sereal::Encoder->new({freeze_callbacks=>1})->encode($z);
my $b = $dec->decode($a);
(Never mind that this FREEZE/THAW pair is somewhat silly. This is just
an example.) With the decoder thus shared, the backreference tracking
data gets clobbered, and a SEGV results. If the two ->decode calls use
separate decoder objects, by duplicating the "my $dec" line just above
"my $b", it all works.
The underlying bug is that the srl_decoder_t structure doesn't distinguish
between data items that are logically a permanent part of the decoder
object (the configuration) and those that are ephemeral, being part of a
particular decoding operation (buffered input, current recursion depth,
seen-object table). The clean solution is for the decoding object to
contain only the configuration, and to instantiate the per-operation
variables once per operation, in a separate structure allocated on the
C stack.
As for Sereal::Encoder, you might possibly have some advantage in
avoiding reallocating the variables for every call. Probably much less
advantage than S:E gets, though, because the decoder doesn't allocate
its own buffer. If there really is advantage in keeping the variables
between decoding operations, you need to detect reentrancy yourself and
allocate separate variables specifically in the reentrant case. If you
do this, there's still no real reason to tie the variables to the decoder
structure: you can allocate the base variables in a totally static way
(or per Perl thread), and detect reentrancy regardless of which decoding
objects are being used.
-zefram