Bug #93563 for Sereal-Decoder: recursive state snafu

Thu Mar 06 08:10:08 2014 zefram [...] fysh.org - Ticket created

Subject:	recursive state snafu
Date:	Thu, 6 Mar 2014 13:09:51 +0000
To:	bug-Sereal-Decoder [...] rt.cpan.org
From:	Zefram <zefram [...] fysh.org>

Sereal::Decoder->decode isn't reentrant, for calls on a single decoder object. This can be seen by recursively decoding using the same decoder object from a THAW callback, with data involving a backreference: use Sereal::Encoder; use Sereal::Decoder; my $dec = Sereal::Decoder->new; package Foo { sub FREEZE { Sereal::Encoder->new->encode({%{$_[0]}}) } sub THAW { bless($dec->decode($_[2]), $_[0]) } } my $z = [ bless({a=>1},"Foo") ]; push @$z, $z; my $a = Sereal::Encoder->new({freeze_callbacks=>1})->encode($z); my $b = $dec->decode($a); (Never mind that this FREEZE/THAW pair is somewhat silly. This is just an example.) With the decoder thus shared, the backreference tracking data gets clobbered, and a SEGV results. If the two ->decode calls use separate decoder objects, by duplicating the "my $dec" line just above "my $b", it all works. The underlying bug is that the srl_decoder_t structure doesn't distinguish between data items that are logically a permanent part of the decoder object (the configuration) and those that are ephemeral, being part of a particular decoding operation (buffered input, current recursion depth, seen-object table). The clean solution is for the decoding object to contain only the configuration, and to instantiate the per-operation variables once per operation, in a separate structure allocated on the C stack. As for Sereal::Encoder, you might possibly have some advantage in avoiding reallocating the variables for every call. Probably much less advantage than S:E gets, though, because the decoder doesn't allocate its own buffer. If there really is advantage in keeping the variables between decoding operations, you need to detect reentrancy yourself and allocate separate variables specifically in the reentrant case. If you do this, there's still no real reason to tie the variables to the decoder structure: you can allocate the base variables in a totally static way (or per Perl thread), and detect reentrancy regardless of which decoding objects are being used. -zefram

Thu Mar 06 13:13:05 2014 smueller [...] cpan.org - Correspondence added

Confirmed. The decoder doesn't have the same reentrance-detection & lazy-cloning logic as the encoder. Having it jump through those hoops would be one way to address this problem (as far as I can tell so far). Won't get much further with this today, though.

Thu Mar 06 13:13:06 2014 The RT System itself - Status changed from 'new' to 'open'

Thu Mar 06 13:36:16 2014 smueller [...] cpan.org - Correspondence added

I just implemented cloning behaviour for the decoder symmetric to the one in the encoder. This appears to fix this bug. The main change is only in a branch right now until it's received review from at least Yves: smueller/reentrant-decoder I'll put off releasing encoder and decoder distributions until this branch is merged (or another/better fix is found) since releasing one of them is 80% of the work of releasing the both of them. Thanks again for reporting the issue and sorry for causing work for you.

Fri Mar 07 09:57:52 2014 smueller [...] cpan.org - Correspondence added

Just a release is pending. Will do as I churn through the other outstanding discussions in related RT tickets.

Fri Mar 07 09:57:53 2014 smueller [...] cpan.org - Status changed from 'open' to 'resolved'

Bug #93563 for Sereal-Decoder: recursive state snafu

Preferred bug tracker