Bug #93560 for Sereal-Encoder: recursive buffer snafu

Thu Mar 06 06:37:15 2014 zefram [...] fysh.org - Ticket created

Subject:	recursive buffer snafu
Date:	Thu, 6 Mar 2014 11:35:47 +0000
To:	bug-Sereal-Encoder [...] rt.cpan.org
From:	Zefram <zefram [...] fysh.org>

Sereal::Encoder->encode isn't reentrant, for calls on a single encoder object. This can be seen by recursively encoding using the same encoder object from a FREEZE callback: use Sereal::Encoder; use Sereal::Decoder; my $enc = Sereal::Encoder->new({freeze_callbacks=>1}); package Foo { sub FREEZE { $enc->encode({%{$_[0]}}) } sub THAW { bless(Sereal::Decoder->new->decode($_[2]), $_[0]) } } my $a = $enc->encode(bless({a=>1},"Foo")); my $b = Sereal::Decoder->new->decode($a); (Never mind that this is a somewhat silly thing to do in FREEZE because it's allowed to return a data structure that will be implicitly Serealised. This is just an example.) With the encoder thus shared, the encoded data is corrupted, and decoding signals an error. If the two ->encode calls use separate encoder objects, by duplicating the "my $enc" line just above "my $a", it all works. The underlying bug is that the srl_encoder_t structure doesn't distinguish between data items that are logically a permanent part of the encoder object (the configuration) and those that are ephemeral, being part of a particular encoding operation (buffered output, current recursion depth, seen-object table). The clean solution is for the encoding object to contain only the configuration, and to instantiate the per-operation variables once per operation, in a separate structure allocated on the C stack. I appreciate that you're trying to reuse the buffer for performance. If you insist on that kind of reuse, you need to detect reentrancy yourself and allocate separate variables specifically in the reentrant case. If you do this, there's still no real reason to tie the variables to the encoder structure: you can allocate the base variables in a totally static way (or per Perl thread), and detect reentrancy regardless of which encoding objects are being used. I believe Sereal::Decoder has a similar reentrancy bug, but it doesn't show up so readily. Whereas the encoder bug showed up with pretty much the first thing I tried (after forming the suspicion that there might be a fault of this nature), I don't have an example showing the decoder misbehaving. -zefram

Thu Mar 06 07:36:16 2014 smueller [...] cpan.org - Correspondence added

Subject:	Re: [rt.cpan.org #93560] recursive buffer snafu
Date:	Thu, 06 Mar 2014 13:35:55 +0100
To:	bug-Sereal-Encoder [...] rt.cpan.org
From:	Steffen Mueller <smueller [...] cpan.org>

Hi! Before I get into too much detail, let me say this: Thank you _very_ much for your review and debugging. That's incredibly valuable! I'm quite embarrassed that you've now hit the second bug that's my fault. On 03/06/2014 12:37 PM, Zefram via RT wrote: Show quoted text

> Sereal::Encoder->encode isn't reentrant, for calls on a single encoder > object. This can be seen by recursively encoding using the same encoder > object from a FREEZE callback:

[...] Show quoted text

> (Never mind that this is a somewhat silly thing to do in FREEZE > because it's allowed to return a data structure that will be implicitly > Serealised. This is just an example.) With the encoder thus shared, > the encoded data is corrupted, and decoding signals an error. If the > two ->encode calls use separate encoder objects, by duplicating the > "my $enc" line just above "my $a", it all works.

Clearly a bug indeed. I'll add a test along the lines of your example to the repo now. Show quoted text

> The underlying bug is that the srl_encoder_t structure doesn't distinguish > between data items that are logically a permanent part of the encoder > object (the configuration) and those that are ephemeral, being part of a > particular encoding operation (buffered output, current recursion depth, > seen-object table). The clean solution is for the encoding object to > contain only the configuration, and to instantiate the per-operation > variables once per operation, in a separate structure allocated on the > C stack. > > I appreciate that you're trying to reuse the buffer for performance. > If you insist on that kind of reuse, you need to detect reentrancy > yourself and allocate separate variables specifically in the reentrant > case. If you do this, there's still no real reason to tie the variables > to the encoder structure: you can allocate the base variables in a totally > static way (or per Perl thread), and detect reentrancy regardless of > which encoding objects are being used.

The way I see this is: - There's functions to allocate a completely new encoder. - There's functions to clone an encoder without its ephemeral state. - These were explicitly intended to handle such cases, including, for example, avoiding potential issues with ithreads. - There's a "dirty" flag that should indicate that an encoder must be cloned on the fly to ensure reentrancy. - I either got it all wrong from the start, or (which is just a tiny bit more likely), I got it wrong somewhere in the process of adding the Sereal v2 features like FREEZE. For sure, I'll try to look into this. Ultimately, I envisioned an approach such as: Try to aim for maximum performance in the common case (one serializer state reused sequentially) and degrade safely in the uncommon case (dirty serializer entered recursively or concurrently causing a new allocation and copy of the configuration). Sorry for getting that wrong. Show quoted text

> I believe Sereal::Decoder has a similar reentrancy bug, but it doesn't > show up so readily. Whereas the encoder bug showed up with pretty much > the first thing I tried (after forming the suspicion that there might > be a fault of this nature), I don't have an example showing the decoder > misbehaving.

I wouldn't be surprised either. However, let's keep the discussion of a potential decoder issue somewhat separate. Reason for me to say that is that most of the encoder is my fault, most of the decoder is Yves' fault. ;P Best regards, Steffen

Thu Mar 06 07:36:16 2014 The RT System itself - Status changed from 'new' to 'open'

Thu Mar 06 07:52:16 2014 smueller [...] cpan.org - Correspondence added

Subject:	Re: [rt.cpan.org #93560] recursive buffer snafu
Date:	Thu, 06 Mar 2014 13:51:57 +0100
To:	bug-Sereal-Encoder [...] rt.cpan.org
From:	Steffen Mueller <smueller [...] cpan.org>

On 03/06/2014 12:37 PM, Zefram via RT wrote: Show quoted text

> Sereal::Encoder->encode isn't reentrant, for calls on a single encoder > object. This can be seen by recursively encoding using the same encoder > object from a FREEZE callback:

I'm just about out of time, but I did some very basic debugging. I used your, Zefram's, example as a test (pushed to repo). The logic that is responsible for detecting that we're about to reuse a dirty encoder is mostly in the srl_prepare_encoder() function which should be called to boot any given encoding run. If I add a warn() to the branch in there that checks whether the encoder is already in use, and a different warn for the branch that it isn't, then I get the right output: No need to clone. at t/900_reentrancy.t line 17. Cloning! at t/900_reentrancy.t line 13. That means that I most certainly failed to get the cloning right: IOW I'm either cloning too much, cloning too little, or not resetting the parts right that I am not cloning. Chances are that this is going to be an easy fix once discovered. Best regards, Steffen

Thu Mar 06 12:11:48 2014 smueller [...] cpan.org - Correspondence added

On Thu Mar 06 07:52:16 2014, SMUELLER wrote: Show quoted text

> On 03/06/2014 12:37 PM, Zefram via RT wrote:

> > Sereal::Encoder->encode isn't reentrant, for calls on a single encoder > > object. This can be seen by recursively encoding using the same encoder > > object from a FREEZE callback:

> > I'm just about out of time, but I did some very basic debugging.

And Yves and I found a solution just now. It's in commit XXX in the repository. I'll try to cut a release later today. Commit message attached for reference. Thanks again for reporting this bug! Best regards, Steffen commit 759ec54a959e0cc247a4bc6204ef4fd467c47724 Author: Steffen Mueller <smueller@cpan.org> Date: Thu Mar 6 18:05:34 2014 +0100 Perl: Fix reentrancy bug in the encoder This pertains to RT ticket #93560 as reported by Zefram. If in a FREEZE hook, the same encoder object is reused, up to this commit, we'd emit corrupted Sereal for the INNER encode call. There is logic to detect reuse of a dirty encoder and to lazily clone the dirty encoder without the encode state. That logic is actually mostly working (see minor changes in the srl_build_encoder_struct_alike function). The bug was that this newly allocated, virgin encoder structure would be used for the inner encode run alright, but the XS wrapper code that extracts the output string was oblivious to the fact that its encoder struct was not the one that was used for the encode. The solution is to always return a pointer to the encoder struct that was finally used for the encode. Maybe there's more elegant solutions, but this one is simple and efficient.

Fri Mar 07 09:57:42 2014 smueller [...] cpan.org - Correspondence added

Just a release is pending. Will do as I churn through the other outstanding discussions in related RT tickets.

Fri Mar 07 09:57:45 2014 smueller [...] cpan.org - Status changed from 'open' to 'resolved'

Bug #93560 for Sereal-Encoder: recursive buffer snafu

Preferred bug tracker