Bug #93484 for Sereal-Encoder: arity check fault breaks static method resolution

Mon Mar 03 06:13:37 2014 zefram [...] fysh.org - Ticket created

Subject:	arity check fault breaks static method resolution
Date:	Mon, 3 Mar 2014 11:13:23 +0000
To:	bug-Sereal-Encoder [...] rt.cpan.org
From:	Zefram <zefram [...] fysh.org>

In attempting to eke out a bit of extra performance by statically resolving a critical method call, I found this breakage: $ perl -MSereal::Encoder -lwe '$s=Sereal::Encoder->new; print length($s->encode({})); print length($s->can("encode")->($s, {}));' 9 Found type 13 CODE(0x1e9c060), but it is not representable by the Sereal encoding format at -e line 1. This call with the separately-resolved method ought to be entirely equivalent to calling the method normally. It's very strange that it's not. Turns out that Sereal::Encoder::encode screws up an arity check and can look beyond its arguments on the stack. Whether the bug shows up therefore depends on accidents of the surrounding expression that determine what the invalid stack slot actually contains. Attached patch fixes. -zefram

Message body is not shown because sender requested not to inline it.

Mon Mar 03 06:37:38 2014 demerphq [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #93484] arity check fault breaks static method resolution
Date:	Mon, 3 Mar 2014 12:36:03 +0100
To:	bug-Sereal-Encoder [...] rt.cpan.org
From:	demerphq <demerphq [...] gmail.com>

On 3 March 2014 12:13, Zefram via RT <bug-Sereal-Encoder@rt.cpan.org> wrote: Show quoted text

> Mon Mar 03 06:13:37 2014: Request 93484 was acted upon. > Transaction: Ticket created by zefram@fysh.org > Queue: Sereal-Encoder > Subject: arity check fault breaks static method resolution > Broken in: (no value) > Severity: (no value) > Owner: Nobody > Requestors: zefram@fysh.org > Status: new > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=93484 > > > > In attempting to eke out a bit of extra performance by statically > resolving a critical method call, I found this breakage: > > $ perl -MSereal::Encoder -lwe '$s=Sereal::Encoder->new; print length($s->encode({})); print length($s->can("encode")->($s, {}));' > 9 > Found type 13 CODE(0x1e9c060), but it is not representable by the Sereal encoding format at -e line 1. > > This call with the separately-resolved method ought to be entirely > equivalent to calling the method normally. It's very strange that > it's not. Turns out that Sereal::Encoder::encode screws up an arity > check and can look beyond its arguments on the stack. Whether the bug > shows up therefore depends on accidents of the surrounding expression > that determine what the invalid stack slot actually contains. Attached > patch fixes.

Thanks, patch applied as: commit 568df18be6acbbcaaa049c38198d53be2986ceca Author: Zefram <zefram@fysh.org> Date: Mon Mar 3 12:33:02 2014 +0100 Fix rt.cpan.org #93484 - fencepost error in Encoder.xs In attempting to eke out a bit of extra performance by statically resolving a critical method call, I found this breakage: $ perl -MSereal::Encoder -lwe '$s=Sereal::Encoder->new; print length($s->encode({})); print length($s->can("encode")->($s, {}));' 9 Found type 13 CODE(0x1e9c060), but it is not representable by the Sereal encoding format at -e line 1. This call with the separately-resolved method ought to be entirely equivalent to calling the method normally. It's very strange that it's not. Turns out that Sereal::Encoder::encode screws up an arity check and can look beyond its arguments on the stack. Whether the bug shows up therefore depends on accidents of the surrounding expression that determine what the invalid stack slot actually contains. -- perl -Mre=debug -e "/just|another|perl|hacker/"

Mon Mar 03 06:37:39 2014 The RT System itself - Status changed from 'new' to 'open'

Mon Mar 03 08:10:20 2014 zefram [...] fysh.org - Correspondence added

Subject:	Re: [rt.cpan.org #93484] arity check fault breaks static method resolution
Date:	Mon, 3 Mar 2014 13:10:09 +0000
To:	demerphq via RT <bug-Sereal-Encoder [...] rt.cpan.org>
From:	Zefram <zefram [...] fysh.org>

demerphq via RT wrote: Show quoted text

>Thanks, patch applied as:

I suggest that you also add a test. FYI, the formulation that I've ended up using to minimise overhead is use Memoize::Lift qw(lift); use Scalar::Construct qw(constant); use Sereal::Encoder (); use Lexical::Var '$sereal_encoder' => constant(Sereal::Encoder->new); # $ser = $sereal_encoder->encode($data) $ser = lift($sereal_encoder->can("encode"))->($sereal_encoder, $data); and similarly for decoding. -zefram

Wed Mar 05 12:37:42 2014 smueller [...] cpan.org - Correspondence added

Version 2.04 of encoder and decoder were just uploaded to PAUSE and they have your fix. Thank you very much for diagnosing the issue and reporting it. Sorry for getting it wrong in the first place (that was most certainly me). Best regards, Steffen

Wed Mar 05 12:37:43 2014 smueller [...] cpan.org - Status changed from 'open' to 'resolved'

Wed Mar 05 14:41:07 2014 demerphq [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #93484] arity check fault breaks static method resolution
Date:	Wed, 5 Mar 2014 20:40:58 +0100
To:	bug-Sereal-Encoder [...] rt.cpan.org
From:	demerphq <demerphq [...] gmail.com>

On 3 March 2014 14:10, Zefram via RT <bug-Sereal-Encoder@rt.cpan.org> wrote: Show quoted text

> Queue: Sereal-Encoder > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=93484 > > > demerphq via RT wrote:

>>Thanks, patch applied as:

> > I suggest that you also add a test. > > FYI, the formulation that I've ended up using to minimise overhead is > > use Memoize::Lift qw(lift); > use Scalar::Construct qw(constant); > use Sereal::Encoder (); > use Lexical::Var '$sereal_encoder' => constant(Sereal::Encoder->new); > > # $ser = $sereal_encoder->encode($data) > $ser = lift($sereal_encoder->can("encode"))->($sereal_encoder, $data); > > and similarly for decoding.

How much difference does that make? Do you have timing data available? Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"

Wed Mar 05 15:33:42 2014 zefram [...] fysh.org - Correspondence added

Subject:	Re: [rt.cpan.org #93484] arity check fault breaks static method resolution
Date:	Wed, 5 Mar 2014 20:33:29 +0000
To:	demerphq via RT <bug-Sereal-Encoder [...] rt.cpan.org>
From:	Zefram <zefram [...] fysh.org>

demerphq via RT wrote: Show quoted text

>How much difference does that make?

Fractional. It was about 1% in my timing test; for comparison, Sereal was faster than Storable by a factor of around 1.5 to 2, depending on the test data. The static method resolution doesn't reduce the number of ops executed, but means that one op is of a cheaper type, a const op instead of a method_named op. As the Sereal::{En,De}coder modules don't use polymorphism, either as an API feature or as an implementation strategy, you're really not gaining anything by using a method call interface. It makes greater logical sense, and is fractionally more efficient, to view the {en,de}coder operation as a static subroutine that takes an {en,de}coder object ref as its first argument. That's essentially what I'm doing with my ->can formulation; you could make it easier by exposing and documenting the existing Sereal::Encoder::encode and Sereal::Decoder::decode subs as being callable in that way (possibly under other names). I've been wondering what gain could be made by turning Sereal's {en,de}code subs into custom ops. Removing the sub call overhead is potentially worthwhile in cases like this; it'll win more than avoiding the method resolution overhead. Static resolution of the subroutine is a prerequisite for the transformation into custom ops. It would also be possible to dispense with a pushmark op if the static subs have appropriate prototypes, rather than imitating method calls. I'm intending to try this out soonish. -zefram

Thu Mar 06 02:07:54 2014 demerphq [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #93484] arity check fault breaks static method resolution
Date:	Thu, 6 Mar 2014 08:07:45 +0100
To:	bug-Sereal-Encoder [...] rt.cpan.org
From:	demerphq <demerphq [...] gmail.com>

On 5 March 2014 21:33, Zefram via RT <bug-Sereal-Encoder@rt.cpan.org> wrote: Show quoted text

> Queue: Sereal-Encoder > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=93484 > > > demerphq via RT wrote:

>>How much difference does that make?

> > Fractional. It was about 1% in my timing test; for comparison, Sereal > was faster than Storable by a factor of around 1.5 to 2, depending on > the test data. The static method resolution doesn't reduce the number > of ops executed, but means that one op is of a cheaper type, a const op > instead of a method_named op. > > As the Sereal::{En,De}coder modules don't use polymorphism, either as an > API feature or as an implementation strategy, you're really not gaining > anything by using a method call interface. It makes greater logical > sense, and is fractionally more efficient, to view the {en,de}coder > operation as a static subroutine that takes an {en,de}coder object ref > as its first argument. That's essentially what I'm doing with my ->can > formulation; you could make it easier by exposing and documenting the > existing Sereal::Encoder::encode and Sereal::Decoder::decode subs as > being callable in that way (possibly under other names). > > I've been wondering what gain could be made by turning Sereal's > {en,de}code subs into custom ops. Removing the sub call overhead > is potentially worthwhile in cases like this; it'll win more than > avoiding the method resolution overhead. Static resolution of the > subroutine is a prerequisite for the transformation into custom ops. > It would also be possible to dispense with a pushmark op if the static > subs have appropriate prototypes, rather than imitating method calls. > I'm intending to try this out soonish.

Thanks a lot, this is very interesting. Please keep us posted. Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"

Fri Mar 07 01:12:07 2014 smueller [...] cpan.org - Correspondence added

On Thu Mar 06 02:07:54 2014, demerphq@gmail.com wrote: Show quoted text

> On 5 March 2014 21:33, Zefram via RT <bug-Sereal-Encoder@rt.cpan.org> wrote:

> > Queue: Sereal-Encoder > > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=93484 > > > > > demerphq via RT wrote:

> >>How much difference does that make?

> > > > Fractional. It was about 1% in my timing test; for comparison, Sereal > > was faster than Storable by a factor of around 1.5 to 2, depending on > > the test data. The static method resolution doesn't reduce the number > > of ops executed, but means that one op is of a cheaper type, a const op > > instead of a method_named op. > > > > As the Sereal::{En,De}coder modules don't use polymorphism, either as an > > API feature or as an implementation strategy, you're really not gaining > > anything by using a method call interface. It makes greater logical > > sense, and is fractionally more efficient, to view the {en,de}coder > > operation as a static subroutine that takes an {en,de}coder object ref > > as its first argument. That's essentially what I'm doing with my ->can > > formulation; you could make it easier by exposing and documenting the > > existing Sereal::Encoder::encode and Sereal::Decoder::decode subs as > > being callable in that way (possibly under other names). > > > > I've been wondering what gain could be made by turning Sereal's > > {en,de}code subs into custom ops. Removing the sub call overhead > > is potentially worthwhile in cases like this; it'll win more than > > avoiding the method resolution overhead. Static resolution of the > > subroutine is a prerequisite for the transformation into custom ops. > > It would also be possible to dispense with a pushmark op if the static > > subs have appropriate prototypes, rather than imitating method calls. > > I'm intending to try this out soonish.

> > Thanks a lot, this is very interesting. Please keep us posted. > > Yves > >

Just some data: Rate function method function_state function 478548+-35/s -- -39.2% -46.1% method 786838+-94/s 64.4% -- -11.4% function_state 888076+-61/s 85.6% 12.9% -- #!perl use Sereal qw(:all); my $e = Sereal::Encoder->new; my $in = {}; my $out = $e->encode($in); use Benchmark::Dumb qw(:all); cmpthese( 10000.001, { method => sub {$out = $e->encode($in)}, function => sub {$out = encode_sereal($in)}, function_state => sub {$out = Sereal::Encoder::encode($e, $in)}, }, );

Bug #93484 for Sereal-Encoder: arity check fault breaks static method resolution

Preferred bug tracker