On Mon Feb 18 13:21:41 2013, SMUELLER wrote:
> On Mon Feb 18 12:05:19 2013, RIBASUSHI wrote:
> > On Sun Feb 17 13:21:46 2013, TIMB wrote:
> > > > JSON::XS and Sereal. Both of those might beat Storable::dclone.
> > >
> > > Well, Sereal should but JSON::XS probably wouldn't. Still, the basic
> > > point stands.
> >
> > You will be very very surprised then. Attached is the benchmark script
> > (can take a long time to run - the Dumbbench tolerances are set very low).
> >
> > I will do a DBI-based benchmark later. Although given that even a plain
> > hash construction from perl space already beats all the cloners, an
> > XS-space construction with rebinding behind the scenes on every
> > iteration sounds like an uncontested winner.
>
> Things like Sereal and Storable do a lot more work than you want here. A
> couple of comments on Sereal performance:
>
> a) ALWAYS use the OO interface if you care about performance. This makes
> a big difference, see b).
>
> b) The larger the structure, the better Sereal will look. It has
> relatively expensive setup compared to JSON::XS.
>
> c) Due to the nature of the data, it seems like you'll never care about
> repeated hash keys. You get an extra speed improvement by using
> "no_shared_hashkeys => 1". If repeated hash keys occur, it's still
> faster without sharing them, but the intermediate output string will be
> larger.
>
> But in the end, even for the OO interface, the best you can hope for
> from Sereal vs. JSON::XS is what the simplest data structure benchmarks
> here show: how Sereal comes out in "small hash". Benchmark code lives in
> the Sereal repo under author_tools.
>
> A dedicated "build a copy of this data structure" approach to cloning
> must beat serialization/deserialization by a long shot AND is bound to
> be quite a bit simpler. You could start with either the Sereal or
> JSON::XS encoder implementations as a base for it, too. Just build a
> tree of Perl structures instead of generating output. Extra benefit: If
> you use Sereal::Decoder's pass-in-the-output-SV style, you get very
> simple and clean exception handling.
>
> --Steffen
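
For concreteness, a minimal sketch of the encoder setup from points (a) and
(c) above; this is not the attached benchmark code, and $data_structure is
just a placeholder:

    use Sereal::Encoder;

    # Build the encoder object once and reuse it for every call (point a).
    # Disabling shared hash keys (point c) trades a potentially larger
    # intermediate string for encoding speed.
    my $encoder = Sereal::Encoder->new({ no_shared_hashkeys => 1 });

    my $blob = $encoder->encode($data_structure);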
With the above changes (and just running dumbbench slightly less long),
I get:
                       Rate         dclone          clone     data_clone        sereal       json_xs  newhash
dclone      52.95+-0.041/s             --         -55.5%         -55.7%        -63.6%        -71.1%   -83.3%
clone        119.05+-0.1/s  124.83+-0.26%             --          -0.3%        -18.2%        -35.1%   -62.6%
data_clone   119.4+-0.11/s  125.49+-0.27%    0.29+-0.12%             --        -18.0%        -34.9%   -62.4%
sereal      145.54+-0.12/s  174.85+-0.32%   22.25+-0.15%   21.89+-0.15%            --        -20.7%   -54.2%
json_xs     183.53+-0.14/s  246.61+-0.37%   54.16+-0.17%   53.71+-0.18%  26.11+-0.14%            --   -42.3%
newhash     317.96+-0.34/s   500.5+-0.79%  167.08+-0.36%  166.31+-0.37%  118.48+-0.3%  73.25+-0.22%       --
That is very much in line with what I'd expect. If you use bigger data
structures, Sereal will beat JSON::XS (not that it matters much) and if
you use structures with common subtrees, it will beat the heck out of
JSON::XS purely because it entirely avoids serializing things multiple
times.
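
For reference, the sereal entry above presumably boils down to a full
encode/decode round trip used as a clone, along the lines of the sketch
below (the sub name is made up; the authoritative version is the attached
script):

    use Sereal::Encoder;
    use Sereal::Decoder;

    my $enc = Sereal::Encoder->new({ no_shared_hashkeys => 1 });
    my $dec = Sereal::Decoder->new();

    # Clone by round-tripping through Sereal. decode() is called in the
    # pass-in-the-output-SV style: it writes the result into $copy and
    # simply throws on error, which keeps the error handling trivial.
    sub sereal_clone {
        my ($data) = @_;
        my $copy;
        eval {
            $dec->decode($enc->encode($data), $copy);
            1;
        } or die "sereal_clone failed: $@";
        return $copy;
    }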
Nonetheless, my comment about serialization vs. cloning holds. Maybe some
day $work will have a need for a very efficient clone function and Yves or
I will get the time to implement one.
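
To illustrate the "build a copy" idea, here is a deliberately naive
pure-Perl sketch; an implementation worth shipping would live in XS and
handle blessed refs, cycles, magic and so on, which this does not:

    use strict;
    use warnings;
    use Scalar::Util qw(reftype);

    # Walk the structure and build a fresh copy directly, with no
    # intermediate serialized form at all.
    sub naive_clone {
        my ($thing) = @_;
        my $type = reftype($thing);
        return $thing unless defined $type;   # plain scalars copy by value
        if ($type eq 'HASH') {
            return { map { $_ => naive_clone($thing->{$_}) } keys %$thing };
        }
        if ($type eq 'ARRAY') {
            return [ map { naive_clone($_) } @$thing ];
        }
        if ($type eq 'SCALAR' or $type eq 'REF') {
            my $copy = naive_clone($$thing);
            return \$copy;
        }
        return $thing;   # code refs, globs, etc. are passed through as-is
    }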