
This queue is for tickets about the DBI CPAN distribution.

Report information
The Basics
Id: 83378
Status: open
Priority: 0/
Queue: DBI

People
Owner: Nobody in particular
Requestors: ribasushi [...] leporine.io
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: The slice-to-hash-with-rename feature is incomplete
Changes in DBIC 0.08240 finally allow me to take advantage of the new feature that landed in DBI 1.620 ( https://metacpan.org/source/TIMB/DBI-1.623/DBI.pm#L6437 ). However, I realized that the feature is incomplete - it is only recognized by the eager fetchall_arrayref. Neither fetchrow_arrayref nor fetchrow_array recognizes a $slice argument. Therefore I cannot use it in DBIC where it is needed the most - in the case of cursor iteration.

Is this a "brainfart" type omission, or are there underlying design considerations which make column renaming infeasible in the case of row-based iterators?
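For illustration, the feature being discussed looks roughly like this (a minimal sketch based on a reading of the linked DBI docs; the column indexes, key names and data are made up, so check the POD for the exact semantics):

    # slice-with-rename: a ref to a hash ref mapping column index to the
    # hash key the caller wants in each returned row (illustrative names):
    my $rows = $sth->fetchall_arrayref( \{ 0 => 'id', 1 => 'artist_name' } );
    # each element of @$rows is an independent hashref:
    # { id => ..., artist_name => ... }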
A small, clear test case would be appreciated.
On Sat Feb 16 13:40:13 2013, RIBASUSHI wrote:
> Changes in DBIC 0.08240 finally allow me to take advantage of the new
> feature that landed in DBI 1.620
> ( https://metacpan.org/source/TIMB/DBI-1.623/DBI.pm#L6437 ). However I
> realized that the feature is incomplete - it is only recognized by the
> eager fetchall_arrayref. Neither fetchrow_arrayref nor fetchrow_array
> recognize a $slice argument. Therefore I cannot use it in DBIC where it
> is needed the most - in the case of cursor iteration.
>
> Is this a "brainfart" type omission, or are there underlying design
> considerations which make column renaming infeasible in the case of
> row-based iterators?
It would be very expensive for fetchrow_* to consider such logic. I presume that you're calling it just once, e.g. for find(), and not repeatedly on the same handle.

Perhaps what's needed here is some kind of bind_columns-like call to set up binding to your own hash. Then you'd just call $sth->fetch;
From the bind_columns docs:

  : For compatibility with old scripts, the first parameter will be
  : ignored if it is C<undef> or a hash reference.
  :
  : Here's a more fancy example that binds columns to the values I<inside>
  : a hash (thanks to H.Merijn Brand):
  :
  :   $sth->execute;
  :   my %row;
  :   $sth->bind_columns( \( @row{ @{ $sth->{NAME_lc} } } ) );
  :   while ($sth->fetch) {
  :       print "$row{region}: $row{sales}\n";
  :   }

Perhaps we can make use of that old attribute hash *and* simplify the code in that "fancy example".

Suggestions would be welcome. Patches would be delightful :)
Subject: Re: [rt.cpan.org #83378] The slice-to-hash-with-rename feature is incomplete
Date: Sun, 17 Feb 2013 19:24:00 +1100
To: Tim_Bunce via RT <bug-DBI [...] rt.cpan.org>
From: Peter Rabbitson <ribasushi [...] cpan.org>
On Sat, Feb 16, 2013 at 05:04:35PM -0500, Tim_Bunce via RT wrote:
> It would be very expensive for fetchrow_* to consider such logic. I
> presume that you're calling it just once, e.g. for find(), and not
> repeatedly on the same handle.
Um... no... many many many times on the same handle. That is:

    my $rs_returning_1mil_rows = $schema->resultset('Foo')->search(...);

    while (my $r = $rs_returning_1mil_rows->next) {
        ... do stuff
    }

Yes, of course using ->all is more efficient in terms of work done by the CPU, but it is not always possible to fit everything in memory.

The specific code in DBIC that could be elided is:
https://github.com/dbsrgits/dbix-class/blob/topic/constructor_rewrite/lib/DBIx/Class/ResultSet.pm#L1377
where $rows is a bunch of arrayrefs (obtained either via DBI's fetchall_X OR via fetchrow_X) and $infmap is an arrayref of "renames". Currently I can modify this code only for the case of $rs->all, which will make it very very hairy (the contents of $rows will differ depending on invocation style).
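In rough terms, the per-row transformation that loop performs - and that DBI would ideally do instead - is (a simplified sketch of the linked ResultSet code, not the actual implementation):

    # $rows: arrayrefs straight from DBI; $infmap: positional key names
    my @hashes = map {
        my %h;
        @h{ @$infmap } = @$_;   # "rename" columns into the wanted keys
        \%h;
    } @$rows;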
> Perhaps what's needed here is some kind of bind_columns-like call to
> set up binding to your own hash. Then you'd just call $sth->fetch;
Right, except I do not want to do that (the binding), because I need to guarantee that each hash is independent. And while I can re-bind a fresh hash created in pure perl - this will not gain me any performance at all. The point (as I understood it) of the renaming feature was to allow *multiple independent* hashes to be assembled by XS code with the data we expect.
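To illustrate why the pure-perl rebinding gains nothing (a sketch reusing the $infmap naming from above) - the per-row work is exactly the hash construction we wanted pushed into XS:

    my @results;
    while (1) {
        my %row;                                      # fresh, independent hash
        $sth->bind_columns( \( @row{ @$infmap } ) );  # re-bind every row
        last unless $sth->fetch;
        push @results, \%row;
    }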
> From the bind_columns docs:
>
>   : For compatibility with old scripts, the first parameter will be
>   : ignored if it is C<undef> or a hash reference.
>   :
>   : Here's a more fancy example that binds columns to the values I<inside>
>   : a hash (thanks to H.Merijn Brand):
>   :
>   :   $sth->execute;
>   :   my %row;
>   :   $sth->bind_columns( \( @row{ @{ $sth->{NAME_lc} } } ) );
>   :   while ($sth->fetch) {
>   :       print "$row{region}: $row{sales}\n";
>   :   }
>
> Perhaps we can make use of that old attribute hash *and* simplify the
> code in that "fancy example".
>
> Suggestions would be welcome. Patches would be delightful :)
I would love to write some tests, but I have to admit I am not sure what you are suggesting here as a way forward. See the requirement for independence of the individual hashes above. Let's keep talking this over, we can even meet on IRC at some point - just name a time.
Subject: Re: [rt.cpan.org #83378] The slice-to-hash-with-rename feature is incomplete
Date: Sun, 17 Feb 2013 10:05:24 +0100
To: bug-DBI [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Sat, 16 Feb 2013 17:23:39 -0500, "Tim_Bunce via RT" <bug-DBI@rt.cpan.org> wrote:
> From the bind_columns docs:
>
>   : For compatibility with old scripts, the first parameter will be
>   : ignored if it is C<undef> or a hash reference.
>   :
>   : Here's a more fancy example that binds columns to the values I<inside>
>   : a hash (thanks to H.Merijn Brand):
>   :
>   :   $sth->execute;
>   :   my %row;
>   :   $sth->bind_columns( \( @row{ @{ $sth->{NAME_lc} } } ) );
>   :   while ($sth->fetch) {
>   :       print "$row{region}: $row{sales}\n";
>   :   }
>
> Perhaps we can make use of that old attribute hash *and* simplify the
> code in that "fancy example".
I don't see how that "fancy example" can be simplified without losing the clarity of its purpose. I still use this code (though styled differently) on a daily basis in a lot of production code, so if it could be made simpler or even faster, I'd be very interested.

Other than that, I might be missing the point completely, and would like to hear a more verbose "wish".
> Suggestions would be welcome. Patches would be delightful :)
--
H.Merijn Brand  http://tux.nl  Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.17  porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/  http://www.test-smoke.org/
http://qa.perl.org  http://www.goldmark.org/jeff/stupid-disclaimers/
CC: TIMB [...] cpan.org
Subject: Re: [rt.cpan.org #83378] The slice-to-hash-with-rename feature is incomplete
Date: Sun, 17 Feb 2013 21:10:34 +1100
To: "h.m.brand [...] xs4all.nl via RT" <bug-DBI [...] rt.cpan.org>
From: Peter Rabbitson <ribasushi [...] cpan.org>
On Sun, Feb 17, 2013 at 04:05:45AM -0500, h.m.brand@xs4all.nl via RT wrote:
> Other than that, I might be missing the point completely, and would
> like to hear a more verbose "wish".
Right, good point. I should start from what I want to achieve.

As stated in an earlier reply - I want to push the code in this loop into the realm of DBI:
https://github.com/dbsrgits/dbix-class/blob/topic/constructor_rewrite/lib/DBIx/Class/ResultSet.pm#L1377
where $rows is a bunch of arrayrefs (obtained either via DBI's fetchall_X OR via fetchrow_X) and $infmap is an arrayref of "renames". Currently, given what DBI offers me, I can delegate the work only for ->all()-style cases, not for iterating ->next()-style cases. I am soliciting brainstorming towards an API addition which will give me a solution for that 2nd case.

Also note that it is very possible this entire ticket is moot - it is possible there are *no* speedups to be had by pushing this functionality into DBI (i.e. there is no way to XS-ify the arrayref->hashref transition in the case of an iterating access). If this is the case - just state so and we move on ;)
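To make the wish concrete, a purely hypothetical sketch (both method names below are invented for illustration; nothing like them exists in DBI today):

    # tell the handle once how to slice/rename ...
    $sth->bind_row_slice( \{ 0 => 'id', 1 => 'artist_name' } );   # invented
    # ... then have each fetch assemble a fresh, independent hashref in XS
    while (my $row = $sth->fetchrow_sliced) {                     # invented
        # $row is a new hashref every iteration - safe to store as-is
    }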
CC: "h.m.brand [...] xs4all.nl via RT" <bug-DBI [...] rt.cpan.org>, TIMB [...] cpan.org
Subject: Re: [rt.cpan.org #83378] The slice-to-hash-with-rename feature is incomplete
Date: Sun, 17 Feb 2013 12:51:39 +0000
To: Peter Rabbitson <ribasushi [...] cpan.org>
From: Tim Bunce <Tim.Bunce [...] pobox.com>
At the core of DBI fetching is the idea of an array of field buffers (grep for _fbav). Drivers assign values to SVs via that array. That's simple and fast because it reuses the same SVs each time. Those SVs have been set to the right type (at least after the first row) and have had their buffers grown big enough to fit typical values (at least after a few rows).

What bind_col(umns) does is simply change the pointers in the fbav array to point at different scalars. The driver still uses fbav and we still get all the benefits of reuse, but now the application gets the benefit of having the values written into whatever SVs it wants, in whatever structure it wants.

This already works (and has for years):

    my $sth = $dbh->prepare("select name, size, mode from $table");
    my $struct;

    # bind columns to values in whatever nested structure we want:
    $sth->bind_columns(undef, \( $struct->{name},
                                 $struct->{attr}{size},
                                 $struct->{attr}{ary}[2] ));

    $sth->execute;

    my @structs;
    push @structs, dclone($struct) while $sth->fetch;

    $VAR1 = [
      {
        'name' => '.',
        'attr' => { 'ary' => [ undef, undef, 16872 ], 'size' => 102 }
      },
      {
        'name' => '..',
        'attr' => { 'ary' => [ undef, undef, 16877 ], 'size' => 2414 }
      },
      {
        'name' => '000_just_testing',
        'attr' => { 'ary' => [ undef, undef, 16872 ], 'size' => 68 }
      }
    ];

What's needed beyond that? Can't you just ignore what fetch*() returns and 'clone' the structure you bound the field buffers into?
> Also note that it is very possible this entire ticket is moot - it is
> possible there are *no* speedups to be had by pushing this functionality
> into DBI (i.e. there is no way to XS-ify the arrayref->hashref
> transition in the case of an iterating access). If this is the case -
> just state so and we move on ;)
The way I see it, there's no need for an arrayref->hashref transition. Just get the DBI to update hash elements for you, then clone the hash.

Tim.
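A minimal sketch of that suggestion, combining the documented bind_columns hash example with a per-row copy (for a flat row hash a shallow copy suffices; dclone is only needed once nested structures are bound into):

    $sth->execute;
    my %row;
    $sth->bind_columns( \( @row{ @{ $sth->{NAME_lc} } } ) );

    my @rows;
    # { %row } takes an independent shallow copy of the bound hash per row:
    push @rows, { %row } while $sth->fetch;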
On Sun Feb 17 07:51:57 2013, Tim.Bunce@pobox.com wrote:
> The way I see it, there's no need for an arrayref->hashref transition.
> Just get the DBI to update hash elements for you, then clone the hash.
As far as I know, cloning (especially dclone()) is extremely slow. I will have to benchmark to substantiate this, though. Will report back once I have results.
Subject: Re: [rt.cpan.org #83378] The slice-to-hash-with-rename feature is incomplete
Date: Sun, 17 Feb 2013 17:00:38 +0000
To: Peter Rabbitson via RT <bug-DBI [...] rt.cpan.org>
From: Tim Bunce <Tim.Bunce [...] pobox.com>
> As far as I know, cloning (especially dclone()) is extremely slow.
dclone is too generic. For example, it takes time to construct a separate pointer table in order to "do the right thing" with reference loops etc. There ought to be a more limited, and therefore faster, clone()-like function on CPAN.

It would be interesting to see benchmarks of "simple cloners cloning small simple structures", including JSON::XS and Sereal. Both of those might beat Storable::dclone.

If all else fails, it wouldn't be too hard to implement such a simple cloner in XS that would be faster than any perl code.

Tim.
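As a baseline for what "more limited" can buy (an illustrative aside, not from the thread): for a flat row hash, the degenerate cloner is a plain shallow copy, which any XS implementation would have to beat:

    # correct only while the values are plain strings/numbers (no nested refs)
    sub clone_flat_hash { return { %{ $_[0] } } }

    my $copy = clone_flat_hash($row_hash);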
> JSON::XS and Sereal. Both of those might beat Storable::dclone.
Well, Sereal should but JSON::XS probably wouldn't. Still, the basic point stands.

These are also worth a look and a comparison:
https://metacpan.org/module/Clone
https://metacpan.org/module/Clone::Fast
https://metacpan.org/module/Data::Clone
On Sun Feb 17 13:21:46 2013, TIMB wrote:
> > JSON::XS and Sereal. Both of those might beat Storable::dclone.
>
> Well, Sereal should but JSON::XS probably wouldn't. Still, the basic
> point stands.
You will be very very surprised then. Attached is the benchmark script (can take a long time to run - the Dumbbench tolerances are set very low).

I will do a DBI-based benchmark later. Although given that even a plain hash construction from perl space already beats all the cloners, an XS-space construction with rebinding behind the scenes on every iteration sounds like an uncontested winner.

Here are the results on 5.16.2 compiled with ithreads
-----------------------------------------------------
                          Rate  dclone  sereal data_clone  clone json_xs newhash
dclone      21.3717+-0.006/s       --  -38.9%     -56.2% -56.5%  -69.6%  -80.5%
sereal        34.953+-0.01/s    63.5%      --     -28.4% -28.8%  -50.3%  -68.2%
data_clone   48.808+-0.014/s   128.4%   39.6%         --  -0.6%  -30.7%  -55.5%
clone        49.083+-0.015/s   129.7%   40.4%       0.6%     --  -30.3%  -55.3%
json_xs      70.397+-0.021/s   229.4%  101.4%      44.2%  43.4%      --  -35.9%
newhash    109.762+-0.032/s    413.6%  214.0%     124.9% 123.6%   55.9%      --

5.16.2 without ithreads
-----------------------------------------------------
                          Rate  dclone  sereal data_clone  clone json_xs newhash
dclone     24.2437+-0.0047/s       --  -29.6%     -55.5% -55.5%  -69.8%  -80.6%
sereal         34.44+-0.01/s    42.1%      --     -36.8% -36.8%  -57.1%  -72.5%
data_clone   54.455+-0.016/s   124.6%   58.1%         --  -0.1%  -32.1%  -56.4%
clone        54.527+-0.016/s   124.9%   58.3%       0.1%     --  -32.0%  -56.4%
json_xs      80.223+-0.023/s   230.9%  132.9%      47.3%  47.1%      --  -35.8%
newhash     125.03+-0.036/s    415.7%  263.0%     129.6% 129.3%   55.9%      --

5.12.4 with ithreads
-----------------------------------------------------
                          Rate  dclone  sereal data_clone  clone json_xs newhash
dclone    5.98292+-0.00046/s       --  -80.8%     -84.3% -84.4%  -91.1%  -95.1%
sereal     31.1264+-0.0093/s   420.3%      --     -18.5% -18.6%  -53.5%  -74.4%
data_clone   38.171+-0.011/s   538.0%   22.6%         --  -0.2%  -42.9%  -68.6%
clone        38.261+-0.011/s   539.5%   22.9%       0.2%     --  -42.8%  -68.5%
json_xs        66.88+-0.02/s  1017.8%  114.9%      75.2%  74.8%      --  -45.0%
newhash    121.522+-0.036/s   1931.2%  290.4%     218.4% 217.6%   81.7%      --

5.8.8 without ithreads
-----------------------------------------------------
                          Rate  dclone  sereal  clone data_clone json_xs newhash
dclone    3.85165+-0.00023/s       --  -89.6% -94.9%     -94.9%  -95.6%  -96.9%
sereal       36.927+-0.011/s   858.7%      -- -51.1%     -51.3%  -57.3%  -69.9%
clone        75.543+-0.022/s  1861.3%  104.6%     --      -0.3%  -12.7%  -38.4%
data_clone   75.768+-0.023/s  1867.2%  105.2%   0.3%         --  -12.5%  -38.2%
json_xs      86.573+-0.025/s  2147.7%  134.4%  14.6%      14.3%      --  -29.4%
newhash    122.608+-0.036/s   3083.2%  232.0%  62.3%      61.8%   41.6%      --

All benches were executed after all modules in question were upgraded:

rabbit@Ahasver:~$ cpanm Time::HiRes Data::Clone Storable YAML::XS JSON::XS Sereal Clone Data::Clone Data::Dumper Dumbbench
Time::HiRes is up to date. (1.9725)
Data::Clone is up to date. (0.003)
Storable is up to date. (2.39)
YAML::XS is up to date. (0.39)
JSON::XS is up to date. (2.33)
Sereal is up to date. (0.310)
Clone is up to date. (0.34)
Data::Dumper is up to date. (2.139)
Dumbbench is up to date. (0.09)

YAML::XS was commented out because it is just too damn slow. Clone::Fast was *not* tested because it does not compile on perls above 5.10 *AND* it is in fact slower than the current Clone.pm.
Subject: bench_hash_clone.pl
use warnings;
use strict;

use Benchmark::Dumb 'cmpthese';
use Storable 'dclone';
use YAML::XS qw(Dump Load);
use JSON::XS qw(encode_json decode_json);
use Sereal qw(encode_sereal decode_sereal);
use Clone 'clone';
use Data::Clone 'data_clone';
use Data::Dumper;
use Time::HiRes;

# Hack to make Dumbbench use CPU time as opposed to walltime
# (plain glob assignment, not local(), so the override survives this block)
{
    no warnings 'redefine';
    *Time::HiRes::time = eval "sub { Time::HiRes::clock_gettime( @{[ Time::HiRes::CLOCK_PROCESS_CPUTIME_ID() ]} ) }"
        or die $@;
}

$Data::Dumper::Deparse = 1;
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent = 1;
$Data::Dumper::Terse = 1;

print Dumper { 'make sure all benched functions are XSUBs' => {
    dclone        => \&dclone,
    yaml_dump     => \&Dump,
    yaml_load     => \&Load,
    json_encode   => \&encode_json,
    json_decode   => \&decode_json,
    sereal_encode => \&encode_sereal,
    sereal_decode => \&decode_sereal,
    clone         => \&clone,
    data_clone    => \&data_clone,
}};

my $orig = { a => 1, b => 2 };
my $new;
my $iter = 5000;

my @dummy = (1, 2);
sub new_hash {
    return { a => $dummy[0], b => $dummy[1] };
}

cmpthese('200.0003', {
    ## YAML just isn't a viable contender - 3.5 times slower than dclone()
    # yaml_xs => sub {
    #     $new = Load(Dump($orig)) for (1..$iter);
    # },
    newhash => sub {
        $new = new_hash() for (1..$iter);
    },
    dclone => sub {
        $new = dclone($orig) for (1..$iter);
    },
    json_xs => sub {
        $new = decode_json(encode_json($orig)) for (1..$iter);
    },
    sereal => sub {
        $new = decode_sereal(encode_sereal($orig)) for (1..$iter);
    },
    clone => sub {
        $new = clone($orig) for (1..$iter);
    },
    data_clone => sub {
        # was clone($orig) in the attachment - that benchmarked Clone twice
        $new = data_clone($orig) for (1..$iter);
    },
});
On Mon Feb 18 12:05:19 2013, RIBASUSHI wrote:
> On Sun Feb 17 13:21:46 2013, TIMB wrote:
> > > JSON::XS and Sereal. Both of those might beat Storable::dclone.
> >
> > Well, Sereal should but JSON::XS probably wouldn't. Still, the basic
> > point stands.
>
> You will be very very surprised then. Attached is the benchmark script
> (can take a long time to run - the Dumbbench tolerances are set very
> low).
>
> I will do a DBI-based benchmark later. Although given that even a plain
> hash construction from perl space already beats all the cloners, an
> XS-space construction with rebinding behind the scenes on every
> iteration sounds like an uncontested winner.
Things like Sereal and Storable do a lot more work than you want here. A couple of comments on Sereal performance:

a) ALWAYS use the OO interface if you care about performance. This makes a big difference, see b).

b) The larger the structure, the better Sereal will look. It has relatively expensive setup compared to JSON::XS.

c) Due to the nature of the data, it seems like you'll never care about repeated hash keys. You get an extra speed improvement by using "no_shared_hashkeys => 1". If repeated hash keys occur, it's still faster without sharing them, but the intermediate output string will be larger.

But in the end, even for the OO interface, the best you can hope for out of Sereal vs. JSON::XS is the simplest of the data structure benchmarks here:

https://github.com/Sereal/Sereal/wiki/Sereal-Comparison-Graphs

You can see that JSON::XS beats Sereal on an empty hash there, but Sereal comes out ahead on "small hash". Benchmark code lives in the Sereal repo under author_tools.

A dedicated "build a copy of this data structure" approach to cloning must beat serialization/deserialization by a long shot AND is bound to be quite a bit simpler. You could start with either the Sereal or JSON::XS encoder implementations as a base for it, too. Just build a tree of Perl structures instead of generating output. Extra benefit: If you use Sereal::Decoder's pass-in-the-output-SV style, you get very simple and clean exception handling.

--Steffen
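Applied to the clone-via-serialization round trip benchmarked above, those suggestions amount to something like this (a sketch; the option name and the pass-in-the-output-SV call are per the Sereal docs, the variable names are made up):

    use Sereal::Encoder;
    use Sereal::Decoder;

    # construct once, reuse for every row (points a and c above):
    my $enc = Sereal::Encoder->new({ no_shared_hashkeys => 1 });
    my $dec = Sereal::Decoder->new;

    my $clone;
    # pass-in-the-output-SV style: the decoded copy lands in $clone
    $dec->decode( $enc->encode($orig), $clone );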
On Mon Feb 18 13:21:41 2013, SMUELLER wrote:
> On Mon Feb 18 12:05:19 2013, RIBASUSHI wrote:
> > On Sun Feb 17 13:21:46 2013, TIMB wrote:
> > > > JSON::XS and Sereal. Both of those might beat Storable::dclone.
> > >
> > > Well, Sereal should but JSON::XS probably wouldn't. Still, the basic
> > > point stands.
> >
> > You will be very very surprised then. Attached is the benchmark script
> > (can take a long time to run - the Dumbbench tolerances are set very
> > low).
> >
> > I will do a DBI-based benchmark later. Although given that even a plain
> > hash construction from perl space already beats all the cloners, an
> > XS-space construction with rebinding behind the scenes on every
> > iteration sounds like an uncontested winner.
>
> Things like Sereal and Storable do a lot more work than you want here. A
> couple of comments on Sereal performance:
>
> a) ALWAYS use the OO interface if you care about performance. This makes
> a big difference, see b).
>
> b) The larger the structure, the better Sereal will look. It has
> relatively expensive setup compared to JSON::XS.
>
> c) Due to the nature of the data, it seems like you'll never care about
> repeated hash keys. You get an extra speed improvement by using
> "no_shared_hashkeys => 1". If repeated hash keys occur, it's still
> faster without sharing them, but the intermediate output string will be
> larger.
>
> But in the end, even for the OO interface, the best you can hope for out
> of Sereal vs. JSON::XS is the simplest of the data structure benchmarks
> here:
>
> https://github.com/Sereal/Sereal/wiki/Sereal-Comparison-Graphs
>
> You can see that JSON::XS beats Sereal on an empty hash there, but
> Sereal comes out ahead on "small hash". Benchmark code lives in the
> Sereal repo under author_tools.
>
> A dedicated "build a copy of this data structure" approach to cloning
> must beat serialization/deserialization by a long shot AND is bound to
> be quite a bit simpler. You could start with either the Sereal or
> JSON::XS encoder implementations as a base for it, too. Just build a
> tree of Perl structures instead of generating output. Extra benefit: If
> you use Sereal::Decoder's pass-in-the-output-SV style, you get very
> simple and clean exception handling.
>
> --Steffen
With the above changes (and just running dumbbench slightly less long), I get:

                      Rate         dclone          clone     data_clone         sereal        json_xs newhash
dclone      52.95+-0.041/s             --         -55.5%         -55.7%         -63.6%         -71.1%  -83.3%
clone        119.05+-0.1/s  124.83+-0.26%             --          -0.3%         -18.2%         -35.1%  -62.6%
data_clone   119.4+-0.11/s  125.49+-0.27%    0.29+-0.12%             --         -18.0%         -34.9%  -62.4%
sereal      145.54+-0.12/s  174.85+-0.32%   22.25+-0.15%   21.89+-0.15%             --         -20.7%  -54.2%
json_xs     183.53+-0.14/s  246.61+-0.37%   54.16+-0.17%   53.71+-0.18%   26.11+-0.14%             --  -42.3%
newhash     317.96+-0.34/s   500.5+-0.79%  167.08+-0.36%  166.31+-0.37%   118.48+-0.3%   73.25+-0.22%      --

That is very much in line with what I'd expect. If you use bigger data structures, Sereal will beat JSON::XS (not that it matters much), and if you use structures with common subtrees, it will beat the heck out of JSON::XS purely because it entirely avoids serializing things multiple times.

Nonetheless, my comment about serialization vs. cloning holds. Maybe some day $work will have a need for a very efficient clone function and Yves or I will get the time to implement one.
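The "above changes" presumably amount to swapping the attached script's sereal entry for the OO interface, roughly as follows (a hypothetical reconstruction, not the actual modified script):

    my $srl_enc = Sereal::Encoder->new({ no_shared_hashkeys => 1 });
    my $srl_dec = Sereal::Decoder->new;

    # replaces the functional encode_sereal/decode_sereal entry:
    sereal => sub {
        $new = $srl_dec->decode( $srl_enc->encode($orig) ) for (1..$iter);
    },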