Subject: | [PATCH] Change the iterator to fetch on demand. |
The attached patch changes Class::DBI::Iterator to take an active
statement handle and fetch the rows on demand rather than working from a
pre-fetched list. This can drastically improve the performance of
iterators when the whole list is not iterated through. In particular
the common idiom of, search->first when search might return a lot of rows.
There's still some work and tweaks to be done, but I wanted to post it
here so it can be evaluated.
Notes:
* All the existing methods still function the same.
* The Iterator will still accept a data list.
* In order to support reset() and count() it is necessary for the
iterator to store the data being fetched. This wastes some memory.
* count() remains as inefficient as before. The statement handle must
fetch all the data to count it up.
* CDBI->sth_to_objects() has been altered to pass in the statement
handle to the iterator. This will break any existing CDBI::Iterator
subclasses as it uses a new interface.
* Because sth_to_objects() was tweaked this has a much wider performance
effect than just search(). It should effect any search-like
functionality including CDBI::AbstractSearch.
* Its worth considering if CDBI::Iterator should be split into
CDBI::Iterator::FromList and CDBI::Iterator::FromSth with CDBI::Iterator
becoming abstract.
* I had to turn off all cached statements or else DBI will bitch that a
cached handle is being reused. This is not ideal. There is an option
to DBI->prepare_cached() to make it do the right thing (ie. silently
prepare a new handle if the cached one is still active) but it requires
A) an Ima::DBI patch and B) Ima::DBI requiring DBI 1.40. I'll do it if
that's ok. It would, in general, make set_sql() behave better.
* The deprecated delete(@search) had to be deleted because of locking
issues in SQLite.
* Now that iterators contain active statement handles they may hold read
locks open. This could cause new locking issues in existing
applications. SQLite, for example, does not allow you to delete a row
which is read locked. Note that this is no worse than having an active
statement handle open and deleting the rows as you fetch them.
* As a further performance tweak the execution of the statement handle
can be delayed until the first fetch. This would also allow unused
iterators to be safely created and passed around without worrying about
holding locks open. Or even created opportunistically and then tossed
away if they're not needed. I haven't implemented this.
Subject: | iterator.patch |
Auto-merging (0, 27397) /local/Class-DBI to /vendor/Class-DBI (base /vendor/Class-DBI:27388).
U t/02-Film.t
U lib/Class/DBI.pm
U lib/Class/DBI/Iterator.pm
==== Patch <-> level 1
Source: 9c88509d-e914-0410-b01c-b9530614cbfe:/local/Class-DBI:27397
Target: 9c88509d-e914-0410-b01c-b9530614cbfe:/vendor/Class-DBI:27388
Log:
Efficient iterator patch take 1
=== t/02-Film.t
==================================================================
--- t/02-Film.t (revision 27388)
+++ t/02-Film.t (patch - level 1)
@@ -4,7 +4,7 @@
BEGIN {
eval "use DBD::SQLite";
- plan $@ ? (skip_all => 'needs DBD::SQLite for testing') : (tests => 91);
+ plan $@ ? (skip_all => 'needs DBD::SQLite for testing') : (tests => 88);
}
INIT {
@@ -158,17 +158,6 @@
ok(Film->retrieve('Ishtar')->delete,
"Ishtar doesn't deserve an entry any more");
ok(!Film->retrieve('Ishtar'), 'Ishtar no longer there');
- {
- my $deprecated = 0;
- local $SIG{__WARN__} = sub { $deprecated++ if $_[0] =~ /deprecated/ };
- ok(
- Film->delete(Director => 'Elaine May'),
- "In fact, delete all films by Elaine May"
- );
- is(Film->search(Director => 'Elaine May')->count,
- 0, "0 Films by Elaine May");
- is $deprecated, 1, "Got a deprecated warning";
- }
};
is $@, '', "No problems with deletes";
=== lib/Class/DBI.pm
==================================================================
--- lib/Class/DBI.pm (revision 27388)
+++ lib/Class/DBI.pm (patch - level 1)
@@ -7,7 +7,7 @@
package Class::DBI;
-use version; $VERSION = qv('3.0.16');
+use version; $VERSION = qv('3.0.16.1');
use strict;
use warnings;
@@ -730,7 +730,7 @@
sub delete {
my $self = shift;
- return $self->_search_delete(@_) if not ref $self;
+
$self->remove_from_object_index;
$self->call_trigger('before_delete');
@@ -748,16 +748,6 @@
return 1;
}
-sub _search_delete {
- my ($class, @args) = @_;
- $class->_carp(
- "Delete as class method is deprecated. Use search and delete_all instead."
- );
- my $it = $class->search_like(@args);
- while (my $obj = $it->next) { $obj->delete }
- return 1;
-}
-
# Return the placeholder to be used in UPDATE and INSERT queries.
# Overriding this is deprecated in favour of
# __PACKAGE__->find_column('entered')->placeholder('IF(1, CURDATE(), ?));
@@ -930,6 +920,7 @@
sub set_sql {
my ($class, $name, $sql, $db, @others) = @_;
$db ||= 'Main';
+ @others = 0;
$class->SUPER::set_sql($name, $sql, $db, @others);
$class->_generate_search_sql($name) if $sql =~ /select/i;
return 1;
@@ -1130,19 +1121,19 @@
sub sth_to_objects {
my ($class, $sth, $args) = @_;
$class->_croak("sth_to_objects needs a statement handle") unless $sth;
+
unless (UNIVERSAL::isa($sth => "DBI::st")) {
my $meth = "sql_$sth";
$sth = $class->$meth();
}
- my (%data, @rows);
+
eval {
$sth->execute(@$args) unless $sth->{Active};
- $sth->bind_columns(\(@data{ @{ $sth->{NAME_lc} } }));
- push @rows, {%data} while $sth->fetch;
};
return $class->_croak("$class can't $sth->{Statement}: $@", err => $@)
if $@;
- return $class->_ids_to_objects(\@rows);
+
+ return $class->_sth_to_iterator($sth);
}
*_sth_to_objects = \&sth_to_objects;
@@ -1153,9 +1144,36 @@
return $class;
}
+sub _sth_to_iterator {
+ my($class, $sth) = @_;
+
+ return unless defined wantarray;
+ return $class->_objects_from_sth($sth) if wantarray;
+
+ my $iterator = $class->_my_iterator->new($class);
+ $iterator->sth($sth);
+ return $iterator;
+}
+
+sub _data_from_sth {
+ my($class, $sth) = @_;
+
+ my( @rows, %data );
+ $sth->bind_columns(\(@data{ @{ $sth->{NAME_lc} } }));
+ push @rows, {%data} while $sth->fetch;
+
+ return \@rows;
+}
+
+sub _objects_from_sth {
+ my($class, $sth) = @_;
+
+ return map $class->construct($_), @{$class->_data_from_sth($sth)};
+}
+
sub _ids_to_objects {
my ($class, $data) = @_;
- return $#$data + 1 unless defined wantarray;
+ return unless defined wantarray;
return map $class->construct($_), @$data if wantarray;
return $class->_my_iterator->new($class => $data);
}
=== lib/Class/DBI/Iterator.pm
==================================================================
--- lib/Class/DBI/Iterator.pm (revision 27388)
+++ lib/Class/DBI/Iterator.pm (patch - level 1)
@@ -30,6 +30,17 @@
can fetch them one at a time, potentially saving a considerable amount
of processing time and memory.
+=head1 EFFICIENCY
+
+The iterator has two modes of operation. One is to work off of a list of data handed into new(). This can be very wasteful if the entire data list is not iterated through, for example: C<< Class->search(...)->first >>.
+
+The other mode is to pass in no data and instead give it an active, executed statement handle.
+
+ my $iterator = CDBI::Iterator->new($cdbi_class);
+ $iterator->sth($search_sth);
+
+In this mode, the iterator will only read from the statement handle on demand. This can reduce a lot of overhead for large queries where you are only iterating through a small amount. Note that in this mode the statement handle is kept active until all the data is fetched or the iterator is destroyed.
+
=head1 CAVEAT
Note that there is no provision for the data changing (or even being
@@ -50,10 +61,20 @@
_data => $data,
_mapper => [@mapper],
_place => 0,
+ _sth => undef,
},
ref $me || $me;
}
+sub sth {
+ my $self = shift;
+
+ if( @_ ) {
+ $self->{_sth} = shift;
+ }
+ return $self->{_sth};
+}
+
sub set_mapping_method {
my ($self, @mapper) = @_;
$self->{_mapper} = [@mapper];
@@ -61,9 +82,53 @@
}
sub class { shift->{_class} }
-sub data { @{ shift->{_data} } }
sub mapper { @{ shift->{_mapper} } }
+sub next_data {
+ my $self = shift;
+ my $place = $self->{_place}++;
+
+ if( !defined $self->{_data}[$place]
+ and (my $sth = $self->{_sth}) )
+ {
+ $self->_fill_in_data_from_sth($place);
+ return $self->{_data}[$place];
+ }
+
+ return $self->{_data}[$place];
+}
+
+sub _sth_next {
+ my $self = shift;
+
+ my $sth = $self->{_sth};
+ return unless $sth->{Active};
+
+ my $row = $sth->fetchrow_hashref;
+ return unless $row;
+
+ push @{$self->{_data}}, $row;
+ return $row;
+}
+
+sub _fill_in_data_from_sth {
+ my($self, $place) = @_;
+
+ while( $#{$self->{_data}} < $place ) {
+ last unless defined $self->_sth_next;
+ }
+}
+
+sub data {
+ my $self = shift;
+
+ if( my $sth = $self->{_sth} ) {
+ $self->_fill_in_data_from_sth("Inf");
+ }
+
+ return @{$self->{_data} || []};
+}
+
sub count {
my $self = shift;
$self->{_count} ||= scalar $self->data;
@@ -71,7 +136,7 @@
sub next {
my $self = shift;
- my $use = $self->{_data}->[ $self->{_place}++ ] or return;
+ my $use = $self->next_data or return;
my @obj = ($self->class->construct($use));
foreach my $meth ($self->mapper) {
@obj = map $_->$meth(), @obj;
@@ -107,7 +172,10 @@
while (my $obj = $self->next) {
$obj->delete;
}
+
$self->{_data} = [];
+ $self->{_sth} = undef;
+
1;
}
==== BEGIN SVK PATCH BLOCK ====
Version: svk v2.0.0 (darwin)
eJyNV92P41YVt1qxS0IREs9IXFqvmtAk48/YzuxEKdNZaVTtgmApQt2Ve23fOzbj+Ab7ZjLROOoE
OhVQ8cALQkJFvLTwxht/Aa+o/0wF4o1zrxNPkp2ZrWY0k/ie8zu/8339KH+6P9LL4VArVV0rf/Le
u4PBjzAP4we6Var9kkQJZ7lqlyk5I6lqlik7Ua0yw2MCpwWb5qH4wHF+Qrj4kISnhA+HOsC5FdyR
hFjDStQAc5YVqifhfZ4TouqlO/LKkSl+fVX3yoLAiYT1c3KWFAnLgIbhmJ4DIiCvgz6bkMzPGePy
yNKNkSG0tTJMWUF8AS8gbSFvqOCSVIiSnITAaQ5PuURaa0tB6wZBcDwJtkQlTXtlaBNRBw4VkLlm
SJMUPARje5rRfZSk4x6XhF6KZVRY8hRPJunc5+ScRyTlWOKbRqlZlBqRZ1qObZo60Wlka2FfpwZ2
QifEkWqIPDxRlC/8f3/7W68oy28ov1I+u7f8pvL5h3//wHVH1mbEKqYVMvZ0jA3XM6hmBrZO+zYO
LY861LRtg7ph7ad9Q8A8EbC9wxQXhQzpS1216kT1t8PWv0bae+cHx73JWJrcSplzU27NbcUtrVtp
2HdGvC/ioruBZmu64xCq2wEhEG/bjjwjcPtaSGigevoq5B//VfwOrvrKcgRRX/6xI/5Pl3/6rvK3
7y8/vfeJ8o/z5b+6Cll+qf5T+dnyP+bnyrK3/K92+dFj5aPPni//952ePmI8JnmBDpDWbJAznKKL
ZqOhFjzuDsk5CaectEYqdErRRtMsJUWBqsOLt0OenJFFs5ETPs0zpIYiFt2hD8c+Z37CSQ6dmLeE
/M4zYWQ8b1UqHYnYBgojf79Z462sRYQmGYnQDGfQsDme779okQW/gBAXPs3ZWJhvVYAJ3dISJpFa
Uzi41h/Pa2bdYUZmK2JtUFKvD2rgDQbr0/3motkspgHyI8xxTeQOPyWdFhrlbAYnD4QakgZldIMk
i/yQpdNxVrSetUbi+AKNLtbBf/L24yM/DRcIftpCbTIt4jXYhURboFkMJb7SoATmo7S6ov5MCl/z
3o3h3dRXIGM8qaMYwtDl+TTkLdVvd4BrHd6tkFQhXFwbTqJClMbKfmUWXdsVyrXhyixU/Z/LT1+9
v7x/ubwPH5UxnqMwZtBtiDNEk3OUcJRkCKMooZTkJOPNGZ53UMxmsGnyDioYiLxZIEY5yVBAOORR
6OLiFPTzgkPwiOgMdPxmhNLklCDcnIgd00ETnMMimqY4T+egSllOUAHTIbpx2Mmmdhwv8DzHsYnp
RlbgaH3bxLbpmIFrUi+062Hnbk0nw94eMnvHq3KrxpTz8oHj3DlwXMHN1iLi9t0+dTw76Fu2FZpG
6Dou1nSLRk5fhVnkVRPnD+wvX3/j9+8oV8+Uy6t3lV+/evXjy+XXlOW938SXH7+mfPg77bfK1Qef
PL70jx49Oj48Pnpy+PNm82lMUN11MS4QnzE0ZhER4UfgLZzA/u0h9MMMJAuRhxnLT+GUCgkM8YeE
wCfZJDHOIpgHSQZiolfboPg0BrUQi0wiyO8c+r7ghE5TMQMgjQgqAKJSAUg0kM8YX/ECOB7nbHoC
2YV0InKOx5OUDBA6fPgQHVZlXBCch3Gr1+u1oZ1kjQyHvco9OUSlTyv+E1ARFZixyiZwhq/ACUfo
BMamKFCgi+UM7aDVpI1QwYHNGNhKN1MC+LtT6xAKYTA43plXUZD4tw8tSd2vZlfzOANvgaag25HR
qdFnSZoilkFd54Kp6FkpsEsLZGAwj+HzZvBzEk1DIvLFZLoYpCKWOACditsW+uWU5AkkHroL0jFn
U4RzUlmsSCTZyToXAFSMMRDCYzbNOFh6wjh0eIxlc9c+3MwQDk/JhK9CDNuEJ4AEaEJa5gQk5FSE
sAO/rTgkYu/ANGNzEomwNQ6GgACrqAPLcSH+5IQidUxQWYp/9TSrBycMTJJSSFcRJ5TLmZlQGPc+
aq82LBzDIBfoiw2xjX26KQGDFvoEyRUgRcVB9VDugMp6Br0tp+0tHMSjSYohRwfX8PLB4q23xOoV
FL+33ri1hNwm71eaz5sN1BDV3JL44O7BNtM2ONhsbLgoRlnqJ9nuGpBw7f1GQ0Zz2+Uti1VUXozL
thCqF4q4a4hI3JqIW5jv7149ti86tSrsTalar1V44MNci6EmXsSAQxHZaj9fbJNfdFbntW/y29qR
GwO33ssCqLNK5/XdSa78FlLf2LWEHq5TX9UfjAq+e8VaJ2wdQBn4FZk7qqoqm1uCul3ut9TC68cZ
fb29k+fdYIlWe/85pKGu8mZjR+IABPabO70FT2XrSqZ6Hd2cwDsg2uimqg/ERVi0EwjeuM7lyjQ9
SqjrwpXcciKbWo7Vx55HImo7ru1SU77KmXp5RGkSJmIo1ZNF3iEQx3Ch0EGsOxwapWoY1cvsU/lG
Ohj8NIOCywucPuiX8D4MOjG8I8KLKnyZTpNIvLbspSzEaXU36IoXkNXrq2r2Sy90XVvzoi7xdKur
WbrWDTQ97AaebWp93QoDSoZto7wV3in3zuBCw/JdfNf9qviq7X0lucGuIwPpxv8BiyJGRA==
==== END SVK PATCH BLOCK ====