Skip Menu |

This queue is for tickets about the Class-DBI CPAN distribution.

Report information
The Basics
Id: 24959
Status: open
Priority: 0/
Queue: Class-DBI

People
Owner: Nobody in particular
Requestors: mschwern [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: (no value)
Fixed in: (no value)



Subject: [PATCH] Change the iterator to fetch on demand.
The attached patch changes Class::DBI::Iterator to take an active statement handle and fetch the rows on demand rather than working from a pre-fetched list. This can drastically improve the performance of iterators when the whole list is not iterated through. In particular the common idiom of, search->first when search might return a lot of rows. There's still some work and tweaks to be done, but I wanted to post it here so it can be evaluated. Notes: * All the existing methods still function the same. * The Iterator will still accept a data list. * In order to support reset() and count() it is necessary for the iterator to store the data being fetched. This wastes some memory. * count() remains as inefficient as before. The statement handle must fetch all the data to count it up. * CDBI->sth_to_objects() has been altered to pass in the statement handle to the iterator. This will break any existing CDBI::Iterator subclasses as it uses a new interface. * Because sth_to_objects() was tweaked this has a much wider performance effect than just search(). It should effect any search-like functionality including CDBI::AbstractSearch. * Its worth considering if CDBI::Iterator should be split into CDBI::Iterator::FromList and CDBI::Iterator::FromSth with CDBI::Iterator becoming abstract. * I had to turn off all cached statements or else DBI will bitch that a cached handle is being reused. This is not ideal. There is an option to DBI->prepare_cached() to make it do the right thing (ie. silently prepare a new handle if the cached one is still active) but it requires A) an Ima::DBI patch and B) Ima::DBI requiring DBI 1.40. I'll do it if that's ok. It would, in general, make set_sql() behave better. * The deprecated delete(@search) had to be deleted because of locking issues in SQLite. * Now that iterators contain active statement handles they may hold read locks open. This could cause new locking issues in existing applications. SQLite, for example, does not allow you to delete a row which is read locked. Note that this is no worse than having an active statement handle open and deleting the rows as you fetch them. * As a further performance tweak the execution of the statement handle can be delayed until the first fetch. This would also allow unused iterators to be safely created and passed around without worrying about holding locks open. Or even created opportunistically and then tossed away if they're not needed. I haven't implemented this.
Subject: iterator.patch
Auto-merging (0, 27397) /local/Class-DBI to /vendor/Class-DBI (base /vendor/Class-DBI:27388). U t/02-Film.t U lib/Class/DBI.pm U lib/Class/DBI/Iterator.pm ==== Patch <-> level 1 Source: 9c88509d-e914-0410-b01c-b9530614cbfe:/local/Class-DBI:27397 Target: 9c88509d-e914-0410-b01c-b9530614cbfe:/vendor/Class-DBI:27388 Log: Efficient iterator patch take 1 === t/02-Film.t ================================================================== --- t/02-Film.t (revision 27388) +++ t/02-Film.t (patch - level 1) @@ -4,7 +4,7 @@ BEGIN { eval "use DBD::SQLite"; - plan $@ ? (skip_all => 'needs DBD::SQLite for testing') : (tests => 91); + plan $@ ? (skip_all => 'needs DBD::SQLite for testing') : (tests => 88); } INIT { @@ -158,17 +158,6 @@ ok(Film->retrieve('Ishtar')->delete, "Ishtar doesn't deserve an entry any more"); ok(!Film->retrieve('Ishtar'), 'Ishtar no longer there'); - { - my $deprecated = 0; - local $SIG{__WARN__} = sub { $deprecated++ if $_[0] =~ /deprecated/ }; - ok( - Film->delete(Director => 'Elaine May'), - "In fact, delete all films by Elaine May" - ); - is(Film->search(Director => 'Elaine May')->count, - 0, "0 Films by Elaine May"); - is $deprecated, 1, "Got a deprecated warning"; - } }; is $@, '', "No problems with deletes"; === lib/Class/DBI.pm ================================================================== --- lib/Class/DBI.pm (revision 27388) +++ lib/Class/DBI.pm (patch - level 1) @@ -7,7 +7,7 @@ package Class::DBI; -use version; $VERSION = qv('3.0.16'); +use version; $VERSION = qv('3.0.16.1'); use strict; use warnings; @@ -730,7 +730,7 @@ sub delete { my $self = shift; - return $self->_search_delete(@_) if not ref $self; + $self->remove_from_object_index; $self->call_trigger('before_delete'); @@ -748,16 +748,6 @@ return 1; } -sub _search_delete { - my ($class, @args) = @_; - $class->_carp( - "Delete as class method is deprecated. Use search and delete_all instead." - ); - my $it = $class->search_like(@args); - while (my $obj = $it->next) { $obj->delete } - return 1; -} - # Return the placeholder to be used in UPDATE and INSERT queries. # Overriding this is deprecated in favour of # __PACKAGE__->find_column('entered')->placeholder('IF(1, CURDATE(), ?)); @@ -930,6 +920,7 @@ sub set_sql { my ($class, $name, $sql, $db, @others) = @_; $db ||= 'Main'; + @others = 0; $class->SUPER::set_sql($name, $sql, $db, @others); $class->_generate_search_sql($name) if $sql =~ /select/i; return 1; @@ -1130,19 +1121,19 @@ sub sth_to_objects { my ($class, $sth, $args) = @_; $class->_croak("sth_to_objects needs a statement handle") unless $sth; + unless (UNIVERSAL::isa($sth => "DBI::st")) { my $meth = "sql_$sth"; $sth = $class->$meth(); } - my (%data, @rows); + eval { $sth->execute(@$args) unless $sth->{Active}; - $sth->bind_columns(\(@data{ @{ $sth->{NAME_lc} } })); - push @rows, {%data} while $sth->fetch; }; return $class->_croak("$class can't $sth->{Statement}: $@", err => $@) if $@; - return $class->_ids_to_objects(\@rows); + + return $class->_sth_to_iterator($sth); } *_sth_to_objects = \&sth_to_objects; @@ -1153,9 +1144,36 @@ return $class; } +sub _sth_to_iterator { + my($class, $sth) = @_; + + return unless defined wantarray; + return $class->_objects_from_sth($sth) if wantarray; + + my $iterator = $class->_my_iterator->new($class); + $iterator->sth($sth); + return $iterator; +} + +sub _data_from_sth { + my($class, $sth) = @_; + + my( @rows, %data ); + $sth->bind_columns(\(@data{ @{ $sth->{NAME_lc} } })); + push @rows, {%data} while $sth->fetch; + + return \@rows; +} + +sub _objects_from_sth { + my($class, $sth) = @_; + + return map $class->construct($_), @{$class->_data_from_sth($sth)}; +} + sub _ids_to_objects { my ($class, $data) = @_; - return $#$data + 1 unless defined wantarray; + return unless defined wantarray; return map $class->construct($_), @$data if wantarray; return $class->_my_iterator->new($class => $data); } === lib/Class/DBI/Iterator.pm ================================================================== --- lib/Class/DBI/Iterator.pm (revision 27388) +++ lib/Class/DBI/Iterator.pm (patch - level 1) @@ -30,6 +30,17 @@ can fetch them one at a time, potentially saving a considerable amount of processing time and memory. +=head1 EFFICIENCY + +The iterator has two modes of operation. One is to work off of a list of data handed into new(). This can be very wasteful if the entire data list is not iterated through, for example: C<< Class->search(...)->first >>. + +The other mode is to pass in no data and instead give it an active, executed statement handle. + + my $iterator = CDBI::Iterator->new($cdbi_class); + $iterator->sth($search_sth); + +In this mode, the iterator will only read from the statement handle on demand. This can reduce a lot of overhead for large queries where you are only iterating through a small amount. Note that in this mode the statement handle is kept active until all the data is fetched or the iterator is destroyed. + =head1 CAVEAT Note that there is no provision for the data changing (or even being @@ -50,10 +61,20 @@ _data => $data, _mapper => [@mapper], _place => 0, + _sth => undef, }, ref $me || $me; } +sub sth { + my $self = shift; + + if( @_ ) { + $self->{_sth} = shift; + } + return $self->{_sth}; +} + sub set_mapping_method { my ($self, @mapper) = @_; $self->{_mapper} = [@mapper]; @@ -61,9 +82,53 @@ } sub class { shift->{_class} } -sub data { @{ shift->{_data} } } sub mapper { @{ shift->{_mapper} } } +sub next_data { + my $self = shift; + my $place = $self->{_place}++; + + if( !defined $self->{_data}[$place] + and (my $sth = $self->{_sth}) ) + { + $self->_fill_in_data_from_sth($place); + return $self->{_data}[$place]; + } + + return $self->{_data}[$place]; +} + +sub _sth_next { + my $self = shift; + + my $sth = $self->{_sth}; + return unless $sth->{Active}; + + my $row = $sth->fetchrow_hashref; + return unless $row; + + push @{$self->{_data}}, $row; + return $row; +} + +sub _fill_in_data_from_sth { + my($self, $place) = @_; + + while( $#{$self->{_data}} < $place ) { + last unless defined $self->_sth_next; + } +} + +sub data { + my $self = shift; + + if( my $sth = $self->{_sth} ) { + $self->_fill_in_data_from_sth("Inf"); + } + + return @{$self->{_data} || []}; +} + sub count { my $self = shift; $self->{_count} ||= scalar $self->data; @@ -71,7 +136,7 @@ sub next { my $self = shift; - my $use = $self->{_data}->[ $self->{_place}++ ] or return; + my $use = $self->next_data or return; my @obj = ($self->class->construct($use)); foreach my $meth ($self->mapper) { @obj = map $_->$meth(), @obj; @@ -107,7 +172,10 @@ while (my $obj = $self->next) { $obj->delete; } + $self->{_data} = []; + $self->{_sth} = undef; + 1; } ==== BEGIN SVK PATCH BLOCK ==== Version: svk v2.0.0 (darwin) eJyNV92P41YVt1qxS0IREs9IXFqvmtAk48/YzuxEKdNZaVTtgmApQt2Ve23fOzbj+Ab7ZjLROOoE OhVQ8cALQkJFvLTwxht/Aa+o/0wF4o1zrxNPkp2ZrWY0k/ie8zu/8339KH+6P9LL4VArVV0rf/Le u4PBjzAP4we6Var9kkQJZ7lqlyk5I6lqlik7Ua0yw2MCpwWb5qH4wHF+Qrj4kISnhA+HOsC5FdyR hFjDStQAc5YVqifhfZ4TouqlO/LKkSl+fVX3yoLAiYT1c3KWFAnLgIbhmJ4DIiCvgz6bkMzPGePy yNKNkSG0tTJMWUF8AS8gbSFvqOCSVIiSnITAaQ5PuURaa0tB6wZBcDwJtkQlTXtlaBNRBw4VkLlm SJMUPARje5rRfZSk4x6XhF6KZVRY8hRPJunc5+ScRyTlWOKbRqlZlBqRZ1qObZo60Wlka2FfpwZ2 QifEkWqIPDxRlC/8f3/7W68oy28ov1I+u7f8pvL5h3//wHVH1mbEKqYVMvZ0jA3XM6hmBrZO+zYO LY861LRtg7ph7ad9Q8A8EbC9wxQXhQzpS1216kT1t8PWv0bae+cHx73JWJrcSplzU27NbcUtrVtp 2HdGvC/ioruBZmu64xCq2wEhEG/bjjwjcPtaSGigevoq5B//VfwOrvrKcgRRX/6xI/5Pl3/6rvK3 7y8/vfeJ8o/z5b+6Cll+qf5T+dnyP+bnyrK3/K92+dFj5aPPni//952ePmI8JnmBDpDWbJAznKKL ZqOhFjzuDsk5CaectEYqdErRRtMsJUWBqsOLt0OenJFFs5ETPs0zpIYiFt2hD8c+Z37CSQ6dmLeE /M4zYWQ8b1UqHYnYBgojf79Z462sRYQmGYnQDGfQsDme779okQW/gBAXPs3ZWJhvVYAJ3dISJpFa Uzi41h/Pa2bdYUZmK2JtUFKvD2rgDQbr0/3motkspgHyI8xxTeQOPyWdFhrlbAYnD4QakgZldIMk i/yQpdNxVrSetUbi+AKNLtbBf/L24yM/DRcIftpCbTIt4jXYhURboFkMJb7SoATmo7S6ov5MCl/z 3o3h3dRXIGM8qaMYwtDl+TTkLdVvd4BrHd6tkFQhXFwbTqJClMbKfmUWXdsVyrXhyixU/Z/LT1+9 v7x/ubwPH5UxnqMwZtBtiDNEk3OUcJRkCKMooZTkJOPNGZ53UMxmsGnyDioYiLxZIEY5yVBAOORR 6OLiFPTzgkPwiOgMdPxmhNLklCDcnIgd00ETnMMimqY4T+egSllOUAHTIbpx2Mmmdhwv8DzHsYnp RlbgaH3bxLbpmIFrUi+062Hnbk0nw94eMnvHq3KrxpTz8oHj3DlwXMHN1iLi9t0+dTw76Fu2FZpG 6Dou1nSLRk5fhVnkVRPnD+wvX3/j9+8oV8+Uy6t3lV+/evXjy+XXlOW938SXH7+mfPg77bfK1Qef PL70jx49Oj48Pnpy+PNm82lMUN11MS4QnzE0ZhER4UfgLZzA/u0h9MMMJAuRhxnLT+GUCgkM8YeE wCfZJDHOIpgHSQZiolfboPg0BrUQi0wiyO8c+r7ghE5TMQMgjQgqAKJSAUg0kM8YX/ECOB7nbHoC 2YV0InKOx5OUDBA6fPgQHVZlXBCch3Gr1+u1oZ1kjQyHvco9OUSlTyv+E1ARFZixyiZwhq/ACUfo BMamKFCgi+UM7aDVpI1QwYHNGNhKN1MC+LtT6xAKYTA43plXUZD4tw8tSd2vZlfzOANvgaag25HR qdFnSZoilkFd54Kp6FkpsEsLZGAwj+HzZvBzEk1DIvLFZLoYpCKWOACditsW+uWU5AkkHroL0jFn U4RzUlmsSCTZyToXAFSMMRDCYzbNOFh6wjh0eIxlc9c+3MwQDk/JhK9CDNuEJ4AEaEJa5gQk5FSE sAO/rTgkYu/ANGNzEomwNQ6GgACrqAPLcSH+5IQidUxQWYp/9TSrBycMTJJSSFcRJ5TLmZlQGPc+ aq82LBzDIBfoiw2xjX26KQGDFvoEyRUgRcVB9VDugMp6Br0tp+0tHMSjSYohRwfX8PLB4q23xOoV FL+33ri1hNwm71eaz5sN1BDV3JL44O7BNtM2ONhsbLgoRlnqJ9nuGpBw7f1GQ0Zz2+Uti1VUXozL thCqF4q4a4hI3JqIW5jv7149ti86tSrsTalar1V44MNci6EmXsSAQxHZaj9fbJNfdFbntW/y29qR GwO33ssCqLNK5/XdSa78FlLf2LWEHq5TX9UfjAq+e8VaJ2wdQBn4FZk7qqoqm1uCul3ut9TC68cZ fb29k+fdYIlWe/85pKGu8mZjR+IABPabO70FT2XrSqZ6Hd2cwDsg2uimqg/ERVi0EwjeuM7lyjQ9 SqjrwpXcciKbWo7Vx55HImo7ru1SU77KmXp5RGkSJmIo1ZNF3iEQx3Ch0EGsOxwapWoY1cvsU/lG Ohj8NIOCywucPuiX8D4MOjG8I8KLKnyZTpNIvLbspSzEaXU36IoXkNXrq2r2Sy90XVvzoi7xdKur WbrWDTQ97AaebWp93QoDSoZto7wV3in3zuBCw/JdfNf9qviq7X0lucGuIwPpxv8BiyJGRA== ==== END SVK PATCH BLOCK ====
Here's the next iteration of the iterator class. The last one had performance problems and only allowed one iterator. This one creates two iterators: PreFetch, which is the current iterator; and OnDemand which fetches on demand. PreFetch remains the default. The iterator interface has changed, but only undocumented portions. * new() now takes a hash * mapper() is now mappers() and set_mapping_method() is now set_mapping_methods() to reflect that they take multiple mappers. Class::DBI now gives an iterator a prepared but inactive statement handle, plus the bind params, to give the iterator maximum flexibility. Iterators can also accept a list of already fetched rows. Performance of the OnDemand iterator is now approaching PreFetch when fetching all rows, I'm working on closing the gap.

Message body is not shown because it is too large.

This should be the final version of the iterator patch. The performance of the OnDemand iterator has been improved to where its only 10-20% slower than the default PreFetch when fetching all objects. The OnDemand iterator caches and will turn itself into a PreFetch iterator in situations where its going to have to fetch all the rows anyway (such as count()). A new iterator was added, OnDemand::Opportunistic (yeah, not the best name). This starts as an OnDemand iterator and then after fetching 100 (default) rows it turns itself into a PreFetch iterator. This allows it to have the performance of OnDemand when you only fetch out a few rows from a large query but also the performance of PreFetch when you fetch out lots. It should match or beat PreFetch's performance in all cases. Its worth considering using this as the CDBI default iterator. Finally, taking advantage on changes to statement handle caching in Ima::DBI (see 24959) statement handle caching has been turned back on for all statements. Only the individual call to sql_* to get the handle passed to the iterator is not cached to avoid two iterators having the same handle.

Message body is not shown because it is too large.

On Wed Feb 21 21:11:04 2007, MSCHWERN wrote: Show quoted text
> Finally, taking advantage on changes to statement handle caching in > Ima::DBI (see 24959)
Make that 25073.
One last revision. This one makes two changes. * All three iterators are tested by 21-iterator.t * The iterators take care to finish their statement handles on DESTROY to avoid leaving an active statement handle around maybe cached somewhere. This probably isn't necessary as these statement handles are not cached by DBI any more, but I'm not taking the chance.

Message body is not shown because it is too large.