Bug #69714 for DBIx-Class: ResultSet update_all updated rows not reflected in ResultSet, can "lose" rows

Fri Jul 22 23:31:47 2011 zeta [...] tcscs.com - Ticket created

Subject:

ResultSet update_all updated rows not reflected in ResultSet, can "lose" rows

ResultSet update_all() fetches each row object and updates it. However, the row in the ResultSet is not updated. If one of the updated fields happens to be used in the where clause to create the ResultSet then attempt to read from result set either returns empty data (if no more matching rows) or different rows. In short, calling update_all causes you to lose the rows in the result set. The attached file shows a test program that illustrates this, including the SQL executed and the output generated. The first test uses update_all and loses all the rows. The second test does the same thing manually and shows the rows getting updated, but the result set loses them after the update. The update() documentation kind of implies the contents of the result set are not updated. It also "loses" the rows if one of the fields in where is changed, however it implies using update_all will update the rows objects. Not sure if it's a bug in code or bug (or ambiguity) in the docs.

Subject:

dbix-bug.pl

#!/usr/bin/perl use strict; use warnings; use DBIx::Class; use DBIx::Class::Schema::Loader; use Bugtest::Schema; $|++; DBIx::Class::Schema::Loader->naming('v7'); my $schema = Bugtest::Schema->connect( 'dbi:mysql:bugtest', 'bugtest', 'passw0rd123$', { RaiseError => 1, PrintError => 1, }, ); print "Test #1\n"; $resultset = $schema->resultset('BugTest')->search( { 'field2' => 'test', 'field6' => undef, 'field5' => 2 }, { 'for' => 'update', 'order_by' => { '-asc' => 'bug_test_id' } } ); $update = $resultset->update_all({ field5 => 3 }); # 506 Query SELECT me.bug_test_id, me.field2, me.field3, me.field4, me.field5, me.field6 FROM bug_test me WHERE ( ( field2 = 'test' AND field6 IS NULL AND field5 = '2' ) ) ORDER BY bug_test_id ASC FOR UPDATE # 506 Query UPDATE bug_test SET field5 = '3' WHERE ( bug_test_id = '1' ) # 506 Query UPDATE bug_test SET field5 = '3' WHERE ( bug_test_id = '2' ) # 506 Query UPDATE bug_test SET field5 = '3' WHERE ( bug_test_id = '3' ) # doesn't make a difference if this is commented or not #$resultset->reset(); while (my $row = $resultset->next) { printf("Id: %s; Field5: %s\n", $row->bug_test_id, $row->field5); } # 506 Query SELECT me.bug_test_id, me.field2, me.field3, me.field4, me.field5, me.field6 FROM bug_test me WHERE ( ( field2 = 'test' AND field6 IS NULL AND field5 = '2' ) ) ORDER BY bug_test_id ASC FOR UPDATE print "reset data\n"; sleep 15; # allow me time to reset the data print "Test #2\n"; $resultset = $schema->resultset('BugTest')->search( { 'field2' => 'test', 'field6' => undef, 'field5' => 2 }, { 'for' => 'update', 'order_by' => { '-asc' => 'bug_test_id' } } ); while (my $row = $resultset->next) { printf("Id: %s; Field5: %s\n", $row->bug_test_id, $row->field5); my $ok = $row->update({ field5 => 3 }); printf("Id: %s; Field5: %s; OK: %s\n", $row->bug_test_id, $row->field5, $ok); print "Refreshing\n"; $row->discard_changes(); printf("Id: %s; Field5: %s\n", $row->bug_test_id, $row->field5); } #110722 19:41:16 508 Query SELECT me.bug_test_id, me.field2, me.field3, me.field4, me.field5, me.field6 FROM bug_test me WHERE ( ( field2 = 'test' AND field6 IS NULL AND field5 = '2' ) ) ORDER BY bug_test_id ASC FOR UPDATE # 508 Query UPDATE bug_test SET field5 = '3' WHERE ( bug_test_id = '1' ) # 508 Query SELECT me.bug_test_id, me.field2, me.field3, me.field4, me.field5, me.field6 FROM bug_test me WHERE ( me.bug_test_id = '1' ) # 508 Query UPDATE bug_test SET field5 = '3' WHERE ( bug_test_id = '2' ) # 508 Query SELECT me.bug_test_id, me.field2, me.field3, me.field4, me.field5, me.field6 FROM bug_test me WHERE ( me.bug_test_id = '2' ) # 508 Query UPDATE bug_test SET field5 = '3' WHERE ( bug_test_id = '3' ) # 508 Query SELECT me.bug_test_id, me.field2, me.field3, me.field4, me.field5, me.field6 FROM bug_test me WHERE ( me.bug_test_id = '3' ) $resultset->reset(); while (my $row = $resultset->next) { printf("Id: %s; Field5: %s\n", $row->bug_test_id, $row->field5); } # 508 Query SELECT me.bug_test_id, me.field2, me.field3, me.field4, me.field5, me.field6 FROM bug_test me WHERE ( ( field2 = 'test' AND field6 IS NULL AND field5 = '2' ) ) ORDER BY bug_test_id ASC FOR UPDATE #### output #Test #1 #reset data #Test #2 #Id: 1; Field5: 2 #Id: 1; Field5: 3; OK: Bugtest::Schema::Result::BugTest=HASH(0x10207e0d0) #Refreshing #Id: 1; Field5: 3 #Id: 2; Field5: 2 #Id: 2; Field5: 3; OK: Bugtest::Schema::Result::BugTest=HASH(0x10207f2c0) #Refreshing #Id: 2; Field5: 3 #Id: 3; Field5: 2 #Id: 3; Field5: 3; OK: Bugtest::Schema::Result::BugTest=HASH(0x10207b178) #Refreshing #Id: 3; Field5: 3

Mon Jul 25 03:16:53 2011 ribasushi [...] leporine.io - Correspondence added

On Fri Jul 22 23:31:47 2011, ZETA wrote: Show quoted text

> ResultSet update_all() fetches each row object and updates it. > However, the row in the > ResultSet is not updated. If one of the updated fields happens to be > used in the where clause > to create the ResultSet then attempt to read from result set either > returns empty data (if no > more matching rows) or different rows. In short, calling update_all > causes you to lose the rows > in the result set.

There are a couple misconceptions in the statement above, however I fail to see what in the docs confused you. A resultset does not contain neither rows nor objects. A resultset is a "query plan" representation, the stuff between SELECT and the end of the statement. When "executed" (via ->next/->all) a cursor object is attached to the resultset object which (cursor) is a thin wrapper around DBI's $sth. ->reset merely serves to destroy the cursor and re-query on the next access again. At no point does the resultset itself collect objects (unless you use cache => 1, but that's not relevant here). In addition a central design decision is that resultsets are immutable. As such changing the query plan by a mere ->update() statement, would violate all kinds of API contracts. So to sum it up - an update/update_all operation is carried out on rows matching the condition set forth by a specific resultset. A resultset's pre-existing condition is never altered. It can only be added to/constrained further, by creating a new resultset object via ->search. I will keep this ticked open for some days in case you have further questions, but all in all everything in your test behaves as intended. Cheers Show quoted text

> The attached file shows a test program that illustrates this, > including the SQL executed and > the output generated. The first test uses update_all and loses all the > rows. The second test > does the same thing manually and shows the rows getting updated, but > the result set loses > them after the update. > > The update() documentation kind of implies the contents of the result > set are not updated. It > also "loses" the rows if one of the fields in where is changed, > however it implies using > update_all will update the rows objects. > > Not sure if it's a bug in code or bug (or ambiguity) in the docs.

Mon Jul 25 03:16:56 2011 The RT System itself - Status changed from 'new' to 'open'

Mon Jul 25 12:43:58 2011 zeta [...] tcscs.com - Correspondence added

On Mon Jul 25 03:16:53 2011, RIBASUSHI wrote: Show quoted text

> There are a couple misconceptions in the statement above, however I fail > to see what in the docs confused you.

I read this, under Update: Note that this will not run any accessor/set_column/update triggers, nor will it update any row object instances derived from this resultset (this includes the contents of the resultset cache if any). See "update_all" if you need to execute any on-update triggers or cascades defined either by you or a result component. and interpreted the "See 'update_all'" to apply to the previous sentence's "nor will it update any row object" as means to achieve what I was hoping to do. Show quoted text

> > A resultset does not contain neither rows nor objects. A resultset is a > "query plan" representation, the stuff between SELECT and the end of the > statement. When "executed" (via ->next/->all) a cursor object is > attached to the resultset object which (cursor) is a thin wrapper around > DBI's $sth. ->reset merely serves to destroy the cursor and re-query on > the next access again.

I can see I misinterpreted what a result set was as well. I was thinking a ResultSet was more stable, and that once I had a result set I could work on it and trust that it would not change. As it stands, it's only stable for one iteration. As soon as you walk it, if you try to use reset to walk it again you are getting a new result set that may or may not contain the same rows as the previous iteration. I can certainly understand how reset would reapply the criteria and create a new result set, resetting the result set with new data. However, is there an alternate way to restart at the beginning? So that you could walk a result set more than once without it rerunning the query and creating a new result set, at least short of storing the rows from the result set in memory? Show quoted text

> At no point does the resultset itself collect > objects (unless you use cache => 1, but that's not relevant here). > In addition a central design decision is that resultsets are immutable. > As such changing the query plan by a mere ->update() statement, would > violate all kinds of API contracts. > > So to sum it up - an update/update_all operation is carried out on rows > matching the condition set forth by a specific resultset. A resultset's > pre-existing condition is never altered. It can only be added > to/constrained further, by creating a new resultset object via ->search. > > I will keep this ticked open for some days in case you have further > questions, but all in all everything in your test behaves as intended. >

Thanks for the detailed answers. Might as well close this since it's a not an issue. Thanks, Greg

Tue Jul 26 02:50:16 2011 ribasushi [...] leporine.io - Correspondence added

On Mon Jul 25 12:43:58 2011, ZETA wrote: Show quoted text

> On Mon Jul 25 03:16:53 2011, RIBASUSHI wrote:

> > There are a couple misconceptions in the statement above, however I

> fail

> > to see what in the docs confused you.

Ok, there are still some misconceptions in your answer, therefore I will explain more. However please please please consider contributing clearer documentation on the subject matter. People who just "clicked" about a specific set of issues are the best ones to write documentation, since for the authors everything that I write below is "bloody obvious" and not worthy of passing mention. It is easy too - all you need to do is go to github, clone our repo, and click 'edit' on any file you want to change, entering your doc-changes DIRECTLY IN THE BROWSER. No fussing with git or checkouts or pull requests - couple clicks and you write some POD. So thanks in advance for that and read below for the rest of the answer :) Show quoted text

> I read this, under Update: > > Note that this will not run any accessor/set_column/update triggers, > nor will it update any > row object instances derived from this resultset (this includes the > contents of the resultset > cache if any). See "update_all" if you need to execute any on-update > triggers or cascades > defined either by you or a result component.

Right. Consider you using some sort of "on update trigger" role, e.g. DBIx::Class::TimeStamp. If you do a $rs->update({ foo => 'bar' }), then the set_on_update triggers will never fire. Only update_all (which involves fetching the row, inflating an object and then calling ->update on the *row object*) will invoke the role hooks as you expect. This is due to the fact that the role is only capable of overriding &{$result_class}::update(), it has no way of also hooking &{$resultset_class}::update(). This is what the blurb above is about. Show quoted text

> and interpreted the "See 'update_all'" to apply to the previous > sentence's "nor will it update > any row object" as means to achieve what I was hoping to do. >

> > > > A resultset does not contain neither rows nor objects. A resultset

> is a

> > "query plan" representation, the stuff between SELECT and the end of

> the

> > statement. When "executed" (via ->next/->all) a cursor object is > > attached to the resultset object which (cursor) is a thin wrapper

> around

> > DBI's $sth. ->reset merely serves to destroy the cursor and re-query

> on

> > the next access again.

> > I can see I misinterpreted what a result set was as well. I was > thinking a ResultSet was more > stable, and that once I had a result set I could work on it and trust > that it would not change. > As it stands, it's only stable for one iteration. As soon as you walk > it, if you try to use reset to > walk it again you are getting a new result set that may or may not > contain the same rows as > the previous iteration.

Well, yes and no. If you re-query the resultset in a transaction (with the proper transaction isolation set, i.e. REPEATABLE READ) you will get the same set of rows every time until you are out of the txn. Of course this *does* involve querying the storage every time. Show quoted text

> I can certainly understand how reset would reapply the criteria and > create a new result set, > resetting the result set with new data. However, is there an alternate > way to restart at the > beginning? So that you could walk a result set more than once without > it rerunning the query > and creating a new result set, at least short of storing the rows from > the result set in > memory?

Yes, there are 2 caching levels available to you. One is http://search.cpan.org/dist/DBIx-Class/lib/DBIx/Class/ResultSet.pm#cache which simply stores final inflated row objects in the "cache" slot of a resultset instance ($rs). Then ->reset/->next and ->all will dtrt and will retrieve the same objects you just saw the loop before. Incidentally this is also the mechanism used by prefetch - dbic creates related resultsets with the related row-objects alredy pre-stuffed in the cache's of these resultsets. So when you do $artist->cds->all all you do is reach down the cache of $artist->related_resultset('cds'). The second type of caching is lower down on the cursor level via DBIx::Class::Cursor::Cached. Unlike the resultset caching mechanism, it does not offer a ready made "storage slot" - you need to provide it a storage on your own (memcached being the most popular). What it does is derive a hash from the query parameters (sqlt + bind values) and then caches the result coming from the database under this query-parameter-key. Once you hit ->rest/->next or ->all, dbic does a requery as usual, but the cursor derives its raw data from the cache, and passes it down to dbic for the same round of result parsing/row object inflation. The chief difference is that with the resultset cache you get the same physical objects back every time (same blessed hash ref, same address), while with the cached cursor you only cache the data, but get new objects every time you iterate over the resultset (this is closer to general dbic semantics). Show quoted text

> > At no point does the resultset itself collect > > objects (unless you use cache => 1, but that's not relevant here). > > In addition a central design decision is that resultsets are

> immutable.

> > As such changing the query plan by a mere ->update() statement,

> would

> > violate all kinds of API contracts. > > > > So to sum it up - an update/update_all operation is carried out on

> rows

> > matching the condition set forth by a specific resultset. A

> resultset's

> > pre-existing condition is never altered. It can only be added > > to/constrained further, by creating a new resultset object via

> ->search.

> > > > I will keep this ticked open for some days in case you have further > > questions, but all in all everything in your test behaves as

> intended.

> >

> > Thanks for the detailed answers. Might as well close this since it's a > not an issue.

Well, clearly we do have a documentation issue, hence I will keep the ticket open a while longer, hoping you will answer the plea in the beginning :) Cheers

Wed Aug 31 23:03:41 2011 ribasushi [...] leporine.io - Correspondence added

Stalling in anticipation of documentation patches.

Wed Aug 31 23:03:42 2011 ribasushi [...] leporine.io - Status changed from 'open' to 'stalled'