Bug #47005 for DBIx-Class: txn_do should provide a way to disable retry

Tue Jun 16 10:10:18 2009 TIMB [...] cpan.org - Ticket created

Subject:

txn_do should provide a way to disable retry

mst: right mst: maybe we should allow a hashref of flags before the coderef

Thu Aug 20 02:22:46 2009 ribasushi [...] leporine.io - Correspondence added

Stalling until Tim finds some time to get this going.

Thu Aug 20 02:22:47 2009 The RT System itself - Status changed from 'new' to 'open'

Thu Aug 20 02:22:48 2009 ribasushi [...] leporine.io - Status changed from 'open' to 'stalled'

Fri Oct 09 04:06:07 2009 ribasushi [...] leporine.io - Correspondence added

On Tue Jun 16 10:10:18 2009, TIMB wrote: Show quoted text

> mst: right > mst: maybe we should allow a hashref of flags before the coderef

It looks like a good time to resurrect this issue - David Wheeler factored out most of this code into DBIx::Connector[1] (and soon DBIC will switch over to it). As D::C provids *_do() with the same semantics as DBIC, this RT applies to it fully. Thoughts? [1] http://www.justatheory.com/computers/programming/perl/modules/dbix-connector.html

Fri Oct 09 04:06:08 2009 The RT System itself - Status changed from 'stalled' to 'open'

Fri Oct 09 04:51:42 2009 Tim.Bunce [...] pobox.com - Correspondence added

CC:	TIMB [...] cpan.org, "David E. Wheeler" <david [...] kineticode.com>
Subject:	Re: [rt.cpan.org #47005] txn_do should provide a way to disable retry
Date:	Fri, 9 Oct 2009 09:51:15 +0100
To:	Peter Rabbitson via RT <bug-DBIx-Class [...] rt.cpan.org>
From:	Tim Bunce <Tim.Bunce [...] pobox.com>

On Fri, Oct 09, 2009 at 04:06:11AM -0400, Peter Rabbitson via RT wrote: Show quoted text

> <URL: https://rt.cpan.org/Ticket/Display.html?id=47005 > > > On Tue Jun 16 10:10:18 2009, TIMB wrote:

> > mst: right > > mst: maybe we should allow a hashref of flags before the coderef

> > It looks like a good time to resurrect this issue - David Wheeler > factored out most of this code into DBIx::Connector[1] (and soon DBIC > will switch over to it). As D::C provids *_do() with the same semantics > as DBIC, this RT applies to it fully. > > Thoughts?

I don't think provides *_do() with the same semantics as DBIC. It looks like D::C doesn't offer automatic retry functionality. (I'm happy about that.) I'd also be happy if David adds support for retries so long as it's not the default and the caller can provide the logic to decide if a retry should be performed. That way the possibility of retries is explicit. David, FYI this ticket relates to my observation that the automatic retry performed by DBIC's *_do() in some circumstances is dangerous. It's dangerous because, to be safe, it requires the code to be idempotent and that's a) easier said than done for non trivial cases, and b) likely to be forgotten as the code is maintained over time. Tim.

Fri Oct 09 11:17:53 2009 ribasushi [...] leporine.io - Correspondence added

RT-Send-CC:

david [...] kineticode.com

On Fri Oct 09 04:51:42 2009, Tim.Bunce@pobox.com wrote: Show quoted text

> > On Fri, Oct 09, 2009 at 04:06:11AM -0400, Peter Rabbitson via RT wrote:

> > <URL: https://rt.cpan.org/Ticket/Display.html?id=47005 > > > > > On Tue Jun 16 10:10:18 2009, TIMB wrote:

> > > mst: right > > > mst: maybe we should allow a hashref of flags before the coderef

> > > > It looks like a good time to resurrect this issue - David Wheeler > > factored out most of this code into DBIx::Connector[1] (and soon DBIC > > will switch over to it). As D::C provids *_do() with the same semantics > > as DBIC, this RT applies to it fully. > > > > Thoughts?

> > I don't think provides *_do() with the same semantics as DBIC. > It looks like D::C doesn't offer automatic retry functionality. > (I'm happy about that.) > > I'd also be happy if David adds support for retries so long as it's not > the default and the caller can provide the logic to decide if a retry > should be performed. That way the possibility of retries is explicit. > > David, FYI this ticket relates to my observation that the automatic > retry performed by DBIC's *_do() in some circumstances is dangerous. > > It's dangerous because, to be safe, it requires the code to be > idempotent and that's a) easier said than done for non trivial cases, > and b) likely to be forgotten as the code is maintained over time. > > Tim.

Well we can't both be right :) Grep Connector.pm[1] for the string: '# Not connected. Try again.' [1] http://cpansearch.perl.org/src/DWHEELER/DBIx-Connector-0.12/lib/DBIx/Connector.pm

Fri Oct 09 12:27:27 2009 Tim.Bunce [...] pobox.com - Correspondence added

CC:	TIMB [...] cpan.org
Subject:	Re: [rt.cpan.org #47005] txn_do should provide a way to disable retry
Date:	Fri, 9 Oct 2009 17:26:58 +0100
To:	Peter Rabbitson via RT <bug-DBIx-Class [...] rt.cpan.org>
From:	Tim Bunce <Tim.Bunce [...] pobox.com>

On Fri, Oct 09, 2009 at 11:17:54AM -0400, Peter Rabbitson via RT wrote: Show quoted text

> Well we can't both be right :) Grep Connector.pm[1] for the string: > '# Not connected. Try again.'

You're right. Don't know how I missed that. So yes, DBIx::Connector should be extended to give the user control over this behaviour. And I strongly believe automatic retries should be easy to enable, but should not be enabled by default. Tim.

Fri Oct 09 12:32:46 2009 dwheeler [...] cpan.org - Correspondence added

On Fri Oct 09 12:27:27 2009, Tim.Bunce@pobox.com wrote: Show quoted text

> You're right. Don't know how I missed that. So yes, DBIx::Connector > should be extended to give the user control over this behaviour. > > And I strongly believe automatic retries should be easy to enable, but > should not be enabled by default.

I've been thinking about this. I'm not sure how to proceed, really, because if the retry is disabled, there is no point to do(). Best, David

Fri Oct 09 15:50:09 2009 Tim.Bunce [...] pobox.com - Correspondence added

CC:	TIMB [...] cpan.org
Subject:	Re: [rt.cpan.org #47005] txn_do should provide a way to disable retry
Date:	Fri, 9 Oct 2009 20:44:41 +0100
To:	David Wheeler via RT <bug-DBIx-Class [...] rt.cpan.org>
From:	Tim Bunce <Tim.Bunce [...] pobox.com>

On Fri, Oct 09, 2009 at 12:32:47PM -0400, David Wheeler via RT wrote: Show quoted text

> <URL: https://rt.cpan.org/Ticket/Display.html?id=47005 > > > On Fri Oct 09 12:27:27 2009, Tim.Bunce@pobox.com wrote: >

> > You're right. Don't know how I missed that. So yes, DBIx::Connector > > should be extended to give the user control over this behaviour. > > > > And I strongly believe automatic retries should be easy to enable, but > > should not be enabled by default.

> > I've been thinking about this. I'm not sure how to proceed, really, because if the retry is > disabled, there is no point to do().

Some options that spring to mind: - rename do() to do_with_retry() and txn_do() to txn_do_with_retry() - add arguments to new() to control retrying. Random observations: - Note that do() and txn_do() retry but svp_do() doesn't, which is reasonable in practical terms but inconsistent in naming terms. - The docs say "In the event of a failure **due** to a broken database connection, C<do()> will re-connect to the database and execute the code reference a second time." (my emphasis) but should say "In the event the code throws an exception the code will be rexecuted if the database handle is no longer connected." A subtle but significant difference. - I'd be very reluctant to rely on $dbh->{Active} as an indicator of connected-ness. $dbh->ping is the only guaranteed way. Tim.

Fri Oct 09 15:53:01 2009 ribasushi [...] leporine.io - Correspondence added

RT-Send-CC:

TIMB [...] cpan.org, david [...] kineticode.com

On Fri Oct 09 12:32:46 2009, DWHEELER wrote: Show quoted text

> On Fri Oct 09 12:27:27 2009, Tim.Bunce@pobox.com wrote: >

> > You're right. Don't know how I missed that. So yes, DBIx::Connector > > should be extended to give the user control over this behaviour. > > > > And I strongly believe automatic retries should be easy to enable,

> but

> > should not be enabled by default.

> > I've been thinking about this. I'm not sure how to proceed, really, > because if the retry is > disabled, there is no point to do(). >

This is not entirely true. _seems_connected() is virtually free - you can always call it before do/txn_do. This way do() without retry makes sense in forked environments - i.e. you reconnect before even calling the coderef. In fact we did something similar in DBIC with _get_dbh (this is in fact the reason _seems_connected was factored out). For txn_do() the benefits are even broader, as you are most likely going to use it for the nesting benefits, thus a single execution cycle can in fact be much desired. I side with Tim here - if it is not too late, making the auto-retry non-default would be sound design. In DBIC we would still turn it on by default but that's for backcompat reasons, which you (yet) don't have.

Fri Oct 09 15:54:39 2009 ribasushi [...] leporine.io - Correspondence added

On Fri Oct 09 15:50:09 2009, Tim.Bunce@pobox.com wrote: Show quoted text

> <snip> > - I'd be very reluctant to rely on $dbh->{Active} as an indicator of > connected-ness. $dbh->ping is the only guaranteed way. >

This is a pre-ping stage. The current assumption is that if {Active} is 0, then ping() can not possible succeed and is not even attempted.

Fri Oct 09 16:12:06 2009 ribasushi [...] leporine.io - Correspondence added

On Fri Oct 09 15:54:39 2009, RIBASUSHI wrote: Show quoted text

> On Fri Oct 09 15:50:09 2009, Tim.Bunce@pobox.com wrote:

> > <snip> > > - I'd be very reluctant to rely on $dbh->{Active} as an indicator of > > connected-ness. $dbh->ping is the only guaranteed way.

^^^^^^^^^^^^^^^^^^^^^^ In fact if you look at the e.g. Oracle special handling of ping() you'll see that even this is not the case :)

Fri Oct 09 18:13:19 2009 Tim.Bunce [...] pobox.com - Correspondence added

CC:	TIMB [...] cpan.org
Subject:	Re: [rt.cpan.org #47005] txn_do should provide a way to disable retry
Date:	Fri, 9 Oct 2009 23:12:49 +0100
To:	Peter Rabbitson via RT <bug-DBIx-Class [...] rt.cpan.org>
From:	Tim Bunce <Tim.Bunce [...] pobox.com>

On Fri, Oct 09, 2009 at 04:12:09PM -0400, Peter Rabbitson via RT wrote: Show quoted text

> <URL: https://rt.cpan.org/Ticket/Display.html?id=47005 > > > On Fri Oct 09 15:54:39 2009, RIBASUSHI wrote:

> > On Fri Oct 09 15:50:09 2009, Tim.Bunce@pobox.com wrote:

> > > <snip> > > > - I'd be very reluctant to rely on $dbh->{Active} as an indicator of > > > connected-ness. $dbh->ping is the only guaranteed way.

> ^^^^^^^^^^^^^^^^^^^^^^ > In fact if you look at the e.g. Oracle special handling of ping() you'll > see that even this is not the case :)

Umm, not even a comment in the code to explain why ping isn't used. Tim.

Fri Oct 09 18:48:12 2009 dwheeler [...] cpan.org - Correspondence added

CC:	TIMB [...] cpan.org
Subject:	Re: [rt.cpan.org #47005] txn_do should provide a way to disable retry
Date:	Fri, 9 Oct 2009 15:47:47 -0700
To:	bug-DBIx-Class [...] rt.cpan.org
From:	"David E. Wheeler" <dwheeler [...] cpan.org>

On Oct 9, 2009, at 12:53 PM, Peter Rabbitson via RT wrote: Show quoted text

> This is not entirely true. _seems_connected() is virtually free - you > can always call it before do/txn_do. This way do() without retry makes > sense in forked environments - i.e. you reconnect before even calling > the coderef. In fact we did something similar in DBIC with _get_dbh > (this is in fact the reason _seems_connected was factored out). For > txn_do() the benefits are even broader, as you are most likely going > to > use it for the nesting benefits, thus a single execution cycle can in > fact be much desired.

It seems to me that if do() (and txn_do()) aren't going to retry, then they have to ping(), no? Show quoted text

> I side with Tim here - if it is not too late, making the auto-retry > non-default would be sound design. In DBIC we would still turn it on > by > default but that's for backcompat reasons, which you (yet) don't have.

I'm on the fence. David

Sat Oct 10 05:14:51 2009 ribasushi [...] leporine.io - Correspondence added

On Fri Oct 09 18:13:19 2009, Tim.Bunce@pobox.com wrote: Show quoted text

> On Fri, Oct 09, 2009 at 04:12:09PM -0400, Peter Rabbitson via RT wrote:

> > <URL: https://rt.cpan.org/Ticket/Display.html?id=47005 > > > > > On Fri Oct 09 15:54:39 2009, RIBASUSHI wrote:

> > > On Fri Oct 09 15:50:09 2009, Tim.Bunce@pobox.com wrote:

> > > > <snip> > > > > - I'd be very reluctant to rely on $dbh->{Active} as an indicator of > > > > connected-ness. $dbh->ping is the only guaranteed way.

> > ^^^^^^^^^^^^^^^^^^^^^^ > > In fact if you look at the e.g. Oracle special handling of ping() you'll > > see that even this is not the case :)

> > Umm, not even a comment in the code to explain why ping isn't used. >

You are right, and I can't find the pertaining ML discussion. Basically it came down to the fact that DBD::Oracle has some shutdown state in which it will return 1 on ping as long as the socket is still open. This however did not guarantee the server is any longer in a state to execute queries. So what happened was: 1) the weird state is reached 2) a txn_do takes place and fails on the first sql command 3) the code calls ping() and gets a connected reply 4) the txn_do is not retried 5) ... 6) users lose profit P.S. I never used Oracle, I just remember what the deal was. No idea if a bug was ever filed against the DBD or whatnot.

Sat Oct 10 06:08:32 2009 ribasushi [...] leporine.io - Correspondence added

On Fri Oct 09 18:48:12 2009, DWHEELER wrote: Show quoted text

> On Oct 9, 2009, at 12:53 PM, Peter Rabbitson via RT wrote: >

> > This is not entirely true. _seems_connected() is virtually free - you > > can always call it before do/txn_do. This way do() without retry makes > > sense in forked environments - i.e. you reconnect before even calling > > the coderef. In fact we did something similar in DBIC with _get_dbh > > (this is in fact the reason _seems_connected was factored out). For > > txn_do() the benefits are even broader, as you are most likely going > > to > > use it for the nesting benefits, thus a single execution cycle can in > > fact be much desired.

> > It seems to me that if do() (and txn_do()) aren't going to retry, then > they have to ping(), no? >

> > I side with Tim here - if it is not too late, making the auto-retry > > non-default would be sound design. In DBIC we would still turn it on > > by > > default but that's for backcompat reasons, which you (yet) don't have.

> > I'm on the fence. >

Ok,then more explanation is in order. Sorry I didn't bring this up earlier, leaky memory :( In fact I now realize that the API has a very serious deficiency, which I will try to highlight below. Also I am not trying to mock your module needlessly, I am just sharing all the thoughts that came up once I opened this can of worms. Still I think all of us can get out of this rather happier: What does D::C provide? - a service that allows the user not to care how did he get a specific $dbh. The two "atomic parts" that D::C uses are: *) Is said $dbh apparently usable (let's call this AU). It checks: - is there a $dbh in the first place - is it still in the same fork/thread - has it been closed by the *user* ({ACTIVE}) *) Is said $dbh *guaranteed* to be usable, at least for the SQL statement that immediately follows (let's call this GU). Check is either via ping() or with some equivalent kludge. Now, what potential users might expect: 1) You want to run some non-persistent code. You want to connect only if/when necessary, and want this to happen transparently, even if you fork somewhere along the way. You do not expect transactions nor any code-retries: This can be achieved without using a coderef, but with writing code directly around $conn. Nothing to see here. 2) You want to do the same as 1), but your code will run for a while, and there is a pretty good chance that the connection will timeout somewhere along the way. This is where you need a wrapper around the coderef - once an exception is thrown, it will run the GU code once, and retry only if the result is negative. However we now brokesome behavior of 1) - if the coderef never needs the $dbh, we just wasted cycles connecting. This would work if instead of $dbh, do() would pass $conn as its first attribute, or if you just used the coderef as a closure - e.g.: my $res = $conn->do_with_retry (sub { my $dbh = $conn->dbh; ... } ); This subtle difference allows the coderef to do whatever it wants, including never calling ->dbh(). Which is a substantial saving. Additionally every call to ->dbh() will run the AU code, and will re-connect if the last cached $dbh is deemed useless (i.e. a fork within the do_with_retry() coderef). Once an exception is caught a single GU check taks place and the code is retried if the results are negative. 3) You want to do the same as 1), but want to write things in a transaction. No retries expected. Here unlike with 1) we need a wrapper to issue the begin/commit statements. The concept is currently buggy in both DBIC and D::C. Suppose your coderef forks. At this point nothing protects the $dbh given to the user - he will just keep calling it, across processes. If we in turn switch to dbh() as a method call, we will have the benefit of die()ing right then an there, when it is detected that the process which started the transaction and the current one do in fact differ. 4) You want to txn_do_with_retry() - nothing special - again you run the AU code on every call to dbh(), an run the GU code as an indicator on whether to retry. Any time AU detects a process change a non-retriable exception is thrown, as the outermost transaction will no longer be atomic. 5) (bonus) I can't find the ML post, but there was once a user who wanted to do N retries per txn_do, not just 1. Granted it was a horrific hack, but a txn_do_with_retry argument would make this possible :) To sum it up: *) Passing a bare $dbh to the coderefs is a pessimization - the coderef may never need it. *) do() without retries is not very useful, but can be added for claity *) do_with_retry(), txn_do(), and txn_do_with_retry() each has its very legitimate uses. Sorry for the long write up :)

Mon Oct 12 15:04:27 2009 dwheeler [...] cpan.org - Correspondence added

Show quoted text

> You are right, and I can't find the pertaining ML discussion. Basically > it came down to the fact that DBD::Oracle has some shutdown state in > which it will return 1 on ping as long as the socket is still open.

I just added this information as a comment in DBIx::Connection::Driver::Oracle. —David

Mon Oct 12 15:14:02 2009 dwheeler [...] cpan.org - Correspondence added

2) Based on [this report](http://github.com/theory/dbix-connector/issues/#issue/4), I plan to change `dbh()` so that it does not do the `ping()` when called from within a `do*()` block. I think that this will address your concern. For shits and giggles, I'm also going to stick `$dbh` in `$_`, so that the blocks feel even more blockish. I'm going to work on these changes today. 3) The above-described change will help with this scenario too, as long as the user calls `$conn->dbh` for the situation you describe. But I still want to pass it in and assign it to `$_` for the simple cases, which I expect will be the most common. Does this make sense? Anyone who `fork`s within a transaction sub is asking for trouble, though. It's just insane. Show quoted text

> *) Passing a bare $dbh to the coderefs is a pessimization - the coderef > may never need it.

If the code ref never needs it, why would one call the blocks? They're specifically for accessing the database. What's the point otherwise? Show quoted text

> *) do() without retries is not very useful, but can be added for claity

Yes, the lack of utility was what I didn't like. Show quoted text

> *) do_with_retry(), txn_do(), and txn_do_with_retry() each has its very > legitimate uses.

Agreed. I'm going to make some changes now and think on it. More later. David

Mon Oct 12 15:15:03 2009 dwheeler [...] cpan.org - Correspondence added

Peter, can you re-assign this ticket to DBIx-Connector? I'm unable to do so. Thanks, David

Mon Oct 12 17:11:21 2009 dwheeler [...] cpan.org - Correspondence added

On Fri Oct 09 15:50:09 2009, Tim.Bunce@pobox.com wrote: Show quoted text

> - The docs say "In the event of a failure **due** to a broken database > connection, C<do()> will re-connect to the database and execute the > code > reference a second time." (my emphasis) but should say "In the event > the > code throws an exception the code will be rexecuted if the database > handle is no longer connected." A subtle but significant difference.

Thanks Tim, I changed it to your wording. —Theory

Mon Oct 12 18:18:34 2009 dwheeler [...] cpan.org - Correspondence added

Reading this more carefully: On Sat Oct 10 06:08:32 2009, RIBASUSHI wrote: Show quoted text

> 2) You want to do the same as 1), but your code will run for a while, > and there is a pretty good chance that the connection will timeout > somewhere along the way. This is where you need a wrapper around the > coderef - once an exception is thrown, it will run the GU code once, and > retry only if the result is negative. However we now brokesome behavior > of 1) - if the coderef never needs the $dbh, we just wasted cycles > connecting. This would work if instead of $dbh, do() would pass $conn as > its first attribute, or if you just used the coderef as a closure - e.g.: > > my $res = $conn->do_with_retry (sub { my $dbh = $conn->dbh; ... } ); > > This subtle difference allows the coderef to do whatever it wants, > including never calling ->dbh(). Which is a substantial saving.

I do not view this as the purpose of DBIx::Connector. If you have long-running code that doesn't touch the database, do *not* run it inside a `txn_do()` or `do()` block. Run it somewhere else. Now, you might have long-running code that *does* touch the datbase that you want to run inside a transaction block. Perhaps you want to have an entire HTTP request run inside a transaction. But really, you should only do this if the work you're doing is fast -- or if it's mostly database-intensive (and then only for updates). Otherwise, from a design POV, you should do your expensive computation before you enter the transaction. This will minimize the chance of deadlocks, as well. Show quoted text

> Additionally every call to ->dbh() will run the AU code, and will > re-connect if the last cached $dbh is deemed useless (i.e. a fork within > the do_with_retry() coderef). Once an exception is caught a single GU > check taks place and the code is retried if the results are negative.

With a commit I made today, if you only use `$conn->dbh` inside your blocks, you get that behavior. It's a choice. Show quoted text

> 3) You want to do the same as 1), but want to write things in a > transaction. No retries expected. Here unlike with 1) we need a wrapper > to issue the begin/commit statements. The concept is currently buggy in > both DBIC and D::C. Suppose your coderef forks. At this point nothing > protects the $dbh given to the user - he will just keep calling it, > across processes. If we in turn switch to dbh() as a method call, we > will have the benefit of die()ing right then an there, when it is > detected that the process which started the transaction and the current > one do in fact differ.

This is assuming no retry? Otherwise this too is dealt with in the commit I made today. Basically, if `dbh()` is called from within a `do()` or `txn_do()` block, it won't `ping()`, but will check for `fork`ing, threading, and the DBI `Active` flag. Show quoted text

> 4) You want to txn_do_with_retry() - nothing special - again you run the > AU code on every call to dbh(), an run the GU code as an indicator on > whether to retry. Any time AU detects a process change a non-retriable > exception is thrown, as the outermost transaction will no longer be atomic.

How is this different from 2)? Show quoted text

> 5) (bonus) I can't find the ML post, but there was once a user who > wanted to do N retries per txn_do, not just 1. Granted it was a horrific > hack, but a txn_do_with_retry argument would make this possible :)

Yeah, not really interested in that, at least not until we get all the other kinks worked out. Show quoted text

> To sum it up: > > *) Passing a bare $dbh to the coderefs is a pessimization - the coderef > may never need it.

I reject this assertion. If it doesn't need it, don't use `txn_do()`. If you think it may not need it becaus an exception will be thrown before it gets to using the database handle, well, in the vast majority of environments the database handle will be used sooner or later -- usually sooner. Maybe you don't need it on the current request, but you'll need it on the next request. So I don't view this as much of a loss. Show quoted text

> *) do() without retries is not very useful, but can be added for claity

Agreed. Show quoted text

> *) do_with_retry(), txn_do(), and txn_do_with_retry() each has its very > legitimate uses.

Yes, I can see that. So in addition to the changes I've already made today, I'm thinking of renaming `do()` and `txn_do()` to `do_with_retry()` and `txn_with_retry()` (or something better, if I can think of it), and then add `do()` and `txn_do()` with no retries. This should address the primary issue, yes? Best, David

Mon Oct 12 18:18:36 2009 dwheeler [...] cpan.org - Correspondence added