Bug #125344 for DBIx-Class: Document and unhide DBIx::Class::Storage::BlockRunner

Thu May 17 20:33:43 2018 BBYRD [...] cpan.org - Ticket created

Subject:

Document and unhide DBIx::Class::Storage::BlockRunner

This module is the cornerstone of retry support within DBIC. It would be useful to expose this to the public, to use in cases where one would want to retry more than once, or change the retry_handler to use different conditions. If I wrote a patch to document the module and unhide it from PAUSE, would you be receptive to merging it in? What about creating and publicizing attributes within Storage and Storage::DBI to use the same BlockRunner objects created in dbh_do / txn_do ?

Thu May 17 21:41:00 2018 ribasushi [...] leporine.io - Correspondence added

Subject:	Re: [rt.cpan.org #125344] Document and unhide DBIx::Class::Storage::BlockRunner
Date:	Fri, 18 May 2018 03:40:38 +0200
To:	bug-DBIx-Class [...] rt.cpan.org
From:	Peter Rabbitson <ribasushi [...] leporine.io>

On 05/18/2018 02:33 AM, Brendan Byrd via RT wrote: Show quoted text

> > This module is the cornerstone of retry support within DBIC. It would be useful to expose this to the public, to use in cases where one would want to retry more than once, or change the retry_handler to use different conditions.

Ideally - yes you are correct that it should get publicized. Show quoted text

> > If I wrote a patch to document the module and unhide it from PAUSE, would you be receptive to merging it in? >

Not at present. There are several grave errors in its design, mainly around the surface API interfacing with ::Storage and the handling of stack-independent txn frames ( txn_scope_guard ). This is the reason it remains hidden ( as the very first line of the module says ). With that said - I am definitely going to open this module at some point in the future, so having ready-to-go documentation parked in a branch definitely wouldn't hurt. I just want to clear up-front that the merging part will happen "when it is ready". Show quoted text

> What about creating and publicizing attributes within Storage and Storage::DBI to use the same BlockRunner objects created in dbh_do / txn_do ?

Could you elaborate on this point? I am not entirely sure what you meant, just a hunch. Pseudocode would be best! Thank you for filing the issue in any case - gives me more info on how to prioritize this.

Thu May 17 21:41:01 2018 The RT System itself - Status changed from 'new' to 'open'

Fri May 18 12:49:41 2018 BBYRD [...] cpan.org - Correspondence added

On Thu May 17 21:41:00 2018, RIBASUSHI wrote: Show quoted text

> Not at present. There are several grave errors in its design, mainly > around the surface API interfacing with ::Storage and the handling of > stack-independent txn frames ( txn_scope_guard ). This is the reason it > remains hidden ( as the very first line of the module says ).

Can you elaborate? The handling of transactions seems pretty reasonable to me: * If starting a transaction via wrap_txn, then handle rollbacks and never trust the connection on error. * If already in a transaction or block, run far far away because BlockRunner can't possibly know what transactionalized statements are before it that might be missed on disconnection rollback. (Also, the outer-layer BlockRunner with wrap_txn=1 will handle the retry logic, anyway.) Show quoted text

> With that said - I am definitely going to open this module at some point > in the future, so having ready-to-go documentation parked in a branch > definitely wouldn't hurt. I just want to clear up-front that the merging > part will happen "when it is ready".

Part of the reason for asking is because I'm finishing up a DBIx::Connector::Retry module for the non-DBIC side (pending review and paperwork), and already have a chunk of documentation that would fit here. Another more selfish reason is that I'm also adding retry protection into DBIx::BatchChunker and another new module, so it would be nice to reference a public interface for that, instead of secretly using BlockRunner for increased retries. Yes, I take full responsibility for the risks of involved in abusing private interfaces, but it seemed better than trying to roll our own that looks exactly like BlockRunner. Show quoted text

> > What about creating and publicizing attributes within Storage and > > Storage::DBI to use the same BlockRunner objects created in dbh_do / > > txn_do ?

> > Could you elaborate on this point? I am not entirely sure what you > meant, just a hunch. Pseudocode would be best!

I was going to go the route of a 'block_runner' attribute. But, perhaps a more incremental approach would just be 'block_runner_args': package DBIx::Class::Storage; __PACKAGE__->mk_group_accessors('simple' => 'block_runner_args'); # or maybe some other accessor type... __PACKAGE__->block_runner_args({ retry_handler => sub { $_[0]->failed_attempt_count == 1 and ! $_[0]->storage->connected }, }); sub txn_do { my $self = shift; DBIx::Class::Storage::BlockRunner->new( %{ $self->block_runner_args }, storage => $self, wrap_txn => 1, )->run(@_); } Then anybody could specify something like this: $storage->block_runner_args({ retry_handler => sub { ...whatever you want... }, max_attempts => 10, }); Show quoted text

> Thank you for filing the issue in any case - gives me more info on how > to prioritize this.

It's our itch to scratch, so I'd be happy to help, but I just wanted to make sure we could hash out the implementation details.

Fri May 18 13:41:04 2018 ribasushi [...] leporine.io - Correspondence added

Subject:	Re: [rt.cpan.org #125344] Document and unhide DBIx::Class::Storage::BlockRunner
Date:	Fri, 18 May 2018 19:40:39 +0200
To:	bug-DBIx-Class [...] rt.cpan.org
From:	Peter Rabbitson <ribasushi [...] leporine.io>

On 05/18/2018 06:49 PM, Brendan Byrd via RT wrote: Show quoted text

> Queue: DBIx-Class > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=125344 > > > On Thu May 17 21:41:00 2018, RIBASUSHI wrote:

>> Not at present. There are several grave errors in its design, mainly >> around the surface API interfacing with ::Storage and the handling of >> stack-independent txn frames ( txn_scope_guard ). This is the reason it >> remains hidden ( as the very first line of the module says ).

> > Can you elaborate? The handling of transactions seems pretty reasonable to me:

> > * If starting a transaction via wrap_txn, then handle rollbacks and

never trust the connection on error. Show quoted text

> * If already in a transaction or block, run far far away because BlockRunner can't possibly know what transactionalized statements are before it that might be missed on disconnection rollback. (Also, the outer-layer BlockRunner with wrap_txn=1 will handle the retry logic, anyway.)

Regarding the second portion I was referring to multiple corner cases of multiple nested txn_do blocks interspersed with multiple txn_scope_guard instantiations, plus the general ( not yet fully addressed ) problem of "trapped exception" within a txn . Fixes for both of these will likely require changes to the internals of BR, and this space is already critical enough that trying to do things incrementally is just too risky. In other words I am not amenable to publicize this sub-API until all currently known issues with the transaction handling have been addressed. Yes - in its current implementation it works flawlessly when things "go well", but this isn't enough. With that said - if you are willing to spend the considerable time necessary to "get it perfect" - I will try my best to find the time to gather whatever test cases I have across various notes, to give you a full starting point. Note that I am not trying to put you off from tackling this, I am just being honest with what is required. Show quoted text

>> With that said - I am definitely going to open this module at some point >> in the future, so having ready-to-go documentation parked in a branch >> definitely wouldn't hurt. I just want to clear up-front that the merging >> part will happen "when it is ready".

> > Part of the reason for asking is because I'm finishing up a DBIx::Connector::Retry module for the non-DBIC side (pending review and paperwork), and already have a chunk of documentation that would fit here. > > Another more selfish reason is that I'm also adding retry protection into DBIx::BatchChunker and another new module, so it would be nice to reference a public interface for that, instead of secretly using BlockRunner for increased retries. Yes, I take full responsibility for the risks of involved in abusing private interfaces, but it seemed better than trying to roll our own that looks exactly like BlockRunner. >

Sure thing - though I must disagree with "risk abusing private interfaces" part. You could simply start using BR today: as long as you are willing to write a test on your end and ideally have it somewhere on CPAN, so that I can catch regressions when doing revdeps testing. That is - if the sub-API works for you well *as-is*, then there is little risk for you using it, as any future changes will be carefully planned and announced. Once I do open and document it - changes that are not strictly additions are simply no longer possible. Given how central this interface is: I am being extra cautions and conservative. Show quoted text

>>> What about creating and publicizing attributes within Storage and >>> Storage::DBI to use the same BlockRunner objects created in dbh_do / >>> txn_do ?

>> >> Could you elaborate on this point? I am not entirely sure what you >> meant, just a hunch. Pseudocode would be best!

> > I was going to go the route of a 'block_runner' attribute. > > But, perhaps a more incremental approach would just be 'block_runner_args': > > > Then anybody could specify something like this: > > $storage->block_runner_args({ > retry_handler => sub { ...whatever you want... }, > max_attempts => 10, > }); >

This is just a gut feeling at this point, but both of the above approaches seem problematic in terms of "sharing state" perspective. I can not put my finger on it just yet, and it has been a while since I've been in this headspace. But I did try to do that when I designed BR originally and I had to back out of that design due to...<???> I would need to look through my reflogs to find out the exact details. What is likely to be more "correct" instead is something like: txn_do( \%optional_params, sub { ... }, @optional_sub_args ) However! This needs further design/planning from the other perspective of allowing for finalizers with proper composition ( "do X if and only if we managed to commit, and do Y i and only if we rolled back" ). Apologies for the handwavey response, this is all I have off the top of my head without digging into the codebase. Show quoted text

>> Thank you for filing the issue in any case - gives me more info on how >> to prioritize this.

> > It's our itch to scratch, so I'd be happy to help, but I just wanted to make sure we could hash out the implementation details. >

Thank you for doing that - hashing this out is like 95% of the problem. One last thing to address is that if you are on a tight timeline - you should *probably* go with just using BR as is. I am realistically another month away from attaining a large enough chunk of time to sit down and fix the ZR-reported performance regressions with what is currently in master and another couple months of ironing out random most-convoluted-context regressions. Again - I am not trying to dissuade you from participating in fixing this - it needs to be done and it will be done. I am simply managing walltime expectations up-front.

Fri May 18 19:01:45 2018 BBYRD [...] cpan.org - Correspondence added

On Fri May 18 13:41:04 2018, RIBASUSHI wrote: Show quoted text

> Regarding the second portion I was referring to multiple corner cases of > multiple nested txn_do blocks interspersed with multiple txn_scope_guard > instantiations

BR is already doing quite a bit in terms of transaction_depth checks, so I'm surprised there's even edge cases where this wouldn't work. Show quoted text

> plus the general ( not yet fully addressed ) problem of > "trapped exception" within a txn.

I figured that would bubble up to the outer-most BR layer, since the internal txn loops would immediately short-circuit to running the $cref, uneval'd and outside of the _run recursion. Show quoted text

> With that said - if you are willing to spend the considerable time > necessary to "get it perfect" - I will try my best to find the time to > gather whatever test cases I have across various notes, to give you a > full starting point.

While I can't commit to that level of effort, as the tuits aren't my own, I would be interesting in seeing these kind of test cases if you have them on hand. Maybe they aren't so bad to solve for. And if they exist in BR, they probably would exist in DBIx::Connector::Retry, since they share design concepts. Show quoted text

> Sure thing - though I must disagree with "risk abusing private > interfaces" part. You could simply start using BR today: as long as you > are willing to write a test on your end and ideally have it somewhere on > CPAN, so that I can catch regressions when doing revdeps testing. > > That is - if the sub-API works for you well *as-is*, then there is > little risk for you using it, as any future changes will be carefully > planned and announced.

Fair enough, and thanks for being amicable. Show quoted text

> Once I do open and document it - changes that are not strictly additions > are simply no longer possible. Given how central this interface is: I am > being extra cautions and conservative.

Understood, though it seems like you're kind of in that state already. Any sort of major changes to the interface could be disruptive, despite whatever flaws it may already come with. Even if you go into the direction of subtracting or changing things, it would probably come with compatibility layers to support the old interface. Show quoted text

> This is just a gut feeling at this point, but both of the above > approaches seem problematic in terms of "sharing state" perspective. I > can not put my finger on it just yet, and it has been a while since I've > been in this headspace. But I did try to do that when I designed BR > originally and I had to back out of that design due to...<???> I would > need to look through my reflogs to find out the exact details.

Possibly because of a permanent retry_handler coderef? Or are we talking about a full BR object in a block_runner attribute? Show quoted text

> What is likely to be more "correct" instead is something like: > > txn_do( \%optional_params, sub { ... }, @optional_sub_args )

That does leave the open the possibility of temporarily increasing retries or changing handlers just for one method call. Although, I would expect the need for a more permanent bit of settings, too. Show quoted text

> However! This needs further design/planning from the other perspective > of allowing for finalizers with proper composition ( "do X if and only > if we managed to commit, and do Y i and only if we rolled back" ).

Now that sounds like feature scope creep :) But, if %optional_params is a hashref, then it's infinitely expandable to whatever extra params are needed. Although, if BR manages to commit, which is really the default goal, they can just run the next operation post-txn_do. If BR rolls back, the connection is probably not reliable, but a finalizer still could be called in the rollback eval. Show quoted text

> One last thing to address is that if you are on a tight timeline - you > should *probably* go with just using BR as is. I am realistically > another month away from attaining a large enough chunk of time to sit > down and fix the ZR-reported performance regressions with what is > currently in master and another couple months of ironing out random > most-convoluted-context regressions. Again - I am not trying to dissuade > you from participating in fixing this - it needs to be done and it will > be done. I am simply managing walltime expectations up-front.

Okay, I think I'll go that route for now, but I'd still be open to improving BR in the future. I agree that the problem solving is probably 95% implementation discussion and 5% code.