Bug #97269 for LMDB_File: Database/Transaction dependency is unclear

Tue Jul 15 23:16:14 2014 https://launchpad.net/~hyc - Ticket created

Subject:

Database/Transaction dependency is unclear

It looks to me like the lifetime of a DB handle is bounded by the lifetime of a transaction. That means a typical app that just grabs a DB handle once is performing all of its operations inside a single transaction; alternatively, if an app explicitly commits a txn, it must obtain a new DB handle when it obtains a new txn. I may have misunderstood the code, but that is definitely not how it should work. DB handles are meant to be long-lived: once the txn in which a DB handle is created has committed, the DB handle is valid for the lifetime of that environment. (It may be explicitly closed sooner, using mdb_dbi_close, but generally there is no good reason to do so.) Currently, as I read it, you return [db,txn] pairs and the put/get ops will fail in _chkalive if the txn isn't currently valid. I see no way to associate a new txn with an existing DBI. Also, it appears that you support only a single txn at a time, while LMDB itself supports 1 write txn and arbitrarily many read txns concurrently. Is that right?

Wed Jul 16 17:52:51 2014 sog [...] msg.com.mx - Correspondence added

On Tue Jul 15 23:16:14 2014, https://launchpad.net/~hyc wrote:

> It looks to me like the lifetime of a DB handle is bounded by the lifetime of a transaction. Which means a typical app that just grabs a DB handle once is performing all of its operations inside a single transaction. Or alternatively, if an app explicitly commits a txn, it must obtain a new DB handle when it obtains a new txn. I may have misunderstood the code, but that is definitely not how it should work.

You understood the code well; maybe I misunderstood something. lmdb.h states that "All database operations require a transaction handle.", and the documentation of mdb_dbi_open says: "The database handle will be private to the current transaction until the transaction is successfully committed. If the transaction is aborted the handle will be closed automatically. After a successful commit the handle will reside in the shared environment, and may be used by other transactions." So I assumed that a DB handle without a transaction is useless, and that if it is needed inside another transaction it is safer (and cheaper) to open it again explicitly.
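The lmdb.h rule quoted above can be pictured with a small pure-Perl toy model. This is only a sketch under stated assumptions: the `ToyEnv` package and its methods are hypothetical, are not part of LMDB or LMDB_File, and the integers it hands out do not follow LMDB's numbering. A handle opened inside a transaction stays private to it; commit moves it into the shared environment, and abort discards it.

```perl
use strict;
use warnings;

# Hypothetical toy model of the mdb_dbi_open visibility rule; not LMDB code.
package ToyEnv;

sub new {
    my ($class) = @_;
    # shared: handles visible to every transaction; next_dbi: next free integer.
    return bless { shared => {}, next_dbi => 1 }, $class;
}

# A transaction only tracks the handles it opened itself.
sub begin_txn {
    my ($env) = @_;
    return { private => {} };
}

# Look up $name as seen from $txn; open a new private handle if it is unknown.
sub dbi_open {
    my ($env, $txn, $name) = @_;
    return $env->{shared}{$name}  if exists $env->{shared}{$name};
    return $txn->{private}{$name} if exists $txn->{private}{$name};
    return $txn->{private}{$name} = $env->{next_dbi}++;
}

# Commit: handles opened in the transaction move to the shared environment.
sub commit {
    my ($env, $txn) = @_;
    %{ $env->{shared} } = (%{ $env->{shared} }, %{ $txn->{private} });
    %{ $txn->{private} } = ();
    return 1;
}

# Abort: handles opened in the transaction are simply dropped (closed).
sub abort {
    my ($env, $txn) = @_;
    %{ $txn->{private} } = ();
    return 1;
}

package main;
```

In this model, once `$env->commit($txn)` has run, any later transaction that calls `dbi_open` for the same name gets the same integer back without opening anything new, which is the long-lived behavior hyc describes.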

> DB handles are meant to be long-lived - once the txn in which a DB handle is created has committed, the DB handle is valid for the lifetime of that environment. (It may be explicitly closed sooner, using mdb_dbi_close, but generally there is no good reason to do so.)

When a transaction terminates, the Perl-level "LMDB_File" object is invalidated, but the low-level DBI remains alive in the environment (unless it was closed automatically because the transaction was aborted).

> Currently as I read it, you return [db,txn] pairs and the put/get ops will fail in _chkalive if the txn isn't currently valid. I see no way to associate a new txn to an existing DBI.

At the Perl level I don't want to expose a "naked" DBI, so to associate it with a new transaction the user must "open" it again. That is a fast operation because, if the DBI is already open, mdb_dbi_open only (re)validates it.

> Also, it appears that you support only a single txn at a time, while LMDB itself supports 1 write txn and arbitrarily many read txns concurrently. Is that right?

Yes. Right now, per thread, the high-level Perl API supports only one transaction at a time. I don't feel the need for more than one read transaction. Can you show me a use case? Regards.

Wed Jul 16 17:52:51 2014 The RT System itself - Status changed from 'new' to 'open'

Wed Aug 13 13:56:47 2014 FRACTAL [...] cpan.org - Correspondence added

On Wed Jul 16 17:52:51 2014, SORTIZ wrote:

> On Tue Jul 15 23:16:14 2014, https://launchpad.net/~hyc wrote:

>> DB handles are meant to be long-lived - once the txn in which a DB handle is created has committed, the DB handle is valid for the lifetime of that environment. (It may be explicitly closed sooner, using mdb_dbi_close, but generally there is no good reason to do so.)

> When a transaction terminates the Perl level "LMDB_File" object is invalidated, but the low level DBI remains alive in the environment. (Unless implicitly closed if the transaction was aborted)

>> Currently as I read it, you return [db,txn] pairs and the put/get ops will fail in _chkalive if the txn isn't currently valid. I see no way to associate a new txn to an existing DBI.

> At the Perl level I don't want to expose a "naked" DBI, so to associate it with a new transaction, the user must "open" it again, a fast operation because, if already opened, mdb_dbi_open only (re)validates it.

I think Salvador's approach is reasonable. What is the advantage of re-using the DB in multiple transactions? Is constantly creating/destroying DBs for every transaction costly? (How about costly relative to a perl method call? :) The perl interface feels very convenient to me, not requiring me to persist DB handles that I might need in subsequent transactions.

>> Also, it appears that you support only a single txn at a time, while LMDB itself supports 1 write txn and arbitrarily many read txns concurrently. Is that right?

> Yes, right now and per thread, the high level Perl API supports only one transaction at a time. I don't feel the need for more than one read transaction. Can you show me a use case?

Salvador: There does seem to be some strangeness with having multiple transactions outstanding in the current env. If this isn't supported (as probably makes sense in perl since there isn't really any threading), could we maybe add some error detection and/or some notes in the docs?

Here is a simple program that trips an assertion when using two transactions simultaneously:

    use strict;
    use LMDB_File qw(:flags :cursor_op);

    mkdir("junkdb");
    my $env = LMDB::Env->new("junkdb", {
        mapsize => 100 * 1024 * 1024 * 1024,
    });

    my $txn = $env->BeginTxn;
    my $db = $txn->OpenDB({ dbname => "rofl", flags => MDB_CREATE });

    my $txn2 = $env->BeginTxn;
    my $db2 = $txn2->OpenDB({ dbname => "rofl", flags => MDB_CREATE });

    $db->put("hello", "world");
    $txn->commit;

    $db2->get("hello");

And running it:

    $ perl tp.pl
    perl: mdb.c:2916: mdb_txn_commit: Assertion `i == x' failed.
    Aborted (core dumped)

PS: Previously I tested the nested transaction support and that *does* seem to work as expected.

Thu Sep 11 10:37:02 2014 sog [...] msg.com.mx - Correspondence added

On Wed Aug 13 13:56:47 2014, FRACTAL wrote:

> ...
> Salvador: There does seem to be some strangeness with having multiple transactions outstanding in the current env. If this isn't supported (as probably makes sense in perl since there isn't really any threading) could we maybe add some error detection and/or some notes in the docs?
> ...

This particular issue, which should have had its own ticket, has been addressed. Version 0.06 will enforce the rule that "A parent transaction and its cursors may not issue any other operations than commit and abort while it has active child transactions.", avoiding the low-level failed assertion. For the main issue, I'll wait for Howard's feedback.
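The rule quoted above can be enforced in the Perl layer with a simple guard. The sketch below is hypothetical and is not LMDB_File's actual implementation: `ToyTxn` and its methods are made-up names, `put` stands in for any ordinary operation (get, put, cursor ops), and in this sketch committing or aborting a parent first ends its active child the same way.

```perl
use strict;
use warnings;

# Hypothetical guard for the nested-transaction rule; not LMDB_File's code.
package ToyTxn;
use Carp qw(croak);

sub new {
    my ($class, $parent) = @_;
    my $self = bless { parent => $parent, child => undef, done => 0 }, $class;
    if ($parent) {
        # Starting a child is itself refused while another child is active.
        $parent->_check('begin a child');
        $parent->{child} = $self;
    }
    return $self;
}

# Refuse ordinary operations on finished transactions or ones with an active child.
sub _check {
    my ($self, $op) = @_;
    croak "cannot $op: transaction already finished" if $self->{done};
    croak "cannot $op: transaction has an active child" if $self->{child};
    return 1;
}

# Stand-in for get/put/cursor operations.
sub put {
    my ($self) = @_;
    return $self->_check('put');
}

# Commit and abort stay allowed; an active child is ended the same way first.
sub _finish {
    my ($self, $how) = @_;
    croak "cannot $how: transaction already finished" if $self->{done};
    $self->{child}->$how if $self->{child};
    $self->{done} = 1;
    $self->{parent}{child} = undef if $self->{parent};
    return 1;
}

sub commit { return $_[0]->_finish('commit') }
sub abort  { return $_[0]->_finish('abort') }

package main;
```

Raising a Perl exception at this level turns the misuse into a catchable error instead of a failed C assertion that kills the process.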

Thu Sep 11 10:37:03 2014 sog [...] msg.com.mx - Status changed from 'open' to 'stalled'

Thu Sep 11 10:37:03 2014 sog [...] msg.com.mx - Taken

Thu Sep 11 10:59:23 2014 https://launchpad.net/~hyc - Correspondence added

On Wed Aug 13 13:56:47 2014, FRACTAL wrote:

> On Wed Jul 16 17:52:51 2014, SORTIZ wrote:

>> At the Perl level I don't want to expose a "naked" DBI, so to associate it with a new transaction, the user must "open" it again, a fast operation because, if already opened, mdb_dbi_open only (re)validates it.

> I think Salvador's approach is reasonable. What is the advantage of re-using the DB in multiple transactions? Is constantly creating/destroying DBs for every transaction costly? (How about costly relative to a perl method call? :)
>
> The perl interface feels very convenient to me, not requiring me to persist DB handles that I might need in subsequent transactions.

A DBI handle is simply an integer; I don't believe it's especially cumbersome to maintain it. The cost of opening it on every transaction is a linear search in LMDB's internal array of open handles, followed by a Btree search if it wasn't already open (the handle wasn't found). I suppose, compared to a perl method call, all of this is cheap. But it's still unnecessary when you could simply remember the integer and use it directly on subsequent operations. If you have many transactions to perform, or many DBs in use simultaneously, the costs add up.

Ultimately you can do whatever is most convenient for you. Just understand that the API was designed for mdb_dbi_open to be called very infrequently (once at application startup), which is why it doesn't do anything more efficient than a linear search. If you have hundreds of DBs open simultaneously, it may get ugly.
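The cost argument can be made concrete with a toy model of the handle table. This is a hypothetical pure-Perl sketch, not LMDB's code: `ToyDbiTable` is a made-up name, names are compared as plain strings, and the B-tree search is only counted, not performed.

```perl
use strict;
use warnings;

# Hypothetical model of the mdb_dbi_open lookup cost; not LMDB's code.
package ToyDbiTable;

sub new {
    my ($class) = @_;
    # names: open DB names indexed by handle; counters record the work done.
    return bless { names => [], compares => 0, tree_lookups => 0 }, $class;
}

# Open by name: linear scan of the open handles; if the name is not found,
# count a (simulated) B-tree search and append a new slot.
sub dbi_open {
    my ($self, $name) = @_;
    my $names = $self->{names};
    for my $dbi (0 .. $#$names) {
        $self->{compares}++;
        return $dbi if $names->[$dbi] eq $name;
    }
    $self->{tree_lookups}++;
    push @$names, $name;
    return $#$names;
}

package main;
```

In this model, opening `db1` to `db100` costs 4950 name comparisons, and every later reopen of `db100` costs 100 more; a caller that keeps the integer `99` returned by the first open skips that scan entirely.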

Thu Sep 11 10:59:24 2014 The RT System itself - Status changed from 'stalled' to 'open'

Fri Sep 12 01:16:03 2014 sog [...] msg.com.mx - Correspondence added

On Thu Sep 11 10:59:23 2014, https://launchpad.net/~hyc wrote:

> On Wed Aug 13 13:56:47 2014, FRACTAL wrote:

> > On Wed Jul 16 17:52:51 2014, SORTIZ wrote:

>>> At the Perl level I don't want to expose a "naked" DBI, so to associate it with a new transaction, the user must "open" it again, a fast operation because, if already opened, mdb_dbi_open only (re)validates it.

>> I think Salvador's approach is reasonable. What is the advantage of re-using the DB in multiple transactions? Is constantly creating/destroying DBs for every transaction costly? (How about costly relative to a perl method call? :)
>>
>> The perl interface feels very convenient to me, not requiring me to persist DB handles that I might need in subsequent transactions.

> A DBI handle is simply an integer, I don't believe it's especially cumbersome to maintain it.
>
> The cost of opening it on every transaction is a linear search in LMDB's internal array of open handles, followed by a Btree search if it wasn't already open (the handle wasn't found). I suppose, compared to a perl method call, all of this is cheap. But it's still unnecessary when you could simply remember the integer and use it directly on subsequent operations. If you have many transactions to perform, or many DBs in use simultaneously, the costs add up.
>
> Ultimately you can do whatever is most convenient for you. Just understand that the API was designed for mdb_dbi_open to be called very infrequently - once at application startup - which is why it doesn't do anything more efficient than a linear search. If you have hundreds of DBs open simultaneously, it may get ugly.

Yes, I understand. For the next release I have included a few new methods that expose bare DBI handles and allow them to be reused when the need arises.