Skip Menu |

This queue is for tickets about the DBD-Pg CPAN distribution.

Report information
The Basics
Id: 100648
Status: resolved
Priority: 0/
Queue: DBD-Pg

People
Owner: greg [...] turnstep.com
Requestors: TIMB [...] cpan.org
Cc: david [...] justatheory.com
AdminCc:

Bug Information
Severity: Critical
Broken in: (no value)
Fixed in: 3.5.0



CC: david [...] justatheory.com
Subject: Behaviour of DBD::Pg ping() doesn't match DBI spec. It doesn't check the "database server is still running" in some cases.
The DBD::Pg docs define the ping() method as "check the validity of a database handle" and it returns true in some cases *without communicating with the db server*. The DBI docs define the ping method as checking that "... the database server is still running and the connection to it is still working". I've been debugging a problem with code that's using DBIx::Connector in 'fixup' mode where DBIx::Connector didn't recover after a database crash. As fas a libpq was concerned the connection was within a transaction *so it returned true* without checking that "the database server is still running and the connection to it is still working". This meant a critical service went down and stayed down after the db restarted.
Essentially, ping should never return true without actually communicating with the postgres server. That also means correctly detecting when a pgbouncer is up but the pg instance behind it is down.
A potential workaround would be to add a ping() override method into DBIx::Connector::Driver::Pg
Looks like we issue a SELECT only when we are active but not when idle in a transaction. I think the correct answer is to issue our SELECT in both cases (PQTRANS_IDLE and PQTRANS_INTRANS). I've applied that in: 3e2979fe4428fa9603206455dbfb90917f1a80d3
For the record, from IRC: [11:14am] timbunce_: G_SabinoMullane: why even consider the value of pg_db_txn_status at all? The semantics of ping are unrelated to transactions. [11:14am] timbunce_: (but maybe I'm missing something as I'm not familiar with libpq) [11:42am] G_SabinoMullane: timbunce_: Well we know we can't do anything if it returns active (query already running) or in a failed transaction. [11:43am] G_SabinoMullane: But while we can't run a ping in those circumstances, we can return a little more information (well, pg_ping does: while ping is boolean, pg_ping provides more information via integer return code) [3:42pm] timbunce_: G_SabinoMullane: it's vital that ping returns false on failure of the server/connection regardless of the state of the txn/client. [3:42pm] timbunce_: It sounds like you're saying that in certain txn/client states the state of server/connection can't be determined. Is that right? If so, please give more details. [3:45pm] G_SabinoMullane: If we are, for example, in a failed transaction, we cannot send the 'SELECT 1' to the backend. [3:45pm] G_SabinoMullane: However.. [3:45pm] G_SabinoMullane: I suppose we could still send it and swallow the error, in the theory that the ping server-still-there response trumps everything else. [3:46pm] timbunce_: G_SabinoMullane: yes, that would be fine. [3:47pm] G_SabinoMullane: I suppose the same could be said for the other states too. We can always *send* the SELECT 1, but we cannot promise we will get back a valid answer. [3:47pm] G_SabinoMullane: But in this context, a valid answer could be an ERROR. As long as it came from the server, ping can reply true. [3:48pm] timbunce_: Yes, exactly. Thanks. [3:48pm] G_SabinoMullane: Okay. I will whip up some tests when I get some tuits. [3:50pm] timbunce_: Thanks. Hopefully there's some way to cartegories errors-from-server vs errors-from-libpq. [4:07pm] timbunce_: pgbouncer is another case that needs considering - it might be up but the pg backend be down.
In case it helps, you can use this code to simulate a lost network connection: my $fd = $dbh->{pg_socket} or die "fd"; open(DBH_PG_FH, "<&=".$fd) or die "open(DBH_PG_FH): $!"; close DBH_PG_FH or die "close(DBH_PG_FH): $!"; A test case for this bug would be to collect to the db, start a transaction, call the above code, and then call ping(). Currently ping() returns true in this situation. It needs to return false.
Applied some patches to fix this, test this, and properly document this. Latest git commit is 903ce34f026edc41b63a8b2dff9b55a25fa47f7f
Looks good. Thanks.
A much improved version here: 95d03c1eb2ba1332935a6a44dcfdf8ed5068c9f5 Much better testing, as well.
Thank you for pushing this along. Version 3.5.0 has been released which should address all of the problems with ping() and pg_ping()