Bug #76697 for Helios: Worker process can turn into a daemon if database connection becomes unstable

Thu Apr 19 08:52:55 2012 LAJANDY [...] cpan.org - Ticket created

Subject:

Worker process can turn into a daemon if database connection becomes unstable

I am working for a client developing an app with Helios 2.40, using Oracle RAC as the backing database (OS: RHEL5). The client's DB was experiencing serious connection problems. I noticed that two processes on one host were launching worker processes. A 'ps afx|grep helios' revealed that a child process of the parent server daemon was launching its own worker processes. After some investigation to confirm I was seeing what I thought I was seeing, I killed the processes and restarted the Helios daemon.

The client eventually fixed their database problems, and this issue has not recurred. I have not seen this happen in previous versions of Helios, nor have I (yet) seen it occur with MySQL.

Thu May 10 08:44:42 2012 LAJANDY [...] cpan.org - Taken

Thu May 10 09:02:40 2012 LAJANDY [...] cpan.org - Correspondence added

On Thu Apr 19 08:52:55 2012, LAJANDY wrote:

> Working for a client developing an app using Helios 2.40 with Oracle RAC
> as the backing database (OS: RHEL5). Client's DB was experiencing
> serious connection problems. Noticed 2 processes on a host were
> launching worker processes. A 'ps afx|grep helios' revealed a child
> process of the parent server daemon was launching its own worker
> processes. After some investigation to confirm I was seeing what I
> thought I was seeing, I killed the processes and restarted the Helios
> daemon.
>
> Client eventually fixed their database problems and this issue has not
> recurred. I have not seen this happen in previous versions of Helios,
> nor have I (yet) seen this occur with MySQL.

Setting aside the possibility of a MySQL vs. Oracle problem, this bug is most likely caused by the extra exception catching that 2.40 does by default. In previous Helios versions, a corrupted database connection would most likely go uncaught and cause the worker processes to die naturally. Now that Helios::Service->work() attempts to catch uncaught errors on behalf of the application, the database connection error caused worker process execution to jump back to the helios.pl main loop, a place to which worker processes are never supposed to return. Once there, the worker behaves like the daemon and starts launching worker processes of its own, which matches what you saw in the ps output. (This may have happened in the underlying TheSchwartz layer rather than in the Helios layer.)

The latest dev 2.40_* releases include a fix for this bug, as well as an official Oracle schema. The database exception-catching code first checks whether the current process is a worker (it checks that getppid() > 1 and double-checks the new $WORKER_PROCESS flag) and forces the process to exit if it is.

Thu May 10 09:02:41 2012 LAJANDY [...] cpan.org - Status changed from 'new' to 'open'

Thu May 10 09:04:32 2012 LAJANDY [...] cpan.org - Fixed in 2.40_1931 added

Mon Jun 04 10:47:23 2012 LAJANDY [...] cpan.org - Correspondence added

Helios 2.41 (just released) resolves this bug.

Mon Jun 04 10:47:23 2012 LAJANDY [...] cpan.org - Status changed from 'open' to 'resolved'