
This queue is for tickets about the Helios CPAN distribution.

Report information
The Basics
Id: 76697
Status: resolved
Priority: 0/
Queue: Helios

People
Owner: LAJANDY [...] cpan.org
Requestors: LAJANDY [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 2.40
Fixed in: 2.40_1931



Subject: Worker process can turn into a daemon if database connection becomes unstable
Working for a client developing an app using Helios 2.40 with Oracle RAC as the backing database (OS: RHEL5). Client's DB was experiencing serious connection problems. Noticed 2 processes on a host were launching worker processes. A 'ps afx|grep helios' revealed a child process of the parent server daemon was launching its own worker processes. After some investigation to confirm I was seeing what I thought I was seeing, I killed the processes and restarted the Helios daemon.

Client eventually fixed their database problems and this issue has not recurred. I have not seen this happen in previous versions of Helios, nor have I (yet) seen this occur with MySQL.
On Thu Apr 19 08:52:55 2012, LAJANDY wrote:
> Working for a client developing an app using Helios 2.40 with Oracle RAC
> as the backing database (OS: RHEL5). Client's DB was experiencing
> serious connection problems. Noticed 2 processes on a host were
> launching worker processes. A 'ps afx|grep helios' revealed a child
> process of the parent server daemon was launching its own worker
> processes. After some investigation to confirm I was seeing what I
> thought I was seeing, I killed the processes and restarted the Helios
> daemon.
>
> Client eventually fixed their database problems and this issue has not
> recurred. I have not seen this happen in previous versions of Helios,
> nor have I (yet) seen this occur with MySQL.
Discounting the possibility of a MySQL vs. Oracle problem, this problem is most likely due to the extra exception catching done by default in 2.40. In previous Helios versions, a corrupted database connection would most likely go uncaught and cause the worker processes to die naturally. Now that Helios::Service->work() attempts to catch uncaught errors on behalf of the application, the database connection error caused worker process execution to jump back to the helios.pl main loop, a place to which worker processes are never supposed to return. (This may have happened in the underlying TheSchwartz layer rather than the Helios layer.)

The latest dev 2.40_* releases have a fix for this bug, as well as an official Oracle schema. The database exception catching code first checks whether the current process is a worker (it checks that getppid() > 1 and double-checks against the new $WORKER_PROCESS flag) and forces a process exit if it is.
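For anyone who wants to see the shape of that guard, here is a minimal sketch. It is not the actual Helios code: is_worker_process() and run_job_guarded() are hypothetical names, the $WORKER_PROCESS variable here is a local stand-in for the real flag, and requiring both checks to agree (rather than either one) is an assumption made for the example.

```perl
use strict;
use warnings;

# Stand-in for the $WORKER_PROCESS flag mentioned in this ticket (assumption:
# the real flag lives inside Helios and may be set and scoped differently).
our $WORKER_PROCESS = 0;

# Hypothetical helper: true when the process looks like a worker.
# A daemonized parent has been reparented to init (ppid == 1), while a
# forked worker still has the daemon as its parent (ppid > 1).
# Assumption: both signals must agree before the process counts as a worker.
sub is_worker_process {
    my ($flag, $ppid) = @_;
    return ($flag && $ppid > 1) ? 1 : 0;
}

# Hypothetical wrapper: run a job, catching any exception it throws.
# A worker that hits an error exits instead of returning to its caller,
# so it can never fall back into the daemon's main loop.
sub run_job_guarded {
    my ($job) = @_;
    my $ok = eval { $job->(); 1 };
    return 1 if $ok;

    my $err = $@ || "unknown error\n";
    if (is_worker_process($WORKER_PROCESS, getppid())) {
        warn "worker $$ exiting after error: $err";
        exit(1);    # force the worker to terminate here
    }
    return 0;       # not a worker: report the failure to the caller
}

# In the non-worker process the error is caught and reported, not fatal.
my $daemon_result = run_job_guarded(sub { die "database connection lost\n" });
```

In a forked worker, the code would set $WORKER_PROCESS = 1 right after fork() and before running any job, so the same error would end the worker process rather than hand control back to the main loop.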
Helios 2.41 (just released) resolves this bug.