On Thu Apr 19 08:52:55 2012, LAJANDY wrote:
> Working for a client developing an app using Helios 2.40 with Oracle RAC
> as the backing database (OS: RHEL5). Client's DB was experiencing
> serious connection problems. Noticed 2 processes on a host were
> launching worker processes. A 'ps afx|grep helios' revealed a child
> process of the parent server daemon was launching its own worker
> processes. After some investigation to confirm I was seeing what I
> thought I was seeing, I killed the processes and restarted the Helios
> daemon.
>
> Client eventually fixed their database problems and this issue has not
> recurred. I have not seen this happen in previous versions of Helios,
> nor have I (yet) seen this occur with MySQL.

Setting aside the possibility of a MySQL-vs-Oracle difference, this
problem is most likely caused by the additional exception catching that
Helios 2.40 does by default. In previous Helios versions, a corrupted
database connection would most likely go uncaught and simply cause the
worker process to die. Now that Helios::Service->work() attempts to
catch otherwise uncaught errors on behalf of the application, the
database connection error instead sent the worker's execution back into
the helios.pl main loop, a place worker processes are never supposed to
return to. Because a forked worker runs a copy of the daemon's code,
once it was back in that loop it behaved like the daemon and started
launching worker processes of its own, which matches what your
'ps afx' output showed. (The error may actually have been caught in the
underlying TheSchwartz layer rather than in the Helios layer.)

The latest 2.40_* development releases include a fix for this bug, as
well as an official Oracle schema. The database exception-handling code
now first checks whether the current process is a worker (it tests
getppid() > 1 and double-checks the new $WORKER_PROCESS flag) and, if
so, forces the process to exit.
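
To illustrate the pattern, here is a minimal Python sketch of that guard. It is not Helios's actual Perl code. The names is_worker(), handle_db_error(), work(), spawn_worker() and WORKER_PROCESS are hypothetical stand-ins, and requiring *both* checks to pass (a logical AND) is an assumption. The getppid() test on its own can't tell a worker from a daemon that was started from a shell rather than reparented to init (PID 1), which is presumably why the explicit flag is checked too.

```python
import os
import sys

# Hypothetical flag, analogous to Helios's $WORKER_PROCESS; it is set
# only in the child process after fork().
WORKER_PROCESS = False


def is_worker() -> bool:
    """Assumption: both the flag and the getppid() check must pass."""
    return WORKER_PROCESS and os.getppid() > 1


def handle_db_error(exc: Exception) -> None:
    """Hypothetical last-chance handler for a database error."""
    sys.stderr.write(f"database error: {exc}\n")
    if is_worker():
        # A worker must never unwind back into the daemon's main loop,
        # so terminate right here. os._exit() cannot be caught and skips
        # atexit handlers and stdio flushing inherited from the parent.
        os._exit(1)
    # In the daemon itself, let the caller decide what to do.
    raise exc


def work(job) -> None:
    """Hypothetical stand-in for Helios::Service->work()."""
    try:
        job()
    except ConnectionError as exc:
        handle_db_error(exc)


def spawn_worker(job) -> int:
    """Fork a worker to run job; return the child's PID to the parent."""
    global WORKER_PROCESS
    pid = os.fork()
    if pid == 0:
        WORKER_PROCESS = True
        try:
            work(job)
        except BaseException:
            os._exit(2)  # any other failure: still never return to the caller
        os._exit(0)  # normal completion
    return pid
```

The sketch uses os._exit() rather than sys.exit() on purpose. sys.exit() raises SystemExit, which a broad handler further up the inherited call stack could catch. That would put the worker back in the main loop, the same failure described above.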