
This queue is for tickets about the Helios CPAN distribution.

Report information
The Basics
Id: 79690
Status: resolved
Priority: 0/
Queue: Helios

People
Owner: LAJANDY [...] cpan.org
Requestors: LAJANDY [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in:
  • 2.41
  • 2.60
Fixed in:
  • 2.601_3610
  • 2.601_3670
  • 2.601_3750
  • 2.61
  • 2.71_3860
  • 2.71_4051
  • 2.71_4250
  • 2.71_4350
  • 2.71_4460
  • 2.71_4770
  • 2.72_0950
  • 2.80



Subject: "Can't use string ("") as an ARRAY ref while "strict refs" in use" error

On Mon Sep 17 09:13:21 2012, LAJANDY wrote:
Sometimes when a worker process picks up a job, it fails, and the following error is recorded in the ERROR table:

"Can't use string ("") as an ARRAY ref while "strict refs" in use at /usr/lib/perl5/site_perl/5.8.8/Helios/Job.pm line 128."

No success or failure messages are reported in job history. The job is picked up later by another process and completes successfully.
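
For readers unfamiliar with this message, the following minimal sketch reproduces the same class of error outside Helios. It assumes only that some value expected to be an array reference is in fact an empty string; the variable names are illustrative and are not taken from Helios::Job.

```perl
use strict;
use warnings;

# Illustrative stand-in for a job argument value that came back as an
# empty string where an array reference was expected.
my $arg = "";

# Dereferencing a plain string as an array is a symbolic reference,
# which "use strict" forbids, so the dereference dies at runtime.
# The eval traps the exception so the script can inspect it.
my @args  = eval { @{$arg} };
my $error = $@;

print "Caught: $error";
```

The trapped message has the same form as the one in the report, with this script's own file name and line number.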

On Fri Aug 09 17:13:20 2013, LAJANDY wrote:
A GitHub branch has been created for this bug: https://github.com/logicalhelion/helios/tree/bug/rt79690

On Sun Aug 11 17:48:19 2013, LAJANDY wrote:
A potential patch for this bug has been committed to GitHub: https://github.com/logicalhelion/helios/commit/25654bbf106be0d91b4447c6e246fc16fe0026f1

If it passes testing, it will be rolled into a forthcoming bugfix release.

It should be noted, however, that this does not actually fix the problem; it only handles the problem in a way that keeps non-retrying jobs from disappearing from the job queue. This bug is actually being caused by TheSchwartz for some reason: TheSchwartz is passing Helios::Service a TheSchwartz::Job object with an empty string for arg(), even though the job in question does indeed have job arguments. This causes Helios::Job->new() to die when it starts job argument processing, because it expects arg() to return an arrayref, NOT a string.

Changing Helios::Job to handle the empty string is the wrong idea. The job actually has arguments; Helios just didn't get them, so the copy of the job Helios was given is corrupted. Trying to run a job without its arguments would be worse than not running it at all.

This patch catches the error, logs a Critical error to the Helios log, and then exits the worker process. That way, TheSchwartz will not force a failure of the job (which it will do if a worker doesn't mark a job as successful or failed), and the job will stay in the job queue until its grabbed_until expires and another worker process picks it up.

Further investigation will hopefully reveal the root cause of this bug, but this patch at least ensures job integrity and system reliability.
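
The catch, log, and exit behaviour described for the patch can be sketched as follows. This is not the code from the commit: MockJob, inflate_job(), and process_job() are hypothetical names invented for illustration, and the logger and shutdown action are passed in as callbacks so the sketch runs without Helios or TheSchwartz installed.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for TheSchwartz::Job; only arg() and
    # jobid() are modelled.
    package MockJob;
    sub new   { my ($class, %p) = @_; return bless {%p}, $class }
    sub arg   { return $_[0]{arg} }
    sub jobid { return $_[0]{jobid} }
}

# Hypothetical stand-in for job inflation: it assumes arg() returns an
# array reference and dies under "strict refs" when it does not.
sub inflate_job {
    my ($job) = @_;
    my @args = @{ $job->arg };
    return { jobid => $job->jobid, args => \@args };
}

# Worker step: trap the inflation error, log it, and shut down without
# marking the job as succeeded or failed. A real worker would exit()
# where this sketch calls the injected $shutdown callback.
sub process_job {
    my ($job, $log, $shutdown) = @_;
    my $inflated = eval { inflate_job($job) };
    if (!defined $inflated) {
        $log->("CRITICAL: job " . $job->jobid . " could not be inflated: $@");
        $shutdown->();
        return;
    }
    return $inflated;
}

my @log;
my $exits    = 0;
my $logger   = sub { push @log, $_[0] };
my $shutdown = sub { $exits++ };

my $good = process_job(MockJob->new(jobid => 1, arg => ['<job/>']), $logger, $shutdown);
my $bad  = process_job(MockJob->new(jobid => 2, arg => ''),         $logger, $shutdown);

print "good job args: @{ $good->{args} }\n";
print "worker shutdowns: $exits\n";
print $_ for @log;
```

Because the corrupt job is never reported as completed or failed, nothing in this path removes it from the queue, which matches the intent described above.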

On Fri Sep 13 17:17:39 2013, LAJANDY wrote:
The main problem with this bug is that it can cause jobs submitted to Helios to be lost without being run. For Helios services that do not retry failed jobs (retries are configured with MaxRetries() and RetryInterval()), when this bug occurs the job effectively disappears from the job queue without being passed to the service's run() and without any job history being recorded. (BAD!)

For services that retry failed jobs, it just means one of the retries will be delayed for grab_for() seconds (default: 3600).

The patch included in the 2.601* series prevents the "lost job" problem by shutting down the worker process before the corrupt TheSchwartz::Job is inflated to a Helios::Job. The grab_for() delay will still happen, but there will be NO lost jobs.

An actual fix requires a better explanation:

Apparently there is no problem with Helios, TheSchwartz, or even Data::ObjectDriver. The problem appears to be with the DBD:: modules in question. At certain times some database queries appear to lose their LOB bindings, which causes LOB fields in the result set to be returned blank. Many of these LOB-handling bugs have been fixed in the past in DBD::mysql and DBD::Oracle, but the DBD::Oracle RT queue shows that several of them are still outstanding. Given that the client I worked with on this bug has an older DBD::Oracle that predates some of the LOB-handling fixes, and given how rarely the issue occurs (0.1-0.4% of jobs), we believe this bug is actually a result of LOB-handling bugs in the DBD modules in question.

We will try to implement a deeper fix in Helios 2.8 by checking a job object in the TheSchwartz layer before it is passed into the Helios layers. If a job object is received from the database with no args, it can be discarded and another one selected. But given that any jobs could be lost, even such a small number, we did not want to wait until Helios 2.8 is ready to implement *some* sort of fix.

So for now, if you are experiencing this bug, update to the latest Helios (2.601_3750 for now; 2.61 will be out soon) and update your DBD module to the latest release.
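
Before updating a DBD module, it can help to see which driver versions are currently installed. The short script below is one way to check. The module list is an assumption based on the drivers named in this ticket, and driver_version() is a helper defined here, not part of DBI or Helios.

```perl
use strict;
use warnings;

# Return the installed version of a module, or undef if it is missing
# or cannot be loaded (for example, when client libraries are absent).
sub driver_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # DBD::Oracle -> DBD/Oracle.pm
    return eval { require $file; $module->VERSION };
}

# Assumed list of modules of interest; adjust to the driver in use.
my %versions;
for my $module (qw(DBI DBD::mysql DBD::Oracle)) {
    $versions{$module} = driver_version($module);
    printf "%-12s %s\n", $module,
        defined $versions{$module} ? $versions{$module} : 'not installed or not loadable';
}
```

The reported versions can then be compared against the driver's change log to see whether the relevant LOB-handling fixes are included.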

On Wed Oct 16 20:08:02 2013, LAJANDY wrote:
The workaround for this bug included with Helios::TS in 2.71_4051 appears to fix this problem. It will be included in the forthcoming Helios 2.80 production release.

Helios::TS->_grab_a_job() checks a job object before it is "grabbed" (that is, before the job is locked in the queue for processing) and makes sure the arg() method returns a reference to a data structure. If arg() does not return a reference, _grab_a_job() skips that job and pulls the next one from the array of jobs retrieved from the job queue. This way, any LOB-binding problem resulting in a blank arg() value is avoided.
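
The skip logic described above can be illustrated with a small self-contained sketch. It is not the actual Helios::TS->_grab_a_job() implementation: MockJob and pick_intact_job() are hypothetical names, and the locking step is omitted.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for TheSchwartz::Job; only arg() and
    # jobid() are modelled.
    package MockJob;
    sub new   { my ($class, %p) = @_; return bless {%p}, $class }
    sub arg   { return $_[0]{arg} }
    sub jobid { return $_[0]{jobid} }
}

# Return the first candidate whose arg() is a reference, skipping any
# job whose arguments came back blank. Returns nothing if none qualify.
sub pick_intact_job {
    my (@candidates) = @_;
    for my $job (@candidates) {
        next unless ref $job->arg;   # blank arg(): leave the job for a later attempt
        return $job;                 # the real method would lock the job at this point
    }
    return;
}

my @queue = (
    MockJob->new(jobid => 10, arg => ''),          # simulates a lost LOB binding
    MockJob->new(jobid => 11, arg => ['<job/>']),
    MockJob->new(jobid => 12, arg => ['<job/>']),
);

my $picked = pick_intact_job(@queue);
print "picked job ", $picked->jobid, "\n";
```

A skipped job is never locked, so it remains available to be selected again on a later pass, when its arguments may be read correctly.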