On 2016-04-18 06:57:28, PIOTO wrote:
Show quoted text> On Mon Apr 18 03:37:41 2016, SREZIC wrote:
> > On 2016-04-18 03:33:42, SREZIC wrote:
> > > One of my smoker machines reported the following failure:
> > >
> > > # Failed test at t/003-load.t line 19.
> > > # got: '3.16'
> > > # expected: '3.06'
> > >
> > > # Failed test at t/003-load.t line 20.
> > > # got: '6.00'
> > > # expected: '5.94'
> > > # Looks like you failed 2 tests of 9.
> > > t/003-load.t ..........
> > > Dubious, test returned 2 (wstat 512, 0x200)
> > > Failed 2/9 subtests
> > >
> > > Looking at CPAN Testers results, it seems that this failure happens
> > > occasionally also on other systems:
> > >
http://matrix.cpantesters.org/?dist=Unix-Uptime%200.4000
> > >
> > > I think we see a race condition in t/003-load.t: you take the load
> > > averages twice, first through the perl module and then using the
> > > system command. Unfortunately time passes in between and the load
> > > average information may be updated when using the system command,
> > > so
> > > the values may differ.
> >
> > Attached a possible fix.
>
> Unfortunately, while I agree that there are problems with the current
> test, and I've seen the failures you've mentioned, I think that your
> change weakens the confidence that the tests give me that I'm actually
> doing something sensible. Now it's just checking that the code spits
> out the same numbers twice in a row.
This is not what the change does. The condition within the loop does a "pre-check" if the values returned by the system command and by Unix::Uptime are the same. If they are, then the loop will be prematurely ended and after that six of your unaltered test cases are run.
Three test cases were removed, but I think these are not really needed:
* the check that Unix::Uptime->load() returns a non-empty list --- the test script will fail anyway if any of $load1..$load15 are missing (because of the floating number check)
* the checks that the system uptime returns anything and may be correctly parsed --- if these would fail, then the later "is $load1, $pload1" etc. checks would also fail.
And what if the condition in the loop is false? Then uptime and Unix::Uptime are immediately called again. My assumption is that load averages are calculated not so often, and
https://en.wikipedia.org/wiki/Load_%28computing%29#Reckoning_CPU_load says it happens every five seconds on typical linux systems, which I can confirm on a linux and a non-linux (freebsd) system. So if the two calls happen within five seconds of the last call, then the values should be the same. This should be enough even on loaded/slow systems.
Show quoted text> I'd rather have something like a "fuzzy match" against the values
> parsed from `uptime`... perhaps some sort of "equals, plus or minus
> 1.0" test could help improve the situation instead?
+/-1.0 would make a big difference if it's 0.0 vs 1.0 on a single-cpu system. On the other hand, on multi-cpu machines it's possible that the load average drops/increases by a larger delta within five seconds.
Show quoted text> Anyways, this reminds me that I need up update the bugtracker metadata
> on this module... at this point, I'd rather have bugs & pull requests
> submitted on GitHub:
https://github.com/pioto/Unix-Uptime
"git am 0001-avoid-race-condition-in-load-average-test-RT-113853.patch" will give you a git commit with full attribution.