On 2018-10-26 12:14:39, SREZIC wrote:
> On 2018-10-19 03:01:45, SREZIC wrote:
> > On 2018-10-18 20:42:28, RKITOVER wrote:
> >
> > The fail:pass ratio on CPAN Testers is currently at 14:1474. But I
> > suspect that most testers don't make sure that a hanging test suite
> > is generating a fail report, so the ratio is probably higher. In any
> > case, the problem does not happen always. I just tried the test suite
> > in a clean docker environment (based on debian:stretch) and had to
> > run about 30 iterations until the test suite was hanging.
>
> To reproduce this, the following may be saved as a Dockerfile; the run
> instruction is in the first line. It'll take some iterations and
> minutes until the test suite starts to hang.
>
>
> # docker build -t perl-test . && while docker run perl-test; do sleep 1; done
> FROM debian:stretch
> RUN echo "cache invalidation #20181019"
> RUN apt-get -y update
> RUN apt-get -y install perl-modules
> RUN apt-get -y install make
>
> # Speed up installation
> RUN apt-get -y install libmoose-perl
> RUN apt-get -y install libpoe-perl
> RUN apt-get -y install libpackage-stash-perl libtry-tiny-perl libdatetime-perl
> RUN perl -MCPAN -e 'CPAN::Index->reload'
> RUN perl -i -pe 's{.*index_expire.*}{index_expire=>1,}' /root/.cpan/CPAN/MyConfig.pm
>
> CMD cpan -t RKITOVER/MooseX-Workers-0.24.tar.gz
Here is more information, collected while the test was hanging:
* a pstree snippet:
| | `-perl,24445 -MExtUtils::Command::MM -MTest::Harness -e undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch') t/00-compile.t t/00.load.t t/01.worker.t t/02.wheel.t t/03.role.t t/10.worker.enqueue.t t/11.worker.job.args.t t/11.worker.job.t t/12.worker.timeout1.t t/13.worker.timeout2.t t/20.worker.SIG.TERM.t t/30.worker.stdout_filter.line.t t/31.worker.stderr_filter.reference.t t/40.worker.stderr_filter.line.t t/41.worker.stderr_filter.reference.t t/50.consolidate.worker.results.t t/60.pass.object.from.child.t t/61.set.max_workers.at.ctor.time.t t/62.set.max_workers.at.run.time.t t/release-pod-syntax.t
| | `-perl,24457 t/02.wheel.t
| | `-(perl,24458)
-> so we have a zombie process here
* strace:
select(16, [4], [], [], {2720, 265822}) = 0 (Timeout)
select(16, [4], [], [], {3599, 999984}
-> it seems that select() waits for input on descriptor 4 with a timeout of about one hour, and simply starts waiting again once the timeout expires
* lsof:
perl 24457 root 4r FIFO 0,8 0t0 339363272 pipe
-> I suspect this pipe is/was connected to the zombie process
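
One way to check this suspicion is to look for every process that still has the same pipe open. Below is a minimal sketch, assuming a Linux /proc filesystem and enough privileges to read other processes' fd directories (root inside the container should do). The function name pipe_holders is made up for illustration; the inode argument is the NODE column from the lsof output above.

```shell
#!/bin/sh
# Hypothetical helper: print "PID FD" for every process that has the
# pipe with the given inode open. Relies on Linux /proc, where an open
# pipe shows up as a symlink with the target "pipe:[INODE]".
pipe_holders() {
    inode=$1
    for fd in /proc/[0-9]*/fd/*; do
        # Processes we may not inspect (or that just exited) make
        # readlink fail; skip those entries.
        target=$(readlink "$fd" 2>/dev/null) || continue
        if [ "$target" = "pipe:[$inode]" ]; then
            pid=${fd#/proc/}    # "24457/fd/4"
            pid=${pid%%/*}      # "24457"
            echo "$pid ${fd##*/}"
        fi
    done
}

# Example call with the inode from the lsof line above:
# pipe_holders 339363272
```

A zombie no longer holds any file descriptors, so it cannot show up in this list. A reader only sees end-of-file once every copy of the write end is closed, so if a live process (possibly 24457 itself) still holds the write end, that would be consistent with select() never returning on descriptor 4.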