Subject: crash of parent in accept() when a child reads/sysreads another client
Date: Mon, 16 Dec 2013 01:30:06 +0000 (GMT)
To: bug-IO-Socket-SSL [...] rt.cpan.org
From: Florian Bador <cpan2759139006 [...] bador.net>
NOTE: We had a thread with Steffen Ullrich on this, but I put my phone number in my signature by mistake, so we had to remove the thread.
Let me summarize everything we said here.
##### The original message was : #####
Hi,
I am working on an HTTPS server using IO::Socket::SSL 1.962.
My program doesn't have any problem with HTTP (served by a different process that uses IO::Socket).
With SSL, the following problem occurs, frequently but randomly :
-> The parent calls accept to wait for another client
-> At the same time, a forked child is reading some data from a previously accepted client
-> The parent dies without a single warning, STDERR message, or anything. (I use strict and warnings)
-> The child keeps going without problem
The parent does not exit with a success status when it dies, because ./program.pl && date never prints the date.
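One way to learn more from the silent exit is to look at the raw exit status instead of only testing it with &&. The following is a minimal sketch, not part of the original program: the helper name report_exit is hypothetical, and it assumes a shell such as bash or dash, where a command terminated by a signal is reported with a status of 128 plus the signal number.

```shell
# Hypothetical helper: run a command and report how it ended.
# Assumes bash/dash semantics: status > 128 usually means "killed by signal (status - 128)".
report_exit() {
    status=0
    "$@" || status=$?          # capture the status without aborting under "set -e"
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    elif [ "$status" -ne 0 ]; then
        echo "exited with status $status"
    else
        echo "exited normally"
    fi
}
```

It could be used as: report_exit ./program.pl
A "killed by signal" line would point to an unhandled signal in the parent rather than a die or an exit call, which would also fit the absence of any message on STDERR. A program that deliberately exits with a value above 128 would be misreported, so this is only a hint.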
Now, since there is absolutely no warning/error anywhere, here's the way I was able to find out where it occurs in the code :
-> I wrote plenty of :
select(undef, undef, undef, 0.005); warn("$$ This is line".__LINE__." at ".tv_interval(undef,[gettimeofday]) );
at almost every line...
-> I called the script this way:
./program.pl > log 2>&1 ; perl -e 'use Time::HiRes qw(gettimeofday tv_interval); print( tv_interval(undef,[gettimeofday])."\n");'
-> I then looked at what happens just before the time of termination of the program and concluded that it always happens when a child is calling read/sysread (successfully!) and the parent dies while waiting with accept().
What I tried (and that did not fix the problem) :
-> both sysread and read
-> creating an IO::Select on the client in order to make sure we can_read and can_write before reading / sysreading
(I'm not surprised it doesn't help because when the problem occurs, read/sysread succeeds)
-> same thing for the socket : waiting with can_read before calling accept().
NOTES:
-> yes, all "use" statements are executed before forking, at the beginning of the parent.
-> I would say that it happens about once every 150 connections, sometimes the very first one, sometimes the 500th one.
Now I think the first step would be to get some feedback from the crash. Is there any way to get more debugging info in such a case?
Thanks a lot!
// Florian Bador
// http://florianbador.com
////////////////////////////////////////////////////////////