
This queue is for tickets about the IO-Socket-SSL CPAN distribution.

Report information
The Basics
Id: 91436
Status: rejected
Priority: 0/
Queue: IO-Socket-SSL

People
Owner: Nobody in particular
Requestors: cpan2759139006 [...] bador.net
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: crash of parent in accept() when a child reads/sysreads another client
Date: Mon, 16 Dec 2013 01:30:06 +0000 (GMT)
To: bug-IO-Socket-SSL [...] rt.cpan.org
From: Florian Bador <cpan2759139006 [...] bador.net>
NOTE: We had a thread with Steffen Ullrich on this, but I put my phone number in my signature by mistake, so we had to remove the thread. Let's summarize everything we said here.

##### The original message was: #####

Hi,

I am working on an HTTPS server using IO::Socket::SSL 1.962. My program doesn't have any problem with HTTP (a different process, using IO::Socket).

With SSL, the following problem occurs frequently but randomly:
-> The parent calls accept to wait for another client
-> At the same time, a forked child is reading some data from a previously accepted client
-> The parent dies without a single warning, STDERR message, or anything. (I use strict and warnings)
-> The child keeps going without problem

The dying parent does not exit with a success status, because ./program.pl && date never prints the date.

Now, since there is absolutely no warning/error anywhere, here's how I was able to find out where it occurs in the code:
-> I wrote plenty of:
     select(undef, undef, undef, 0.005); warn("$$ This is line".__LINE__." at ".tv_interval(undef,[gettimeofday]) );
   at almost every line...
-> I called the script this way:
     ./program.pl > log 2>&1 ; perl -e 'use Time::HiRes qw(gettimeofday tv_interval); print( tv_interval(undef,[gettimeofday])."\n");'
-> I then looked at what happens just before the time of termination of the program and concluded that it always happens when a child is calling read/sysread (successfully!) and the parent dies while waiting in accept().

What I tried (and that did not fix the problem):
-> both sysread and read
-> creating an IO::Select on the client in order to first make sure we can_read and can_write before reading/sysreading (I'm not surprised it doesn't help, because when the problem occurs, read/sysread succeeds)
-> same thing for the socket: waiting with can_read before calling accept().

NOTES:
-> yes, all "use" statements are executed before forking, at the beginning of the parent.
-> I would say that it happens about once every 150 connections, sometimes the very first one, sometimes the 500th one.

Now I think the first step would be to get some feedback from the crash. Is there any way to get some more debugging info in such a case?

Thanks a lot!

// Florian Bador // http://florianbador.com ////////////////////////////////////////////////////////////
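On the question of getting more feedback from a silent death, one generic step is to look at the wait status the parent leaves behind, since a process killed by a signal (SIGPIPE is given here purely as an example; this thread never establishes the actual cause) prints nothing itself. The following is a minimal sketch using only core Perl. The helper name describe_wait_status and the wrapper invocation are assumptions made for illustration and are not part of IO::Socket::SSL.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode a wait status as found in $? after system() or waitpid().
# describe_wait_status is a hypothetical helper name, not part of any module.
sub describe_wait_status {
    my ($status) = @_;
    return "failed to start or wait" if $status == -1;
    my $signal = $status & 127;    # low 7 bits: terminating signal, 0 if none
    my $core   = $status & 128;    # bit 7: set if a core dump was written
    my $code   = $status >> 8;     # high byte: exit code of a normal exit
    return sprintf( "killed by signal %d%s",
        $signal, $core ? " (core dumped)" : "" )
        if $signal;
    return "exited with code $code";
}

# Hypothetical usage: perl wrapper.pl ./program.pl [args...]
if (@ARGV) {
    system(@ARGV);
    print STDERR "$ARGV[0]: ", describe_wait_status($?), "\n";
}
```

Run as a wrapper around the server, this reports whether the parent exited on its own or was terminated by a signal, which narrows down where to look next.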
Subject: RE: [rt.cpan.org #91436] crash of parent in accept() when a child reads/ sysreads another client
Date: Mon, 16 Dec 2013 01:45:20 +0000 (GMT)
To: bug-IO-Socket-SSL [...] rt.cpan.org
From: Florian Bador <cpan2759139006 [...] bador.net>
I will now send fake SMTP messages of all our conversation, below... (let's see if it works...) // Florian Bador // http://florianbador.com ////////////////////////////////////////////////////////////
Subject: RE: [rt.cpan.org #91436] crash of parent in accept() when a child reads/sysreads another client
Date: Mon, 16 Dec 2013 01:45:30 +0000 (GMT)
To: bug-io-socket-ssl [...] rt.cpan.org
RT-Send-CC:
From: Steffen_Ullrich [...] genua.de
Hi,

I need more information to understand your problem. I need at least:

- which versions of IO::Socket::SSL and Net::SSLeay you are using, e.g.
    perl -MIO::Socket::SSL -e 'warn $IO::Socket::SSL::VERSION'
    perl -MNet::SSLeay -e 'warn $Net::SSLeay::VERSION'
- which operating system you use, e.g. on Windows a fork does not create a real process, only a thread, which can have subtle interactions with the parent "process":
    perl -e 'warn $^O'
- the version of perl itself:
    perl -v
- a minimal program which is sufficient to reproduce the problem
Subject: RE: [rt.cpan.org #91436] crash of parent in accept() when a child reads/ sysreads another client
Date: Mon, 16 Dec 2013 01:56:30 +0000 (GMT)
To: bug-IO-Socket-SSL [...] rt.cpan.org
From: Florian Bador <cpan2759139006 [...] bador.net>
Thank you Steffen.

Linux 3.2.22-grsec #1 i686 GNU/Linux
Perl: This is perl 5, version 16, subversion 0 (v5.16.0) built for i686-linux-64int
IO::Socket::SSL 1.962
Net::SSLeay 1.55

I wrote a very simplified version and made thousands of HTTPS requests with new connections... No crash after a few hours of testing, but when pushing hard on the requests, the parent sometimes ends up blocked in accept() instead of crashing, even once all children/clients are done. For that reason, I decided to handle it with a non-blocking socket, and there is no problem with accept().

So I am adding, step by step, the code and modules I need until I get the crash bug again. I suspect one of the modules creates some sort of weird conflict... I will also try my final program with a non-blocking socket.

That's good news, because it means that at some point we'll get the bug again, but each test will take me some time, to make sure it can handle hours of intense requests. I will be back to keep you informed.

Thanks!

// Florian Bador // http://florianbador.com ////////////////////////////////////////////////////////////

########## FROM rt-cpan-org-return@perl.org SENT 6 min EARLIER: ##########
> Hi,
> I need more information to understand your problem.
> I need at least:
> - which versions of IO::Socket::SSL and Net::SSLeay you are using, e.g.
>     perl -MIO::Socket::SSL -e 'warn $IO::Socket::SSL::VERSION'
>     perl -MNet::SSLeay -e 'warn $Net::SSLeay::VERSION'
> - which operating system you use, e.g. on Windows a fork does not create a real process, only a thread, which can have subtle interactions with the parent "process":
>     perl -e 'warn $^O'
> - the version of perl itself:
>     perl -v
> - a minimal program which is sufficient to reproduce the problem
Subject: RE: [rt.cpan.org #91436] crash of parent in accept() when a child reads/ sysreads another client
Date: Mon, 16 Dec 2013 01:59:52 +0000 (GMT)
To: bug-IO-Socket-SSL [...] rt.cpan.org
From: Florian Bador <cpan2759139006 [...] bador.net>
So, right now, with a minimal version, we no longer have a mysterious death of the parent that handles the socket but, instead, a freeze of it, also occurring in accept(). And now that the socket itself is non-blocking, the parent is less likely to be in accept() when a child is calling sysread. Even when the socket was blocking, the problem did not occur systematically at all, so now it occurs even less often.

Now, I could put that parent in a fork, so that its children become forks of this fork (grandchildren), then kill the process when it is blocked or gone and restart it immediately... But that is really dirty coding, and some clients may be rejected while we restart the socket. Is there any alternative to accept() that I could try?

NOTE:
-> remember that I tried to use IO::Select and can_read / can_write. I also tried to check $SSL_ERROR. But these did not change anything.
-> since the probability of seeing this problem is now reduced, I make about 1 connection/s in my tests (a loop using curl -k)
-> I looked at a frozen process, and it sometimes takes 20% of a good CPU, sometimes 0.

Surprise: it seems that my 1 connection/s using curl (with a random timer, so requests sometimes arrive at the same moment) does not create the problem, and that it only appears when I add requests from Firefox on top of that! (It is difficult to be 100% sure so far, though.) Could it be an SNI problem?

I apologize for the fact that it has become pretty difficult to create the problem; it now requires some serious network flood. (Of course you can try with a blocking socket, which will make it happen more often.) But I never had any problem with plain HTTP via IO::Socket, so there is obviously something wrong in the SSL connection.

Attached is my minimal program. I run it this way:

    ./ssl-crash-minimal.pl > crash-log 2>&1 ; perl -e 'use Time::HiRes qw(gettimeofday tv_interval); print( tv_interval(undef,[gettimeofday])."\n");'

WARN: the output log will become pretty big.

That way, once it's frozen, you can grep for the parent's pid in crash-log and see the last line logged (always "Calling accept ..."). You then grep for the approximate time of this last line to look at the lines around it, and you will see that some sysread of a child/children was called at the same moment.

Thanks again.

// Florian Bador // http://florianbador.com ////////////////////////////////////////////////////////////
Subject: RE: [rt.cpan.org #91436] crash of parent in accept() when a child reads/ sysreads another client
Date: Mon, 16 Dec 2013 02:01:53 +0000 (GMT)
To: bug-IO-Socket-SSL [...] rt.cpan.org
From: Florian Bador <cpan2759139006 [...] bador.net>
(oops, a file was attached in the other thread) // Florian Bador // http://florianbador.com ////////////////////////////////////////////////////////////

Message body is not shown because sender requested not to inline it.

Subject: RE: [rt.cpan.org #91436] crash of parent in accept() when a child reads/ sysreads another client
Date: Mon, 16 Dec 2013 02:02:51 +0000 (GMT)
To: bug-IO-Socket-SSL [...] rt.cpan.org
From: Florian Bador <cpan2759139006 [...] bador.net>
PS: I still have a blocked parent process running. If you want me to try things with it, let me know; I am not killing it yet.

I tried:
-> sending SIGCONT (kill 18): nothing
-> sending SIGSTOP (kill 19): that did stop it in the shell, and the process is still alive
-> sending SIGCONT after the STOP: that did not make it leave the accept() where it is currently blocked.
-> curl -vk shows that we can still connect, but when the client sends the handshake, the answer never comes.

We can summarize the problem by saying that accept() sometimes becomes blocking regardless of blocking(0) and ignores new incoming clients, so it waits forever. It is probably when we are inside accept() doing things and a child calls read/sysread that things get messed up, and we never leave this accept() function.

// Florian Bador // http://florianbador.com ////////////////////////////////////////////////////////////

Message body is not shown because sender requested not to inline it.

Subject: RE: [rt.cpan.org #91436] crash of parent in accept() when a child reads/sysreads another client
Date: Mon, 16 Dec 2013 02:05:00 +0000 (GMT)
To: bug-io-socket-ssl [...] rt.cpan.org
RT-Send-CC:
From: Steffen_Ullrich [...] genua.de
On Thu, 05 Dec 2013, 23:13:18, cpan2759139006@bador.net wrote:
> PS:
>
> I still have a blocked parent process running, if you want me to try
> things with it, let me know, I don't kill it yet.
IO::Socket::SSL::accept consists of an accept on the inet socket, followed by an SSL handshake. While the accept on the inet socket is fully handled in the OS kernel, the SSL handshake requires reads and writes on the underlying TCP connection.

While it should be possible to do a non-blocking SSL handshake with IO::Socket::SSL::accept, your sample code does not handle it correctly. It only sets the server socket as non-blocking, but the SSL handshake is done on the new socket returned from IO::Socket::accept, which is not set non-blocking.

Because you will fork anyway to handle the client, I would rather recommend that you only do an IO::Socket::accept, then fork, and then do the SSL handshake in the forked child, e.g.:

    my $server = IO::Socket::INET->new( Listen => .., LocalAddr => .. );
    while (1) {
        my $c = $server->accept or next;        # TCP accept
        defined( my $pid = fork ) or die $!;    # fork child
        next if $pid;                           # parent will wait for next child
        # child here
        close($server);                         # not needed in child
        IO::Socket::SSL->start_SSL( $c,
            SSL_server => 1,
            .. other SSL_args ...
        ) or die "ssl handshake failed: $SSL_ERROR";
        # ... now we have an SSL socket in $c ...
    }
Subject: RE: [rt.cpan.org #91436] crash of parent in accept() when a child reads/ sysreads another client
Date: Mon, 16 Dec 2013 02:06:24 +0000 (GMT)
To: bug-IO-Socket-SSL [...] rt.cpan.org
From: Florian Bador <cpan2759139006 [...] bador.net>
Thank you very much, Steffen! It seems to be perfect! :)

I don't regret creating my own HTTP/HTTPS server. I measure everything with Time::HiRes and, besides the handshakes, which take about 300ms, all pages respond in 0ms (not one, just zero!) on a pretty loaded server (I put everything in ramfs). I get about 100-200ms using apache+cgi... That's just amazing! Thank you very much!

So in the end, we will probably never know exactly why that weird and random problem happens, but if anyone passing by here is looking for a solution, this is it!

=============================
=== SUMMARY OF THE SOLUTION: ===
-> Accept with IO::Socket::INET, just like a plain HTTP server and with the same options (except the port, of course: 443)
-> The socket can even be blocking without problem
-> Once forked, close $sock in the child and upgrade $client to SSL
-> $client is then ready to be treated like any IO::Socket::INET handle.

NOTE: some browsers (such as Chrome) create 3 parallel connections when opening a page in order to be faster (avoiding further handshakes), and these connections sometimes request nothing; they are just created in case of future need. (Of course, that is not very nice for CPU usage... thank you, Google, for giving everyone the illusion of a faster browser.)

// Florian Bador // http://florianbador.com ////////////////////////////////////////////////////////////
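The steps in the summary can be sketched as a small, self-terminating demo that uses only core modules. It is an illustration under stated assumptions, not the attached program: the helper name handle_one_connection, the loopback address, the ephemeral port, and the greeting text are all hypothetical, and the SSL upgrade is only marked by a comment at the point where the IO::Socket::SSL->start_SSL call from Steffen's reply would go.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX ();

# Hypothetical helper: accept one TCP client, fork, and let the child do
# all per-client work. Returns the child's pid to the parent.
sub handle_one_connection {
    my ( $server, $per_client ) = @_;
    my $client = $server->accept or return;    # plain TCP accept only
    defined( my $pid = fork ) or die "fork failed: $!";
    if ($pid) {                                # parent
        close($client);                        # the child owns this connection
        return $pid;
    }
    close($server);                            # child: listener not needed
    # An HTTPS server would upgrade $client to SSL here (start_SSL with
    # SSL_server => 1 and its certificate arguments), inside the child.
    $per_client->($client);
    close($client);
    POSIX::_exit(0);                           # leave without running parent cleanup
}

# Demo: one listener on an ephemeral loopback port, one client, one child.
my $server = IO::Socket::INET->new(
    Listen    => 5,
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
) or die "listen failed: $!";
my $probe = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $server->sockport,
) or die "connect failed: $!";

my $pid = handle_one_connection( $server,
    sub { my ($c) = @_; print {$c} "hello from child $$\n" } );
my $reply = <$probe>;
waitpid( $pid, 0 );    # reap the child so it does not linger as a zombie
print "parent $$ got: $reply";
```

Because the handshake happens after the fork, a slow or stalled client can only hold up its own child, while the parent goes straight back to accepting. A long-running server would call the helper in a loop and also needs some way to reap finished children.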

Message body is not shown because sender requested not to inline it.

issue closed, because not a problem of IO::Socket::SSL