Uh oh.... the plot thickens!
I did exactly as you said. I downloaded 1.008, copied the two superbig
tests from 1.010/t/ into 1.008/t/, and ran with AUTOMATED_TESTING=1.
Everything passed!
v1.008 Output:
I also tried v1.010 with AUTOMATED_TESTING=1. Same result, everything
passed:
And to make this even more confusing, my IRC server which uses SSLify got
stuck again with 100% CPU today, using the latest SSLify v1.010. So it
looks like I'm doing something quite different which is causing the issue.
Oh, just to give you an update on that. I am using SSLify for an IRC
server, but it is also an HTTPS web server in the same process. Since I
was able to reproduce the issue using a simple HTTPS GET, that is the bug
report I sent to you. But over the last few months I have tried a number
of things in my IRC server, including completely disabling SSL in the HTTP
server, and just having SSLify run on the IRC connections (it's using
POE::Component::Server::IRC). Anyway, even after that change, it still
randomly sticks at 100% CPU (while still remaining somewhat responsive and
usable). Such a weird issue!
Here's my code if you are interested:
Anyway, thanks for your time and effort in looking at this! One bug at a
time I guess!
- Joe
On Thu, Nov 13, 2014 at 10:44 AM, Apocalypse via RT <
bug-POE-Component-SSLify@rt.cpan.org> wrote:
> <URL:
https://rt.cpan.org/Ticket/Display.html?id=95071 >
>
> Awesome! I'm glad we are on the right track!
>
> Oh, I noticed an omission in the test you pasted - I forgot to tell
> you that installing IO::Prompt::Tiny alone wasn't enough to enable the
> superbig tests, d0h! In your case, you were running under
> PERL_MM_USE_DEFAULT=1 and it defaulted to NO. Can you please re-run it
> without the default, or override it via AUTOMATED_TESTING=1 ?
>
> The superbig tests actually reproduce your issue on my machine with
> SSLify v1.008. That's why I really wanted to verify that it fixed your
> problem. If you can, please try copying (simple_superbig.t and
> simple_parallel_superbig.t) from v1.010 over to v1.008 and running the
> testsuite and comparing the result to v1.010? Hopefully on your setup
> v1.008 will lockup and v1.010 eats it for breakfast. Thanks for digging
> into this with me :)
>
> On 11/12/2014 08:22 PM, Joseph Huckaby via RT wrote:
> suspected.
> > That box I was running on only had 1 GB of RAM, and much of it was
> already
> > in use. I tried the test again on a VERY LARGE box (Amazon c3.8xlarge,
> 64
> > GB RAM), and all tests pass! Here are the results:
> >
> >
>
https://www.dropbox.com/s/924lcwdm4xg0y4v/sslify-v1.010-test-output-2014-11-12-pass.txt?dl=0
> >
> > (I installed IO::Prompt::Tiny as well)
> >
> > Okay, I am now installing SSLify v1.010 on my primary production server
> > now, and will see if any 100% CPU issues occur. I had a nightly crontab
> > restart going, but I have disabled that, in an effort to try and
> reproduce
> > it more readily.
> >
> > Thanks again, and I'll keep you posted!
> >
> > - Joe
> >
> >
> > On Wed, Nov 12, 2014 at 6:59 PM, Apocalypse via RT <
> > bug-POE-Component-SSLify@rt.cpan.org> wrote:
> >
> of
> >> testers confirmed the same thing you saw.
> >>
> >>
> >>
>
http://www.cpantesters.org/distro/P/POE-Component-SSLify.html?grade=1&perlmat=2&patches=2&oncpan=1&distmat=1&perlver=ALL&osname=ALL&version=1.010
> >>
> >> Two things jumped out at me:
> >> * They all failed in the same tests (parallel_large or the
> >> superbig ones). They use HUGE strings to stress the buffers and it might
> >> be memory limitations somewhere? Can you confirm that it isn't an issue
> >> on your end?
> >>
> >> * All of those failures are during the initial connection
> stage
> >> (loading the cert pem files). Maybe something funky going on with
> >> filehandles? I suspect the memory running out and not being able to
> >> create the CTX object is the culprit. While this problem is interesting,
> >> it's not related to your issue at all! You can safely ignore this :(
> >>
> >> I'll continue testing and see if I can find something else to
> >> patch, ha! In the meantime can you please try and reproduce your lock-up
> >> with the latest version of SSLify using your scripts? Also please give
> >> the superbig tests a whirl by installing IO::Prompt::Tiny and re-running
> >> the testsuite.
> >>
> >> Maybe my testsuite didn't replicate yours fully as you mentioned
> >> regarding latency being a possible factor. I made changes to
> >> SSLify::write() that worked around the lock-up that I manged to
> >> replicate in the superbig tests on my machine. Please let me know if you
> >> experience the lockup and I'll try to reproduce it on my side.
> >>
> >> Thanks again and I hope we'll squash this bug soon!
> >>
> >> On 11/12/2014 06:23 PM, Joseph Huckaby via RT wrote:
> >> on
> >>> the t/simple_parallel_large.t test number 86 with this output:
> >>>
> >>> t/simple_parallel_large.t ..... 86/?
> >>> # Failed test 'SERVER: SSLify_Options 3105: private key
> >>> # '
> >>> # at t/simple_parallel_large.t line 54.
> >>>
> >>> # Failed test 'SERVER: Server_SSLify Please do SSLify_Options()
> first (
> >>> or pass in a $ctx object ) at
> >>>
> >>
> /root/.cpanm/work/1415844865.2217/POE-Component-SSLify-1.010/blib/lib/POE/Component/SSLify.pm
> >>> line 247.
> >>> # '
> >>> # at t/simple_parallel_large.t line 57.
> >>> Can't use an undefined value as a symbol reference at
> >>>
> >>
> /root/.cpanm/work/1415844865.2217/POE-Component-SSLify-1.010/blib/lib/POE/Component/SSLify.pm
> >>> line 446.
> >>>
> >>> Here is the full test output if it helps:
> >>>
> >>
>
https://www.dropbox.com/s/gpd9pyhs8lnbxig/sslify-v1.010-test-output-2014-11-12.txt?dl=0
> >>> Thanks again for your time on this!
> >>>
> >>> - Joe
> >>>
> >>>
> >>> On Tue, Nov 11, 2014 at 10:22 PM, Apocalypse via RT <
> >>> bug-POE-Component-SSLify@rt.cpan.org> wrote:
> >>>
> 16K
> >>>> limitation in the TLS packet sizes. It seems like on certain
> conditions
> >> the
> >>>> write() call will lock up because the underlying buffers was not
> >> flushed. I
> >>>> haven't found the root cause for this but with my modification by
> >> clamping
> >>>> the writes the superbig tests passed! I've just released v1.010 to
> CPAN
> >>>> with the change, can you please test it to see if it helps you? If
> not,
> >>>> then I'll have to attack it from a different angle :)
> >>>>
> >>>> On Sat Apr 26 12:20:58 2014, jhuckaby@gmail.com wrote:
> >>>>> Hey SSLify Team,
> >>>>>
> >>>>> I believe I have found a bug using POE::Component::Server::TCP and
> >>>>> SSLify. I have a very simple HTTPS web server (created using the
> >>>>> cookbook example), which does nothing except serve up 300K of plain
> >>>>> text for every incoming HTTPS GET request. This works fine until it
> >>>>> receives multiple requests at once. Then it shoots up to 100% CPU
> and
> >>>>> stays there indefinitely, long after the requests have completed.
> >>>>>
> >>>>> Here is a snapshot of "top" on my server after sending in 10
> >>>>> simultaneous requests and allowing them to complete fully:
> >>>>>
> >>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> >>>>> 1365 root 20 0 191m 20m 4168 R 99.9 0.6 0:10.95 perl
> >>>>>
> >>>>> The CPU stays at or around 99.9% indefinitely (I have witnessed
> >>>>> several days at least). Also, something to note, even after the CPU
> >>>>> is pinned at 100%, the server still accepts new incoming connections,
> >>>>> and serves them.
> >>>>>
> >>>>> Here is a ZIP file containing my script and sample SSL cert:
> >>>>>
https://dl.dropboxusercontent.com/u/201533/poe-tcp-sslify-bug.zip
> >>>>>
> >>>>> Here are links to the individual files:
> >>>>>
https://dl.dropboxusercontent.com/u/201533/poe-tcp-sslify-
> >>>>> bug/server.pl
> >>>>>
>
https://dl.dropboxusercontent.com/u/201533/poe-tcp-sslify-bug/ssl.crt
>
https://dl.dropboxusercontent.com/u/201533/poe-tcp-sslify-bug/ssl.key
> >>>>>
> >>>>> I can reproduce the issue every time by sending in 10 simultaneous
> >>>>> HTTPS GET requests from a shell script such as this:
> >>>>>
> >>>>> #!/bin/sh
> >>>>> # Fetch URL 10 times simultaneously
> >>>>> curl --insecure "$1" >/dev/null 2>&1 &
> >>>>> curl --insecure "$1" >/dev/null 2>&1 &
> >>>>> curl --insecure "$1" >/dev/null 2>&1 &
> >>>>> curl --insecure "$1" >/dev/null 2>&1 &
> >>>>> curl --insecure "$1" >/dev/null 2>&1 &
> >>>>> curl --insecure "$1" >/dev/null 2>&1 &
> >>>>> curl --insecure "$1" >/dev/null 2>&1 &
> >>>>> curl --insecure "$1" >/dev/null 2>&1 &
> >>>>> curl --insecure "$1" >/dev/null 2>&1 &
> >>>>> curl --insecure "$1" >/dev/null 2>&1 &
> >>>>>
> >>>>> The --insecure flag is to prevent curl from validating the cert
> (which
> >>>>> is self-signed).
> >>>>>
> >>>>> I am running:
> >>>>> POE version 1.358
> >>>>> POE::Component::SSLify version 1.008
> >>>>>
> >>>>> This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-
> >>>>> linux-thread-multi
> >>>>> Linux pvp.effectgames.com 3.4.82-69.112.amzn1.x86_64 #1 SMP Mon Feb
> 24
> >>>>> 16:31:21 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> >>>>>
> >>>>> openssl-perl-1.0.1g-1.69.amzn1.x86_64
> >>>>> openssl-1.0.1g-1.69.amzn1.x86_64
> >>>>>
> >>>>> Other Notes: I cannot reproduce this without SSLify. Meaning, if I
> >>>>> strip out all the SSL code and just run a plain HTTP server,
> >>>>> everything works fine. Also note that I cannot seem to reproduce
> this
> >>>>> on my local OS X machine hitting localhost, although I suspect that
> is
> >>>>> because localhost is a virtual interface, and all the requests
> >>>>> complete "instantly". This bug seems to require a slower network
> (DSL
> >>>>> to Amazon EC2 for example) to occur.
> >>>>>
> >>>>> If there is anything else you'd like me to try, please let me know.
> >>>>> My server is at your disposal!
> >>>>>
> >>>>> Thanks for your time!
> >>>>>
> >>>>> - Joe
> >>>> --
> >>>> ~Apocalypse
> >>>>
> >>
> >>
>
>
>