Skip Menu |

This queue is for tickets about the ZeroMQ CPAN distribution.

Report information
The Basics
Id: 64944
Status: resolved
Priority: 0/
Queue: ZeroMQ

People
Owner: Nobody in particular
Requestors: jason [...] ball.net
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 0.07
Fixed in: (no value)



Subject: Memory Leak
Looks like we have a memory leak in the API. I'm seeing this add up to 1MB/Second in the system I'm currently implementing and have traced that back to ZeroMQ. I've attached 2 test cases: * using the Object/API * using the RAW API memtest.t # Looks like you failed 5 tests of 15. # Looks like your test exited with 5 just after 15. memtest_raw.t # Looks like you failed 2 tests of 15. # Looks like your test exited with 2 just after 15. For one I believe the message returned from a ZeroMQ->recv is never deallocated/freed. Explicitely calling 'zmq_msg_close()' on the message in the raw test drops the number of failed tests from 5 down to 2. I'll run the prior C test from rt64836 under valgrind later to see how it performs...
From: jason [...] ball.net
and the test cases
Subject: memtest.t
use strict; use Test::More; use Test::Requires qw( Test::TCP Test::Valgrind ); use ZeroMQ qw(ZMQ_PUB ZMQ_SUB ZMQ_SNDMORE); BEGIN { use_ok "ZeroMQ"; use_ok "ZeroMQ::Constants", ":all"; } my $port = empty_port(); test_tcp( client => sub { my $port = shift; my $ctxt = ZeroMQ::Context->new(); my $sock = $ctxt->socket(ZMQ_SUB); $sock->connect("tcp://127.0.0.1:$port" ); $sock->setsockopt(ZMQ_SUBSCRIBE, ''); my $msg; $msg = $sock->recv(); }, server => sub { my $port = shift; my $ctxt = ZeroMQ::Context->new(); my $sock = $ctxt->socket(ZMQ_PUB); $sock->bind("tcp://127.0.0.1:$port"); sleep 2; $sock->send("1"); sleep 5; } ); done_testing;
Subject: memtest_raw.t
use strict; use Test::More; use Test::Requires qw( Test::TCP Test::Valgrind ); BEGIN { use_ok "ZeroMQ::Raw"; use_ok "ZeroMQ::Constants", ":all"; } my $port = empty_port(); test_tcp( client => sub { my $port = shift; my $ctxt = zmq_init(); my $sock = zmq_socket($ctxt, ZMQ_SUB); zmq_connect($sock,"tcp://127.0.0.1:$port" ); zmq_setsockopt($sock, ZMQ_SUBSCRIBE, ''); my $rawmsg = undef; $rawmsg = zmq_recv($sock); zmq_msg_close($msg); note "OK"; }, server => sub { my $port = shift; my $ctxt = zmq_init(); my $sock = zmq_socket($ctxt, ZMQ_PUB); zmq_bind($sock, "tcp://127.0.0.1:$port"); sleep 2; note "Server sending ordered data... (numbers 1..1000)"; my $msg = zmq_msg_init_data("1"); zmq_send($sock, $msg); zmq_msg_close($msg); note "OK"; } ); done_testing;
Hmm, I've tracked down the problem up to confirming that ZeroMQ::Raw::Message's C-level destructor isn't being called. Somehow it's escaping from being garbage collected (via Perl's mechanism) Doing something like undef $rawmsg; Also fixes the problem, but currently I'm stuck figuring out what's causing this...
Ah, wait, nevermind. no confirmation just yet :/
From: jason [...] ball.net
On Thu Jan 20 00:04:45 2011, DMAKI wrote: Show quoted text
> Ah, wait, nevermind. no confirmation just yet :/
Pardon my ignorance of perl XS... shouldnt the class files contain a destructor, for example: ZeroMQ::Message sub DESTROY { print "Destructor Called\n"; ZeroMQ::Raw::zmq_msg_close( $_[0]i->{"_message"}); ); Adding this appears to fix 2 of the leaks... # Looks like you failed 3 tests of 15. # Looks like your test exited with 3 just after 15.
I added a couple of Safefree calls -- valgrind still bitches, but there are a ton of errors that valgrind reports by just running perl, so I'm inclined to say that the ZeroMQ related leaks are gone. If you still see problems, let's discuss online elsewhere -- I think I'd need more info.
From: jason [...] ball.net
I'm not seeing any change in behaviour there. Note that adding the explicit destroy to the wrapper (ZeroMQ::Message) did stop valgrind from bitching about two of the tests, your most recent commit still complains about these. I'm not sure how to chase this one down... I'll look at it again when I get home from work. I don't suppose by any chance your in Australia and attending LCA next week ;) Cheers.
Show quoted text
> shouldnt the class files contain a destructor, for example: > > ZeroMQ::Message > > > sub DESTROY { > print "Destructor Called\n"; > ZeroMQ::Raw::zmq_msg_close( $_[0]i->{"_message"}); > );
In pure perl, yes. However, in XS you can add a magic entry in the MGf_free, which is in xs/mg-inc.xs somewhere in there you'll see a declaration of PerlZMQ_Message_vtbl, and in that struct you will see that there's a PerlZMQ_Message_mg_free being set. By enabling this bit, when the Perl object made magical goes out of scope the free function gets called automatically. This is why you don't need to use DESTROY. BTW, the reason we don't use DESTROY there is because we'd like to do C-level messing around when the underlying perl supports threads or not :)
If you're around in IRC (freenode or irc.perl.org) or skype, ping me (lestrrat). I'll be at $work for at least 3 more hours.
From: jason [...] ball.net
On Thu Jan 20 01:05:07 2011, DMAKI wrote: Show quoted text
> If you're around in IRC (freenode or irc.perl.org) or skype, ping me > (lestrrat). I'll be at $work for > at least 3 more hours.
I'm not so sure on the generic valgrind errors in perl as I've had no errors in prior tests. I'll code a number of simpler tests to check for leaks on the basic objects as we should be able to get passable tests. This will also help chase down any problems or prove that there are none, or that I'm just plain wrong ;) My system is processing thousands of messages per second, so any leaks no matter how minor add up quickly. Cheers
From: jason [...] ball.net
ok - so valgrind even complains when the only ZeroMQ statement in the script is: use_ok "ZeroMQ"; the script being: use strict; use Test::More; BEGIN { } use Test::Requires 'Test::Valgrind', 'XML::Parser', 'ZeroMQ' ; done_testing; Do the same with the RAW api and the tests pass: use Test::Requires 'Test::Valgrind', 'XML::Parser', 'ZeroMQ::RAW' ; done_testing; So I can't get past the basics with the cooked api... I'll proceed with the raw interface and see where it leads.
From: jason [...] ball.net
Possibly some progress... It looks like the Message_free is being called, but (potentially) that the memory isn't being deallocated... I've compiled 0.07 with tracing enabled and the attached test program which checks the number of SV's allocated before and after a call. For example: subtest init => sub { my $startcount = Devel::Leak::NoteSV($handle); my $ctxt = zmq_init(); ok($ctxt, "zmq_init()"); undef $ctxt; my $endcount = Devel::Leak::NoteSV($handle); is($endcount, $startcount, "SV Leak Check"); }; The undef is to force the cleanup of the variable and the associated storage, and then comparing the allocations before and after. In the above case the output is: context create context wrapper 4f5ee10 with zmq context 4f5ee30 for thread 4bb2010 at rt64944.t line 16. ok 1 - zmq_init() Context_free for context wrapper 4f5ee10 with zmq context 4f5ee30 for thread 4bb2010 at rt64944.t line 21. so Context_free was called, but the vector wasn't released... A similar test covers ZeroMQ::Raw::Message using the raw API, after the creation and destruction of a single message object we have: zmq_msg_init_data created message 527bd40 at rt64944.t line 60. ok 1 - zmq_msg_init_data() Message_free for 527bd40 at rt64944.t line 61. # got: '12467' # expected: '12460' Run the test multiple time... # got: '12512' # expected: '12470' or Im just outright wrong... Looking at the code and it is obviously calling zmq_msg_close() and Safefree(), so I'm clearely confused...
From: jason [...] ball.net
On Thu Jan 20 20:00:50 2011, agronaught wrote: Show quoted text
> Possibly some progress... > > It looks like the Message_free is being called, but (potentially) that > the memory isn't being deallocated... > > I've compiled 0.07 with tracing enabled and the attached test program > which checks the number of SV's allocated before and after a call. > > For example: > > > subtest init => sub { > my $startcount = Devel::Leak::NoteSV($handle); > > my $ctxt = zmq_init(); > ok($ctxt, "zmq_init()"); > > undef $ctxt; > > my $endcount = Devel::Leak::NoteSV($handle); > is($endcount, $startcount, "SV Leak Check"); > }; > > > The undef is to force the cleanup of the variable and the associated > storage, and then comparing the allocations before and after. > > In the above case the output is: > > context create context wrapper 4f5ee10 with zmq context 4f5ee30 for > thread 4bb2010 at rt64944.t line 16. > ok 1 - zmq_init() > Context_free for context wrapper 4f5ee10 with zmq context 4f5ee30 for > thread 4bb2010 at rt64944.t line 21. > > so Context_free was called, but the vector wasn't released... > > A similar test covers ZeroMQ::Raw::Message using the raw API, after the > creation and destruction of a single message object we have: > > zmq_msg_init_data created message 527bd40 at rt64944.t line 60. > ok 1 - zmq_msg_init_data() > Message_free for 527bd40 at rt64944.t line 61. > > > # got: '12467' > # expected: '12460' > > Run the test multiple time... > > # got: '12512' > # expected: '12470' > > or Im just outright wrong... > > Looking at the code and it is obviously calling zmq_msg_close() and > Safefree(), so I'm clearely confused... > >
Subject: rt64944.t
#!/usr/bin/perl use strict; use Test::More; use Devel::Leak; BEGIN { use_ok 'ZeroMQ::Raw'; use_ok 'ZeroMQ::Constants', qw(:all); } my $handle; subtest init => sub { my $startcount = Devel::Leak::NoteSV($handle); my $ctxt = zmq_init(); ok($ctxt, "zmq_init()"); undef $ctxt; my $endcount = Devel::Leak::NoteSV($handle); is($endcount, $startcount, "SV Leak Check"); }; subtest socket => sub { my $startcount = Devel::Leak::NoteSV($handle); my $ctxt = zmq_init(); ok($ctxt, "zmq_init()"); my $sock = zmq_socket($ctxt, ZMQ_PUB); ok($sock, "zmq_socket()"); undef $sock; undef $ctxt; my $endcount = Devel::Leak::NoteSV($handle); is($endcount, $startcount, "SV Leak Check"); }; subtest message => sub { my $startcount = Devel::Leak::NoteSV($handle); my $msg = zmq_msg_init_data("test"); ok($msg, "zmq_msg_init_data()"); undef $msg; my $endcount = Devel::Leak::NoteSV($handle); is($endcount, $startcount, "SV Leak Check"); }; subtest message => sub { my $startcount = Devel::Leak::NoteSV($handle); my $msg = zmq_msg_init_data("test"); ok($msg, "zmq_msg_init_data()"); undef $msg; my $endcount = Devel::Leak::NoteSV($handle); is($endcount, $startcount, "SV Leak Check"); }; subtest manymessage => sub { my $startcount = Devel::Leak::NoteSV($handle); for(0..5) { my $msg = zmq_msg_init_data("test:$_"); ok($msg, "zmq_msg_init_data()"); } my $endcount = Devel::Leak::NoteSV($handle); is($endcount, $startcount, "SV Leak Check"); }; done_testing;
Thanks - I'll see if I can take some time to review your stuff. If I remember correctly, you were using a stock perl installation on Fedora or some such, so I assume perl 5.10.x with threads, correct? If not, can you please tell me again what your perl is? (oh wait, you upgraded to 5.12? I forget) Is there anything else particular about your setup that I should know about? Also, can you please show me the test output on your system -- I'd like to see what your valgrind output looks like, too. maybe I'm missing something on my OS X install.
From: jason [...] ball.net
perl 5.8.8 on Redhat 5/Centos 5. The full output of the test file being: perl ~/reflexdog/test/rt64944.t ok 1 - use ZeroMQ::Raw; ok 2 - use ZeroMQ::Constants; context create context wrapper 31a6200 with zmq context 31a6220 for thread 2df9010 at /root/reflexdog/test/rt64944.t line 16. ok 1 - zmq_init() Context_free for context wrapper 31a6200 with zmq context 31a6220 for thread 2df9010 at /root/reflexdog/test/rt64944.t line 21. not ok 2 - SV Leak Check # Failed test 'SV Leak Check' # at /root/reflexdog/test/rt64944.t line 22. # got: '12430' # expected: '12416' 1..2 # Looks like you failed 1 test of 2. not ok 3 - init # Failed test 'init' # at /root/reflexdog/test/rt64944.t line 23. context create context wrapper 326e6e0 with zmq context 326e700 for thread 2df9010 at /root/reflexdog/test/rt64944.t line 28. ok 1 - zmq_init() created socket 3270520 at /root/reflexdog/test/rt64944.t line 30. ok 2 - zmq_socket() Socket_free 3270520 at /root/reflexdog/test/rt64944.t line 34. Context_free for context wrapper 326e6e0 with zmq context 326e700 for thread 2df9010 at /root/reflexdog/test/rt64944.t line 36. not ok 3 - SV Leak Check # Failed test 'SV Leak Check' # at /root/reflexdog/test/rt64944.t line 37. # got: '12457' # expected: '12436' 1..3 # Looks like you failed 1 test of 3. not ok 4 - socket # Failed test 'socket' # at /root/reflexdog/test/rt64944.t line 38. zmq_msg_init_data created message 33354a0 at /root/reflexdog/test/rt64944.t line 42. ok 1 - zmq_msg_init_data() Message_free for 33354a0 at /root/reflexdog/test/rt64944.t line 45. not ok 2 - SV Leak Check # Failed test 'SV Leak Check' # at /root/reflexdog/test/rt64944.t line 46. # got: '12460' # expected: '12446' 1..2 # Looks like you failed 1 test of 2. not ok 5 - message # Failed test 'message' # at /root/reflexdog/test/rt64944.t line 47. zmq_msg_init_data created message 33fc480 at /root/reflexdog/test/rt64944.t line 51. ok 1 - zmq_msg_init_data() Message_free for 33fc480 at /root/reflexdog/test/rt64944.t line 54. not ok 2 - SV Leak Check # Failed test 'SV Leak Check' # at /root/reflexdog/test/rt64944.t line 55. # got: '12467' # expected: '12460' 1..2 # Looks like you failed 1 test of 2. not ok 6 - message # Failed test 'message' # at /root/reflexdog/test/rt64944.t line 56. zmq_msg_init_data created message 34c3260 at /root/reflexdog/test/rt64944.t line 60. ok 1 - zmq_msg_init_data() Message_free for 34c3260 at /root/reflexdog/test/rt64944.t line 61. zmq_msg_init_data created message 34c3260 at /root/reflexdog/test/rt64944.t line 60. ok 2 - zmq_msg_init_data() Message_free for 34c3260 at /root/reflexdog/test/rt64944.t line 61. zmq_msg_init_data created message 34c3260 at /root/reflexdog/test/rt64944.t line 60. ok 3 - zmq_msg_init_data() Message_free for 34c3260 at /root/reflexdog/test/rt64944.t line 61. zmq_msg_init_data created message 34c3260 at /root/reflexdog/test/rt64944.t line 60. ok 4 - zmq_msg_init_data() Message_free for 34c3260 at /root/reflexdog/test/rt64944.t line 61. zmq_msg_init_data created message 34c3260 at /root/reflexdog/test/rt64944.t line 60. ok 5 - zmq_msg_init_data() Message_free for 34c3260 at /root/reflexdog/test/rt64944.t line 61. zmq_msg_init_data created message 34c3260 at /root/reflexdog/test/rt64944.t line 60. ok 6 - zmq_msg_init_data() Message_free for 34c3260 at /root/reflexdog/test/rt64944.t line 61. not ok 7 - SV Leak Check # Failed test 'SV Leak Check' # at /root/reflexdog/test/rt64944.t line 64. # got: '12512' # expected: '12470' 1..7 # Looks like you failed 1 test of 7. not ok 7 - manymessage # Failed test 'manymessage' # at /root/reflexdog/test/rt64944.t line 65. 1..7 # Looks like you failed 5 tests of 7. As you said the valgrind reports a lot of perl garbage... still trying to get past this. J.
From: jason [...] ball.net
More digging... using Devel::LeakTrace we have: leaked GV(0x201d0c70) from ./rt64944.t line 18 leaked SV(0x201d0c80) from ./rt64944.t line 18 leaked HV(0x201d0c90) from ./rt64944.t line 18 leaked GV(0x201d0ca0) from ./rt64944.t line 18 leaked SV(0x201d0cb0) from ./rt64944.t line 18 leaked GV(0x201d0cc0) from ./rt64944.t line 18 leaked SV(0x201d0cd0) from ./rt64944.t line 18 which is my $ctxt = zmq_init();
From: jason [...] ball.net
Found the major culprit in my code... Don't know why yet... I have a number of event loops and sockets that are frequently polled for data, so this is non blocking I use: my $request=$zeromqRepSock->recv(ZMQ_NOBLOCK); After every call the server is leaking an SV (24 bytes) and given the number of times this is called (10ms loop timer) that adds up to a lot of memory very quickly. By comparison the blocking call my $request=$zeromqRepSock->recv(); Does not leak an SV when it returns.
Just FYI, I'm not done fixing stuff yet, but I am looking at the problem. I don't know how much time I can spend on this this weekend, but just thought I let you know I'm looking at it.
And I just pushed some changes over. I think it fixes a few issues, but I'm not entirely sure it plugs everything in.
From: jason [...] ball.net
Hey mate, I saw the changes in the mainline, unfortunately still seeing a leek in the calls. I've attached a test that shows this. It looks like a number of the calls have this problem - the test focuses on zmq_recv. The attached test runs both the blocking and non blocking calls, it fails on the non blocking with a leaked SV. BTW - I'm away for a week attending LCA (Linux Conference Australia). I'll check emails and should be able to test changes during the week - but there will probably be delays. I'll also see if I can work it out -- trying to learn the XS stuff. From what I can see the current changes _should_ work... Cheers J.
Subject: rt64944.t
#!/usr/bin/perl use strict; use Test::More; use Devel::Leak; use Devel::Peek; use Scalar::Util qw(weaken); use Devel::Size; BEGIN { use_ok 'ZeroMQ'; use_ok 'ZeroMQ::Constants', qw(:all); } use lib '../lib'; use Monitor::Event; my $handle; subtest vanillaperl => sub { # # Prepare the test environment # my $ctx = new ZeroMQ::Context(); my $pub = $ctx->socket("ZMQ_PUB"); $pub->bind("tcp://*:9999"); my $sub = $ctx->socket("ZMQ_SUB"); $sub->connect("tcp://*:9999"); $sub->setsockopt(ZMQ_SUBSCRIBE, ''); $pub->send("Test message so the blocking recv actually gets something"); # # test blocking sockets # my $start = Devel::Leak::NoteSV($handle); #\&doTest_block($sub); my $end = Devel::Leak::NoteSV($handle); is($end, $start, "zmq_recv() SV Leak Check"); # # test non blocking sockets # $start = Devel::Leak::NoteSV($handle); \&doTest_noblock($sub); $end = Devel::Leak::NoteSV($handle); is($end, $start, "zmq_recv(ZMQ_NOBLOCK) SV Leak Check"); }; sub doTest_block { my($sub) = @_; my $value=$sub->recv(); }; sub doTest_noblock { my($sub) = @_; my $value=$sub->recv(ZMQ_NOBLOCK); }; done_testing;
I plugged a few more leaks. 1) For your RCVMORE problem, the problem is that when there are no messages available to be read, $! is set to the value of zeromq error status, and *that* was leaking. However, if you're getting a lot of "errors" - that is, occasions where you're calling recv() unsuccessfully, I suspect you're doing non-blocking reads wrong. See t/rt64944.t for a complete code using zmq_poll and AnyEvent to do non-blocking reads. 2) there were edge cases where zmq_msg_init*() family of functions would fail, and then the allocated structures weren't being released. those have been plugged as well. See if that changes anything for you. Also just FYI (and you may have discovered this by now) when you check valgrind results, look for Links_DefinitelyLost first. Then, look for possible occurrences of PerlZMQ_* functions in other areas - However usually those are not anything we can do about. Check if you're really losing memory while using the particular XS library, and then check back the valgrind results.
From: jason [...] ball.net
On Sat Jan 22 08:36:33 2011, DMAKI wrote: Show quoted text
> I plugged a few more leaks. > > 1) For your RCVMORE problem, the problem is that when there are no > messages available to > be read, $! is set to the value of zeromq error status, and *that* was > leaking. > > However, if you're getting a lot of "errors" - that is, occasions > where you're calling recv() > unsuccessfully, I suspect you're doing non-blocking reads wrong. See > t/rt64944.t for a > complete code using zmq_poll and AnyEvent to do non-blocking reads. > > 2) there were edge cases where zmq_msg_init*() family of functions > would fail, and then the > allocated structures weren't being released. those have been plugged > as well. > > See if that changes anything for you. > > > Also just FYI (and you may have discovered this by now) when you check > valgrind results, > look for Links_DefinitelyLost first. Then, look for possible > occurrences of PerlZMQ_* functions > in other areas - However usually those are not anything we can do > about. Check if you're > really losing memory while using the particular XS library, and then > check back the valgrind > results. >
I can confirm that this has fixed the leak I was seeing. Note that I haven't exhaustively checked the API. I will also convert over to zmq_poll and anyevent when I'm back from leave. Nicely done ;)
Cool, closing this ticket. I'll release 0.08