
This queue is for tickets about the ZeroMQ CPAN distribution.

Report information
The Basics
Id: 74653
Status: open
Priority: 0/
Queue: ZeroMQ

People
Owner: Nobody in particular
Requestors: mark [...] blackmans.org
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 0.20
Fixed in: (no value)



Subject: segmentation fault using send
Using gdb I get the following backtrace when using a script that has a problem with zmq send after about 1800 message sends.

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    Reason: KERN_INVALID_ADDRESS at address: 0x0000000136373831
    0x0000000100001d47 in Perl_malloc ()
    (gdb) bt
    #0  0x0000000100001d47 in Perl_malloc ()
    #1  0x0000000100002eaf in Perl_calloc ()
    #2  0x00000001017b6098 in XS_ZeroMQ__Raw_zmq_send (cv=0x104aa3fd0) at perl_zeromq.xs:537
    #3  0x00000001000874ae in Perl_pp_entersub ()
    #4  0x000000010007f456 in Perl_runops_standard ()
    #5  0x000000010007bc35 in perl_run ()
    #6  0x00000001000010bc in main ()

This is ZeroMQ 0.20 with Perl 5.10.0 on OS X Snow Leopard. The perl was built with Perl's malloc.

    Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
      Platform:
        osname=darwin, osvers=10.8.0, archname=darwin-2level
        uname='darwin markimac.fairfx.local 10.8.0 darwin kernel version 10.8.0: tue jun 7 16:33:36 pdt 2011; root:xnu-1504.15.3~1release_i386 i386 '
        config_args='-de -Dprefix=/Volumes/cs/MBlackman/perl5/perlbrew/perls/perl-5.10.0 -Dusemymalloc -Accflags=-DPERL_DEBUGGING_MSTATS'
        hint=recommended, useposix=true, d_sigaction=define
        useithreads=undef, usemultiplicity=undef
        useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
        use64bitint=define, use64bitall=define, uselongdouble=undef
        usemymalloc=y, bincompat5005=undef
      Compiler:
        cc='cc', ccflags ='-fno-common -DPERL_DARWIN -DPERL_DEBUGGING_MSTATS -fno-strict-aliasing -pipe -I/usr/local/include -I/opt/local/include',
        optimize='-O3',
        cppflags='-fno-common -DPERL_DARWIN -DPERL_DEBUGGING_MSTATS -fno-strict-aliasing -pipe -I/usr/local/include -I/opt/local/include'
        ccversion='', gccversion='4.2.1 (Apple Inc. build 5666) (dot 3)', gccosandvers=''
        intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
        d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
        ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
        alignbytes=8, prototype=define
      Linker and Libraries:
        ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags =' -L/usr/local/lib -L/opt/local/lib'
        libpth=/usr/local/lib /opt/local/lib /usr/lib
        libs=-lgdbm -ldbm -ldl -lm -lutil -lc
        perllibs=-ldl -lm -lutil -lc
        libc=, so=dylib, useshrplib=false, libperl=libperl.a
        gnulibc_version=''
      Dynamic Linking:
        dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
        cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib -L/opt/local/lib'

    Characteristics of this binary (from libperl):
      Compile-time options: MYMALLOC PERL_DONT_CREATE_GVSV PERL_MALLOC_WRAP
                            USE_64_BIT_ALL USE_64_BIT_INT USE_LARGE_FILES
                            USE_PERLIO
      Built under darwin
      Compiled at Dec 15 2011 11:07:41
      %ENV:
        PERL5LIB="FX/lib:/Volumes/cs/MBlackman/perl5/profiles/main/lib/perl5/darwin-2level:/Volumes/cs/MBlackman/perl5/profiles/main/lib/perl5"
        PERLBREW_HOME="/Volumes/cs/MBlackman/.perlbrew"
        PERLBREW_PATH="/Volumes/cs/MBlackman/perl5/perlbrew/bin:/Volumes/cs/MBlackman/perl5/perlbrew/perls/perl-5.10.0/bin"
        PERLBREW_PERL="perl-5.10.0"
        PERLBREW_ROOT="/Volumes/cs/MBlackman/perl5/perlbrew"
        PERLBREW_VERSION="0.29"
        PERL_LOCAL_LIB_ROOT="/Volumes/cs/MBlackman/perl5/profiles/main"
        PERL_MB_OPT="--install_base /Volumes/cs/MBlackman/perl5/profiles/main"
        PERL_MM_OPT="INSTALL_BASE=/Volumes/cs/MBlackman/perl5/profiles/main"
      @INC:
        FX/lib
        /Volumes/cs/MBlackman/perl5/profiles/main/lib/perl5/darwin-2level
        /Volumes/cs/MBlackman/perl5/profiles/main/lib/perl5/darwin-2level
        /Volumes/cs/MBlackman/perl5/profiles/main/lib/perl5
        /Volumes/cs/MBlackman/perl5/perlbrew/perls/perl-5.10.0/lib/5.10.0/darwin-2level
        /Volumes/cs/MBlackman/perl5/perlbrew/perls/perl-5.10.0/lib/5.10.0
        /Volumes/cs/MBlackman/perl5/perlbrew/perls/perl-5.10.0/lib/site_perl/5.10.0/darwin-2level
        /Volumes/cs/MBlackman/perl5/perlbrew/perls/perl-5.10.0/lib/site_perl/5.10.0
From: mark [...] blackmans.org
While the seg. fault is within Perl_malloc, I've got a feeling this is how a memory leak is being manifested, as malloc is hitting some ulimit size limits and segfaulting.

I've not looked hard, but perhaps "send" is allocating objects that it's not releasing?

The code in question is just a simple loop around send like so.

    my $ident=0;
    while (1){
        print STDERR "sending ".$ident++,"\n";
        $sender->send($ident);
    }

Usually around iteration 2000, we get the seg. fault. Increasing the ulimit stack size has sometimes but not always allowed us to get to iteration 17000.

On Fri Feb 03 05:03:04 2012, mark@blackmans.org wrote:
> Using gdb I get the following backtrace when using a script that has a
> problem with zmq send after about 1800 message sends.
>
> Program received signal EXC_BAD_ACCESS, Could not access memory.
> Reason: KERN_INVALID_ADDRESS at address: 0x0000000136373831
> 0x0000000100001d47 in Perl_malloc ()
> (gdb) bt
> #0  0x0000000100001d47 in Perl_malloc ()
> #1  0x0000000100002eaf in Perl_calloc ()
> #2  0x00000001017b6098 in XS_ZeroMQ__Raw_zmq_send (cv=0x104aa3fd0) at perl_zeromq.xs:537
> #3  0x00000001000874ae in Perl_pp_entersub ()
> #4  0x000000010007f456 in Perl_runops_standard ()
> #5  0x000000010007bc35 in perl_run ()
> #6  0x00000001000010bc in main ()
>
> This is ZeroMQ 0.20 with Perl 5.10.0 on OS X Snow Leopard. The perl was
> built with Perl's malloc.
On Fri Feb 03 05:20:17 2012, mark@blackmans.org wrote:
> While the seg. fault is within Perl_malloc, I've got a feeling this is
> how a memory leak is being manifested, as malloc is hitting some ulimit
> size limits and segfaulting.
>
> I've not looked hard, but perhaps "send" is allocating objects that
> it's not releasing?
>
> The code in question is just a simple loop around send like so.
>
>     my $ident=0;
>     while (1){
>         print STDERR "sending ".$ident++,"\n";
>         $sender->send($ident);
>     }
>
> Usually around iteration 2000, we get the seg. fault. Increasing the
> ulimit stack size has sometimes but not always allowed us to get to
> iteration 17000
Is the above code all that's required to reproduce the seg fault?

Can you make it a test case?

If you can make it into a reliable test case, I'd love to see it as a pull req to http://github.com/lestrrat/ZeroMQ-Perl
From: mark [...] blackmans.org
On Fri Feb 03 06:36:36 2012, DMAKI wrote:
> Is the above code all that's required to reproduce the seg fault?

Not quite, there's a bit of 0MQ initialization work to do as well, and some receiver code. I've attached both sides as scripts. The point at which a segfault happens seems a bit variable, but it has always been within 20k iterations so far.

I'm not quite sure how to make this a test case that catches a perl segfault, unless the work is spawned externally.

> Can you make it a test case?
>
> If you can make it into a reliable test case, I'd love to see it as a
> pull req to
> http://github.com/lestrrat/ZeroMQ-Perl
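On "spawned externally": one way this could look is to run the crashing code in a child perl and inspect `$?` in the parent. The following is only a minimal sketch of that idea. The helper name `signal_that_killed` is hypothetical, and the child here is a stand-in one-liner that sends itself SIGSEGV rather than the real ZeroMQ sender script.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: run a command in a child process and return the
# name of the signal that terminated it, or undef if it exited normally.
sub signal_that_killed {
    my @cmd = @_;
    my $rc = system(@cmd);
    die "could not run @cmd: $!\n" if $rc == -1;

    # The low 7 bits of $? hold the number of the terminating signal.
    my $signum = $? & 127;
    return unless $signum;

    # $Config{sig_name} is a space-separated list indexed by signal number.
    my @names = split ' ', $Config{sig_name};
    return $names[$signum];
}

# Stand-in for the real sender: a child perl that kills itself with SIGSEGV.
my $crashed = signal_that_killed($^X, '-e', 'kill "SEGV", $$; sleep 5');
# A child that exits cleanly, for comparison.
my $clean = signal_that_killed($^X, '-e', 'exit 0');

print "crashing child: ", $crashed // 'none', "\n";
print "clean child: ", $clean // 'none', "\n";
```

In a real test the command would be the sender script instead of the one-liner, and the parent could fail the test whenever a signal name comes back.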
Subject: vent_zmq.pl
#!/usr/bin/env perl

=pod

Task ventilator

Binds PUSH socket to tcp://localhost:5557

Sends batch of tasks to workers via that socket

Author: Alexander D'Archangel (darksuji) <darksuji(at)gmail(dot)com>

=cut

use strict;
use warnings;
use 5.10.0;

use ZeroMQ qw/:all/;

sub within {
    my ($upper) = @_;

    return int(rand($upper)) + 1;
}

my $context = ZeroMQ::Context->new();

# Socket to send messages on
my $sender = $context->socket(ZMQ_PUSH);
$sender->bind('tcp://*:5557');

print 'Press Enter when the workers are ready: ';
<STDIN>;
say 'Sending tasks to workers ';

# The first message is "0" and signals start of batch
#$sender->send('0');
my $ident=0;
while (1){
    print STDERR "sending ".$ident++,"\n";
    $sender->send($ident);
}
print STDERR "done sending\n";
sleep(1); # Give 0MQ time to deliver
Subject: worker_zmq.pl
#!/usr/bin/env perl

=pod

Task worker

Connects PULL socket to tcp://localhost:5557

Collects workloads from ventilator via that socket

Connects PUSH socket to tcp://localhost:5558

Sends results to sink via that socket

Author: Alexander D'Archangel (darksuji) <darksuji(at)gmail(dot)com>

=cut

use strict;
use warnings;
use 5.10.0;

use IO::Handle;

use ZeroMQ qw/:all/;
use Time::HiRes qw/nanosleep/;
use English qw/-no_match_vars/;

use constant NSECS_PER_MSEC => 1000000;

my $nchild = 20;
my @pids;

foreach my $n (1 .. $nchild) {
    my $pid = fork();
    # if child start the worker, otherwise start another
    if (!$pid) {
        ############## WORKER CODE START #################
        my $context = ZeroMQ::Context->new();

        # Socket to receive messages on
        my $receiver = $context->socket(ZMQ_PULL);
        $receiver->connect('tcp://localhost:5557');

        # Socket to send messages to
        my $sender = $context->socket(ZMQ_PUSH);
        $sender->connect('tcp://localhost:5558');

        # Process tasks forever
        while (1) {
            my $string = $receiver->recv()->data;
            my $time = $string * NSECS_PER_MSEC;
            # Simple progress indicator for the viewer
            STDOUT->printflush("$string.");

            # Send results to sink
            $sender->send('');
        }
        exit(0); #exit the child
        ############## WORKER CODE END #################
    }
    # this is executed only in parent
    say "started $pid\n";
    push(@pids, $pid);
}

while (wait > 0){ say "child exited" }; #wait for all children to die
# send a message back to the "ventilator" saying we're all done
Attached is what I distilled from your script. Does this cause a segfault?

FYI, on my Mac with perl-5.14.2 non-threaded, this works fine.
Subject: rt74653.t
use strict;
use Test::More;
use Test::TCP;
use ZeroMQ qw(:all);

my $MAX_MESSAGES = 2_500;

my $server = Test::TCP->new(code => sub {
    my $port = shift;
    my $context = ZeroMQ::Context->new();
    my $sender = $context->socket(ZMQ_PUSH);
    $sender->bind("tcp://*:$port");

    # XXX hacky synchronization
    sleep 3;

    # The first message is "0" and signals start of batch
    #$sender->send('0');
    my $ident=0;
    while ($ident < $MAX_MESSAGES) {
        note "sending ".$ident++,"\n";
        $sender->send($ident);
    }
    note "Done sending";
    sleep(1); # Give 0MQ time to deliver
});

{
    my $context = ZeroMQ::Context->new();

    # Socket to receive messages on
    my $receiver = $context->socket(ZMQ_PULL);
    $receiver->connect("tcp://localhost:" . $server->port);

    for my $expected (1..$MAX_MESSAGES) {
        my $msg = $receiver->recv();
        is $msg->data, $expected;
    }
}

undef $server;
done_testing;
From: mark [...] blackmans.org
Looks like perl-5.10.0 on OS X Lion without perl malloc also exhibits a problem.

    501 15698  3004 0 31 0 2439348 8392 - S+ s002 0:00.52 /Users/mark/perl5/perlbrew/perls/perl-5.10.0/bin/perl -w /Users/mark/perl5/perlbrew/perls/perl-5.10.0/bin/prove -v rt74653.t
    501 15699 15698 0 31 0 2448976 8568 - S+ s002 0:09.48 /Users/mark/perl5/perlbrew/perls/perl-5.10.0/bin/perl rt74653.t
    501 15700 15699 0  0 0       0    0 - Z+ s002 0:00.00 (perl)
    501 15829 15498 0 31 0 2434892  532 - S+ s003 0:00.00 grep perl

It died around iteration 763 and just hung (as you can see from the zombie process, which was presumably the feeder thread).

I also get a segfault when I run my script pair. I've seen a few positive results on other perl versions, so it feels like a perl 5.10.0 allocation bug being exercised. But I'll try 5.10.1 next and confirm the segfault goes away there.

On Fri Feb 03 17:12:58 2012, DMAKI wrote:
> Attached is what I distilled from your script. Does this cause a segfault?
> FYI, on my mac with perl-5.14.2 non-threaded, this works fine.
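Regarding the hang: when the sender side dies, a receive loop like the one in rt74653.t can block indefinitely. A common Perl idiom for turning such a hang into a failure is an `alarm`-based timeout around the blocking call. This is only a sketch under assumptions: `with_timeout` is a hypothetical helper, and a five-second `select` stands in for the blocking receive, since no ZeroMQ calls are used here.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, giving up after $seconds.
# Returns 1 if $code finished, 0 if the timeout fired.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $code->();
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;    # make sure no alarm is left pending
    return 1 if $ok;
    die $err unless $err eq "timeout\n";    # rethrow unrelated errors
    return 0;
}

# Stand-in for a blocking receive: wait five seconds doing nothing.
my $finished = with_timeout(1, sub { select(undef, undef, undef, 5) });
print $finished ? "finished\n" : "timed out\n";
```

The stand-in uses `select` rather than `sleep` because mixing `sleep` and `alarm` is discouraged on systems where `sleep` is implemented with `alarm`.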
From: mark [...] blackmans.org
Running the test scripts on OS X Snow Leopard, perl 5.10.0 with system malloc results in *NO* segmentation fault. That is a bit inconsistent with the result quoted below, as it's hard to imagine that OS X Lion would introduce this kind of regression.

I think I'll have to retest Lion with system malloc.

In any case, I'm sure this is not a ZeroMQ bug, so I'll treat it as resolved.

On Sat Feb 04 10:40:55 2012, mark@blackmans.org wrote:
> Looks like perl-5.10.0 on OS X Lion without perl malloc also exhibits a
> problem.
>
> 501 15698  3004 0 31 0 2439348 8392 - S+ s002 0:00.52 /Users/mark/perl5/perlbrew/perls/perl-5.10.0/bin/perl -w /Users/mark/perl5/perlbrew/perls/perl-5.10.0/bin/prove -v rt74653.t
> 501 15699 15698 0 31 0 2448976 8568 - S+ s002 0:09.48 /Users/mark/perl5/perlbrew/perls/perl-5.10.0/bin/perl rt74653.t
> 501 15700 15699 0  0 0       0    0 - Z+ s002 0:00.00 (perl)
> 501 15829 15498 0 31 0 2434892  532 - S+ s003 0:00.00 grep perl
>
> It died around iteration 763 and just hung (as you can see from the
> zombie process, which was presumably the feeder thread).
>
> I also get a segfault when I run my script pair. I've seen a few
> positive results on other perl versions, so it feels like a perl 5.10.0
> allocation bug being exercised. But I'll try 5.10.1 next and confirm
> the segfault goes away there.
From: mark [...] blackmans.org
I've confirmed that the original report on Lion must have been mistaken: perl 5.10.0 with system malloc on Lion doesn't result in the segmentation fault. I'm concluding that this bug is exclusive to perl malloc, in 5.10.0 at least. I've not tested later versions with perl malloc.

On Mon Feb 06 06:20:28 2012, mark@blackmans.org wrote:
> Running the test scripts on OS X Snow Leopard, perl 5.10.0 with system
> malloc results in *NO* segmentation fault. That is a bit inconsistent
> with the result quoted below, as it's hard to imagine that OS X Lion
> would introduce this kind of regression.
>
> I think I'll have to retest Lion with system malloc.
>
> In any case, I'm sure this is not a ZeroMQ bug, so I'll treat it as
> resolved.
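For anyone wanting to check whether their own perl falls into the affected configuration, the `usemymalloc` setting shown in the `perl -V` output above is also available at runtime through the core `Config` module. A minimal sketch (the variable name `$uses_perl_malloc` is just illustrative):

```perl
use strict;
use warnings;
use Config;

# 'y' means perl was built with its own malloc (-Dusemymalloc),
# 'n' means it uses the system malloc.
my $uses_perl_malloc = ($Config{usemymalloc} // 'n') eq 'y';

printf "perl %vd, usemymalloc=%s\n", $^V, $Config{usemymalloc} // 'n';
print "This perl was built with Perl's own malloc\n" if $uses_perl_malloc;
```

A test script could use the same check to skip or flag runs on perls built with Perl's malloc, though whether that is appropriate for this distribution is a judgment call for the maintainer.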