Bug #61557 for Thread-Pool-Simple: "add" method randomly fails to fire on Win32

Wed Sep 22 04:51:36 2010 me [...] awirtz.com - Ticket created

Subject:	"add" method randomly fails to fire on Win32
Date:	Wed, 22 Sep 2010 01:51:22 -0700
To:	bug-Thread-Pool-Simple [...] rt.cpan.org
From:	Aaron Wirtz <me [...] awirtz.com>

Here is a sample program which demonstrates the issue. I have only observed this issue on Win32 platforms. (Debian and Darwin appear unaffected) I theorize that this is due to the low quality of the random number generator on Win32 causing duplicate job ids to be a not uncommon occurrence, which triggers this bug. use Thread::Pool::Simple; open my $logfile, '>', 'threadlog.txt'; my $pool = Simple->new( min => 1, max => 1, do => [sub { syswrite $logfile, "tid: ".threads->tid()." i: $_[0]\n"; }], ); for my $i (1..1000) { $pool->add($i); } $pool->join(); Observe that on Win32 the logfile will usually have fewer than 1000 lines in it. I was able to track this down to a bug in the "add" method. This is what the source in Thread::Pool::Simple v0.24 looks like: sub add { my $self = shift; my $context = wantarray; $context = 2 unless defined $context; # void context = 2 my $arg = nfreeze(\@_); my $id; while (1) { $id = int(rand(time())); next unless $id; ++$id unless $context == $id % 3; ++$id unless $context == $id % 3; { lock %{$self->{submitted}}; next if $self->job_exists($id); { # this is necessary as some cancelled jobs may slip in lock %{$self->{done}}; delete $self->{done}{$id}; } $self->{pending}->enqueue(pack('Na*', $id, $arg)); $self->{submitted}{$id} = 1; } last; } return $id; } Specifically, a random job id is generated, and before it is put into use, the existing list of job ids is checked to ensure it is not a duplicate. If there is a collision, the loop is reset with a "next" and a fresh attempt made to generate a random job id. Unfortunately, this "next" call is inside a second bracketed block (used to lexically scope a variable lock) so it is interpreted as "exit this inner block" not as "exit this iteration of the outer while loop". This could be remedied either by providing a label on the "while" loop and making the "next" explicitly refer to the label, or by changing the inner block to something that does not affect the "next" statement but still provides lexical scoping for the variable lock. Since I am not so much a fan of labels, I chose the latter approach and fixed it as follows: sub add { my $self = shift; my $context = wantarray; $context = 2 unless defined $context; # void context = 2 my $arg = nfreeze(\@_); my $id; while (1) { $id = int(rand(time())); next unless $id; ++$id unless $context == $id % 3; ++$id unless $context == $id % 3; do { lock %{$self->{submitted}}; next if $self->job_exists($id); { # this is necessary as some cancelled jobs may slip in lock %{$self->{done}}; delete $self->{done}{$id}; } $self->{pending}->enqueue(pack('Na*', $id, $arg)); $self->{submitted}{$id} = 1; }; last; } return $id; } In my testing, this has resolved all the issues on Win32 systems. Hopefully Jianyuan Wu is still around and can push this fix into the upstream source - thanks for the great module! -Aaron

Thu Sep 23 04:26:52 2010 jwu [...] cpan.org - Correspondence added

Aaron, Thanks for reporting and providing a fix for the bug. I have update the module on CPAN to include your fix. Thanks J.

Thu Sep 23 04:26:53 2010 The RT System itself - Status changed from 'new' to 'open'

Thu Sep 23 04:26:53 2010 jwu [...] cpan.org - Status changed from 'open' to 'resolved'