Subject: | "add" method randomly fails to fire on Win32 |
Date: | Wed, 22 Sep 2010 01:51:22 -0700 |
To: | bug-Thread-Pool-Simple [...] rt.cpan.org |
From: | Aaron Wirtz <me [...] awirtz.com> |
Here is a sample program which demonstrates the issue. I have only observed this issue on Win32 platforms. (Debian and Darwin appear unaffected) I theorize that this is due to the low quality of the random number generator on Win32 causing duplicate job ids to be a not uncommon occurrence, which triggers this bug.
use Thread::Pool::Simple;
open my $logfile, '>', 'threadlog.txt';
my $pool = Simple->new(
min => 1, max => 1,
do => [sub { syswrite $logfile, "tid: ".threads->tid()." i: $_[0]\n"; }],
);
for my $i (1..1000) { $pool->add($i); }
$pool->join();
Observe that on Win32 the logfile will usually have fewer than 1000 lines in it. I was able to track this down to a bug in the "add" method. This is what the source in Thread::Pool::Simple v0.24 looks like:
sub add {
my $self = shift;
my $context = wantarray;
$context = 2 unless defined $context; # void context = 2
my $arg = nfreeze(\@_);
my $id;
while (1) {
$id = int(rand(time()));
next unless $id;
++$id unless $context == $id % 3;
++$id unless $context == $id % 3;
{
lock %{$self->{submitted}};
next if $self->job_exists($id);
{
# this is necessary as some cancelled jobs may slip in
lock %{$self->{done}};
delete $self->{done}{$id};
}
$self->{pending}->enqueue(pack('Na*', $id, $arg));
$self->{submitted}{$id} = 1;
}
last;
}
return $id;
}
Specifically, a random job id is generated, and before it is put into use, the existing list of job ids is checked to ensure it is not a duplicate. If there is a collision, the loop is reset with a "next" and a fresh attempt made to generate a random job id. Unfortunately, this "next" call is inside a second bracketed block (used to lexically scope a variable lock) so it is interpreted as "exit this inner block" not as "exit this iteration of the outer while loop". This could be remedied either by providing a label on the "while" loop and making the "next" explicitly refer to the label, or by changing the inner block to something that does not affect the "next" statement but still provides lexical scoping for the variable lock. Since I am not so much a fan of labels, I chose the latter approach and fixed it as follows:
sub add {
my $self = shift;
my $context = wantarray;
$context = 2 unless defined $context; # void context = 2
my $arg = nfreeze(\@_);
my $id;
while (1) {
$id = int(rand(time()));
next unless $id;
++$id unless $context == $id % 3;
++$id unless $context == $id % 3;
do {
lock %{$self->{submitted}};
next if $self->job_exists($id);
{
# this is necessary as some cancelled jobs may slip in
lock %{$self->{done}};
delete $self->{done}{$id};
}
$self->{pending}->enqueue(pack('Na*', $id, $arg));
$self->{submitted}{$id} = 1;
};
last;
}
return $id;
}
In my testing, this has resolved all the issues on Win32 systems. Hopefully Jianyuan Wu is still around and can push this fix into the upstream source - thanks for the great module!
-Aaron