Skip Menu |

This queue is for tickets about the HTML-Parser CPAN distribution.

Report information
The Basics
Id: 7786
Status: resolved
Priority: 0/
Queue: HTML-Parser

People
Owner: Nobody in particular
Requestors: jgmyers [...] proofpoint.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 3.36
Fixed in: (no value)



Subject: Thread safety bug
The attached test case crashes perl. Crashes are much more frequent on multiprocessor boxes. A stack trace from a debugging malloc shows an invalid pointer being indirectly freed by free_pstate(). Running the test case with the -c switch appears not to crash perl, so it appears to have something to do with the code reference.
# test program to show core dump when try/catch is used with "our" vars use strict; use warnings; use threads; use threads::shared; use Error qw(:try); use Getopt::Long; use Time::HiRes qw(gettimeofday tv_interval); use HTML::Parser; my $crash = 1; my $loops = 30000; my $start; my $threads = 10; my $lock:shared; Getopt::Long::GetOptions('clean|c' => sub { $crash = 0; }, 'loops|l=i' => \$loops, 'threads|t=i' => \$threads); { print "Version ". $HTML::Parser::VERSION."\n"; lock($lock); for (my $i = 0; $i < $threads; $i++) { threads->new(\&threadfunc); } $start = [gettimeofday()]; } # Loop through all the threads waiting until finished foreach my $thr (threads->list) { # Don't join the main thread or ourselves if ($thr->tid() && !threads::equal($thr, threads->self())) { $thr->join(); } } print tv_interval($start) . " seconds\n"; sub threadfunc() { { lock($lock); } for (my $i=0 ; $i < $loops ; $i++) { my @result; my $out; print "$i loops\n" if ($i % 5000 == 0); if ($crash) { my $html = HTML::Parser->new(api_version => 3, handlers => [text => [sub { $out .= $_[0] if ($_[0]); }, "dtext"] ]); } else { my $html = HTML::Parser->new(api_version => 3, handlers => [text => [\@result, "dtext"] ]); } } }
From: jgmyers [...] proofpoint.com
This patch fixes one thread safety bug, but not the one causing the previous test case.
--- hparser.c 2004-09-27 19:01:40.000000000 -0700 +++ ../HTML-Parser-3.36-work/hparser.c 2004-09-27 18:57:13.000000000 -0700 @@ -638,7 +638,6 @@ if (isHNAME_FIRST(*s) || *s == '@') { char *name = s; int a = ARG_SELF; - char temp; char **arg_name; s++; @@ -646,10 +645,9 @@ s++; /* check identifier */ - temp = *s; - *s = '\0'; for ( arg_name = argname; a < ARG_LITERAL ; ++a, ++arg_name ) { - if (strEQ(*arg_name, name)) + if (strnEQ(*arg_name, name, s - name) && + (*arg_name)[s - name] == '\0') break; } if (a < ARG_LITERAL) { @@ -667,9 +665,8 @@ } } else { - croak("Unrecognized identifier %s in argspec", name); + croak("Unrecognized identifier %.*s in argspec", s - name, name); } - *s = temp; } else if (*s == '"' || *s == '\'') { char *string_beg = s;
From: jgmyers [...] proofpoint.com
The original test case demonstrates a problem either in the test case or in perl 5.8.5 ithreads. It is fixed by wrapping an eval around the anonymous subroutine: my $html = HTML::Parser->new(api_version => 3, handlers => {text => [eval 'sub { $out .= $_[0] if ($_[0]); }', "dtext"] }); Without the extra eval, the threads share the same code fragment and modify its reference count without locking. The attached patch does appear to fix a case where HTML::Parser is modifying a value that should be read-only.