Skip Menu |

This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 31945
Status: resolved
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: jhoger [...] pobox.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: xs, thread related refcount issue
Date: Thu, 27 Dec 2007 16:13:47 -0800
To: bug-XML-LibXML [...] rt.cpan.org
From: "John R. Hogerhuis" <jhoger [...] pobox.com>
Running this code (contrived, my real-world case is more complex, hopefully it is the same problem): ################## #!/usr/bin/env perl use threads; use XML::LibXML; my $parser = XML::LibXML->new(); my $doc = $parser->parse_string ('<AnyXML></AnyXML>'); sub bgrd_task { my $parser = XML::LibXML->new(); my $doc = $parser->parse_string ('<AnyXML></AnyXML>'); } my $bgrd = threads->new (\&bgrd_task); $bgrd->join(); ################ Results in this error, and a segfault: ################ john@sqalab1:~/projects/libxml_test$ ./test.pl PmmREFCNT_dec: REFCNT decremented below 0!. PmmREFCNT_dec: REFCNT decremented below 0!. Segmentation fault ################ john@sqalab1:~/projects/libxml_test$ perl -v This is perl, v5.8.7 built for i486-linux-gnu-thread-multi (with 1 registered patch, see perl -V for more detail) Copyright 1987-2005, Larry Wall Perl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the Perl 5 source kit. Complete documentation for Perl, including FAQ lists, should be found on this system using `man perl' or `perldoc perl'. If you have access to the Internet, point your browser at http://www.perl.org/, the Perl Home Page. john@sqalab1:~/projects/libxml_test$ john@sqalab1:~/projects/libxml_test$ uname -a Linux sqalab1 2.6.15-23-386 #1 PREEMPT Tue May 23 13:49:40 UTC 2006 i686 GNU/Linux john@sqalab1:~/projects/libxml_test$ admin tells me we're running latest version of XML::LibXML (XML-LibXML-1.65)
From: TIMBRODY [...] cpan.org
I've written a small patch that should prevent the seg faults on clean up when spawning threads after creating LibXML objects. IIUC this isn't the way Perl threading is supposed to work (objects should be copied on thread spawns), so I would be very careful if you intend using any object in more than one thread. Tim.
Index: perl-libxml-mm.c =================================================================== --- perl-libxml-mm.c (revision 695) +++ perl-libxml-mm.c (working copy) @@ -111,6 +111,7 @@ struct _ProxyNode { xmlNodePtr node; xmlNodePtr owner; + SV * thread; int count; int encoding; }; @@ -129,6 +130,7 @@ #define PmmREFCNT_inc(node) node->count++ #define PmmNODE(thenode) thenode->node #define PmmOWNER(node) node->owner +#define PmmTHREAD(node) node->thread #define PmmOWNERPO(node) ((node && PmmOWNER(node)) ? (ProxyNodePtr)PmmOWNER(node)->_private : node) #define PmmENCODING(node) node->encoding @@ -153,6 +155,7 @@ if (proxy != NULL) { proxy->node = node; proxy->owner = NULL; + proxy->thread = aTHX; proxy->count = 0; proxy->encoding= 0; node->_private = (void*) proxy; Index: LibXML.xs =================================================================== --- LibXML.xs (revision 695) +++ LibXML.xs (working copy) @@ -3711,9 +3711,13 @@ void DESTROY( node ) SV * node + PREINIT: + SV * thread; CODE: + thread = aTHX; // aTHX is the current perl interpreter xs_warn("DESTROY PERL NODE\n"); - PmmREFCNT_dec(SvPROXYNODE(node)); + if( PmmTHREAD(SvPROXYNODE(node)) == thread ) + PmmREFCNT_dec(SvPROXYNODE(node)); SV* nodeName( self ) Index: perl-libxml-mm.h =================================================================== --- perl-libxml-mm.h (revision 695) +++ perl-libxml-mm.h (working copy) @@ -48,6 +48,7 @@ struct _ProxyNode { xmlNodePtr node; xmlNodePtr owner; + SV * thread; int count; int encoding; }; @@ -66,6 +67,7 @@ #define PmmREFCNT_inc(node) node->count++ #define PmmNODE(xnode) xnode->node #define PmmOWNER(node) node->owner +#define PmmTHREAD(node) node->thread #define PmmOWNERPO(node) ((node && PmmOWNER(node)) ? (ProxyNodePtr)PmmOWNER(node)->_private : node) #define PmmENCODING(node) node->encoding Index: t/90threads.t =================================================================== --- t/90threads.t (revision 695) +++ t/90threads.t (working copy) @@ -2,7 +2,7 @@ use Config; use constant MAX_THREADS => 10; use constant MAX_LOOP => 50; -use constant PLAN => 14; +use constant PLAN => 15; BEGIN { plan tests => PLAN; if( $Config{useithreads} ) { @@ -38,6 +38,17 @@ <root><node><leaf/></node></root> EOF +# Spawn threads with a document in scope +{ +my $doc = $p->parse_string( $xml ); +for(1..MAX_THREADS) +{ + threads->new(sub {}); +} +$_->join for(threads->list); +} +ok(1); + # Parse a correct XML document { for(1..MAX_THREADS)
From: TIMBRODY [...] cpan.org
After a bit more thought here's a different approach. This will handle the case where a thread uses an object that has gone out scope of the thread that created it (added test case to t/90threads.t). This is possibly closer to how threads support is "supposed" to happen, but doesn't copy the XS objects on clone. Performance-wise I doubt it's worth maintaining anything more complex than a linked list? Memory-wise this adds one pointer per active Perl LibXML node.
Index: perl-libxml-mm.c =================================================================== --- perl-libxml-mm.c (revision 695) +++ perl-libxml-mm.c (working copy) @@ -101,6 +101,7 @@ * (in case of document fragments, they are not the same!) * @count: this is the internal reference count! * @encoding: this value is missing in libxml2's doc structure + * @_registry: used to build the proxy node registry * * Since XML::LibXML will not know, is a certain node is already * defined in the perl layer, it can't shurely tell when a node can be @@ -113,6 +114,7 @@ xmlNodePtr owner; int count; int encoding; + struct _ProxyNode * _registry; }; /* helper type for the proxy structure */ @@ -134,6 +136,79 @@ #define PmmENCODING(node) node->encoding #define PmmNodeEncoding(node) ((ProxyNodePtr)(node->_private))->encoding #define PmmDocEncoding(node) (node->charset) + +/* + * registry of all current proxy nodes + */ +ProxyNodePtr PROXY_NODE_REGISTRY = NULL; + +/* + * @proxy: proxy node to register + * + * adds a proxy node to the proxy node registry + */ +void +PmmRegisterProxyNode(ProxyNodePtr proxy) +{ + proxy->_registry = PROXY_NODE_REGISTRY; + PROXY_NODE_REGISTRY = proxy; +} + +/* + * @proxy: proxy node to remove + * + * removes a proxy node from the proxy node registry + */ +void +PmmUnregisterProxyNode(ProxyNodePtr proxy) +{ + ProxyNodePtr cur = PROXY_NODE_REGISTRY; + if( PROXY_NODE_REGISTRY == proxy ) { + PROXY_NODE_REGISTRY = proxy->_registry; + } + else { + while(cur->_registry != NULL) + { + if( cur->_registry == proxy ) + { + cur->_registry = proxy->_registry; + break; + } + cur = cur->_registry; + } + } +} + +/* + * increments all proxy node counters by one (called on thread spawn) + */ +void +PmmCloneProxyNodes() +{ + ProxyNodePtr cur = PROXY_NODE_REGISTRY; + while(cur != NULL) + { + PmmREFCNT_inc(cur); + cur = cur->_registry; + } +} + +/* + * returns the current number of proxy nodes in the registry + */ +int +PmmProxyNodeRegistrySize() +{ + int i = 0; + ProxyNodePtr cur = PROXY_NODE_REGISTRY; + while(cur != NULL) + { + ++i; + cur = cur->_registry; + } + return i; +} + /* creates a new proxy node from a given node. this function is aware * about the fact that a node may already has a proxy structure. */ @@ -155,7 +230,9 @@ proxy->owner = NULL; proxy->count = 0; proxy->encoding= 0; + proxy->_registry = NULL; node->_private = (void*) proxy; + PmmRegisterProxyNode(proxy); } } else { @@ -283,6 +360,7 @@ PmmFreeNode( libnode ); } + PmmUnregisterProxyNode(node); Safefree( node ); /* free( node ); */ } Index: LibXML.xs =================================================================== --- LibXML.xs (revision 695) +++ LibXML.xs (working copy) @@ -1272,6 +1272,19 @@ xmlInitializeCatalog(); /* use catalog data */ #endif +void +_CLONE( class ) + char * class + CODE: + PmmCloneProxyNodes(); + +int +_leaked_nodes() + CODE: + RETVAL = PmmProxyNodeRegistrySize(); + OUTPUT: + RETVAL + char * LIBXML_DOTTED_VERSION() CODE: Index: perl-libxml-mm.h =================================================================== --- perl-libxml-mm.h (revision 695) +++ perl-libxml-mm.h (working copy) @@ -50,6 +50,7 @@ xmlNodePtr owner; int count; int encoding; + struct _ProxyNode * _registry; }; /* helper type for the proxy structure */ Index: LibXML.pm =================================================================== --- LibXML.pm (revision 695) +++ LibXML.pm (working copy) @@ -152,6 +152,14 @@ } #-------------------------------------------------------------------------# +# Threads support methods # +#-------------------------------------------------------------------------# + +# threads doc says CLONE's API may change in future, which would break +# an XS method prototype +sub CLONE { XML::LibXML::_CLONE( $_[0] ) } + +#-------------------------------------------------------------------------# # DOM Level 2 document constructor # #-------------------------------------------------------------------------# Index: t/80registryleak.t =================================================================== --- t/80registryleak.t (revision 0) +++ t/80registryleak.t (revision 0) @@ -0,0 +1,23 @@ +use Test; +BEGIN { plan tests => 3} +END { ok(0) unless $loaded } +use XML::LibXML; + +$loaded = 1; +ok(1); + +my $p = XML::LibXML->new(); +ok($p); + +my $xml = <<EOX; +<?xml version="1.0"?> +<root><child/></root> +EOX + +{ +my $doc = $p->parse_string($xml); +my $root = $doc->documentElement; +my $child = $root->firstChild; +} + +ok(&XML::LibXML::_leaked_nodes == 0); Index: t/90threads.t =================================================================== --- t/90threads.t (revision 695) +++ t/90threads.t (working copy) @@ -2,12 +2,13 @@ use Config; use constant MAX_THREADS => 10; use constant MAX_LOOP => 50; -use constant PLAN => 14; +use constant PLAN => 16; BEGIN { plan tests => PLAN; if( $Config{useithreads} ) { if ($ENV{THREAD_TEST}) { require threads;; + require threads::shared; } else { skip("optional (set THREAD_TEST=1 to run these tests)\n") for (1..PLAN); exit; @@ -38,6 +39,33 @@ <root><node><leaf/></node></root> EOF +# Spawn threads with a document in scope +{ +my $doc = $p->parse_string( $xml ); +for(1..MAX_THREADS) +{ + threads->new(sub {}); +} +$_->join for(threads->list); +} +ok(1); + +# Spawn threads that use document that has gone out of scope from where it was +# created +{ +my $waitfor : shared; +{ +lock $waitfor; +my $doc = $p->parse_string($xml); +for(1..MAX_THREADS) +{ + threads->new(sub { lock $waitfor; $doc->toString; }); +} +} +$_->join for(threads->list); +ok(1); +} + # Parse a correct XML document { for(1..MAX_THREADS)
Tim, thanks a lot! I'm about to commit the patch. I hope O(n) unregistering won't be too great an overhead. What do you think about this idea: Enable the reference counting only if 'use threads' is in called before 'use XML::LibXML', i.e. if $threads::threads == 1 when the XML::LibXML's INIT is called? Then XML::LibXML could set up a flag that enables this reference counting; if $threads::threads is off, then the flag will be off and the reference counting won't be used (with the benefit of avoiding the unregistering overhead). Just a thought, Cheers, -- Petr
From: jhoger [...] pobox.com
On Tue Jan 08 06:56:21 2008, TIMBRODY wrote: Show quoted text
> Performance-wise I doubt it's worth maintaining anything more complex > than a linked list? Memory-wise this adds one pointer per active Perl > LibXML node.
Thanks for coming up with a fix, Tim! I can't say I understand xs, but based on your explanation, does this mean an O(n) lookup per every node in the document when a LibXML object goes out of scope? That seems like it would add up. A O(1) (hash, or cache the pointer in a handy place) operation would probably be worth it for larger docs or lots of transient docs. But if it's not really a lookup then it wouldn't matter.
From: pajas [...] matfyz.cz
Show quoted text
> I can't say I understand xs, but based on your explanation, does this > mean an O(n) lookup per every node in the document when a LibXML
object Show quoted text
> goes out of scope? That seems like it would add up. A O(1) (hash, or > cache the pointer in a handy place) operation would probably be worth
it Show quoted text
> for larger docs or lots of transient docs.
Not for every document node but for every node for which there is some reference from Perl still in the scope. I.e. if you do, say, $doc = $parser->parse_file('foo.xml'); { my $el = XML::LibXML::Element->new('bar'); } then when $el goes out of scope, a list of (up to) two nodes (consisting of the document node and element node referred to in $doc and $el respectivelly) is traversed (although there may be thousands of nodes in the document $doc). Also, if things go in this particular order, $el is at the top of the list, so in fact it is removed in O(1). But otherwise I agree, maybe a Perl hash would be worth trying. Maybe benchmarks (e.g. of the regression tests) with no thread support, with a list and with a hash would be nice. -- Petr
From: jhoger [...] pobox.com
On Wed Jan 09 03:49:06 2008, PAJAS wrote: Show quoted text
> Not for every document node but for every node for which there is some > reference from Perl still in the scope. I.e. if you do, say, > > $doc = $parser->parse_file('foo.xml'); > { my $el = XML::LibXML::Element->new('bar'); } > > then when $el goes out of scope, a list of (up to) two nodes > (consisting of the document node and element node referred to in $doc > and $el respectivelly) is traversed (although there may be thousands of > nodes in the document $doc). Also, if things go in this particular > order, $el is at the top of the list, so in fact it is removed in O(1). >
Much less bad than if it happened for all nodes. Show quoted text
> But otherwise I agree, maybe a Perl hash would be worth trying. Maybe > benchmarks (e.g. of the regression tests) with no thread support, with > a list and with a hash would be nice. >
Well, given your description of the list impact, it seems like the patch makes sense to apply as-is. Certainly better than segfaults and double free messages. If the lookup causes any pain for those using threads and lots of XML documents they (may end up being me) will have an itch to scratch. Probably the experimental nature of the threading support should be mentioned in the POD though. The current situation at least warns of issues by printing errors. If it is fixed then there won't be any indication, but the documentation could make up for that. Please apply -- John.
applied amd ported to XML-LibXSLT