
This queue is for tickets about the XML-Diff CPAN distribution.

Report information
The Basics
Id: 28609
Status: open
Priority: 0/
Queue: XML-Diff

People
Owner: Nobody in particular
Requestors: mm [...] artegic.de
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: re-use of instantiated XML-Diff 0.5
Date: Wed, 1 Aug 2007 14:22:27 +0200 (CEST)
To: bug-XML-Diff [...] rt.cpan.org
From: mm [...] artegic.de
Dear sirs,

being quite happy with your XML-Diff module, I do have nondeterministic problems when reusing an instantiated XML-Diff object very frequently.

About my environment:

    dbnu053 rel72_ux ~> perl -v
    This is perl, v5.8.8 built for x86_64-linux-thread-multi
    dbnu053 rel72_ux ~> xml2-config --version
    2.6.23

The OS is SLES 10.

I am instantiating the module once and then feeding tens of thousands of XML documents into it to compare:

    sub diff_xml {
        my ( $self, $id ) = @_;
        my $xml  = $self->{'xml'};
        my $opts = $self->{'opts'};
        $self->{'differ'} ||= XML::Diff->new();
        my $differ = $self->{'differ'};
        print "\n----------\nDiffing xmls for $id ... ";

        # Call differ
        my $res;
        $res = eval {
            $differ->compare(
                -old => $$xml{$id}{'wna'},
                -new => $$xml{$id}{'ds'},
            );
        };
        # delete( $self->{'differ'} );
        return $@ if $@;
        return $res->toString( 1 );
    }

Now, if I do delete the differ object after each comparison, everything is fine. If I do reuse the instance, after 50-500 comparisons the first error will occur, and from then on 50-80% of the subsequent comparisons will fail with the following error:

    Can't call method "previousSibling" on an undefined value at <pathto>/usr/lib/perl5/site_perl/5.8.8/XML/Diff.pm line 1389, <FH> line <lineno>.

This line contains a call on $node, which is instantiated on the line before. If I dump the full $lookup->{id} table, the actual $id the node is supposed to be instantiated from is for some reason always 323, and the corresponding entry is not present in the table, while the number of table entries ranges from 90 to 350 (probably connected to the number of nodes in the original xmls?).

My guess would be that the differ object does not properly clean up memory after each run and/or after abnormal termination upon returning $@. Due to time constraints I didn't have a chance to check whether 323 is simply the first node gone missing due to an occurrence of the error described below.

Which leads to another problem I have (that I haven't had time to investigate further yet): I do get $@ filled with "Invalid expression" as a return value from about 5% of the calls. Still, all the xmls causing these errors are valid XML. Is this maybe related to bug #27725: "Tag with empty content error (<p></p>)"? Or is it maybe a system lib that's faulty?

TIA,
Yours
Martin
Hi Martin,

Thanks for your bug report. I've just taken over co-maintainership of this module as the original author has other interests these days.

Looking at your problem I think it sounds likely that it's caused by perl internals (ref counting and g/c issues) so I'd be inclined to suggest that you work around it for now by maybe only using a single instance for a limited time and then destroying and re-making it. For example, changing your code to something like this:

    sub diff_xml {
        my ( $self, $id ) = @_;
        my $xml  = $self->{'xml'};
        my $opts = $self->{'opts'};
        $self->{'differ'} = XML::Diff->new()
            if (not(defined($self->{differ})) or ++$self->{differused} % 50 == 0);

will result in the "differ" object being used a maximum of 50 times, which should still get the performance that I think you're after but avoid the memory leak problem.

I'll look at this problem when I get a little more comfortable with the code (I suspect cleaning out the object at each call to compare should help considerably) but in the short term I'd suggest the sort of workaround above is likely to be your safest option.

Cheers
--
Tim
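
The recycling workaround suggested above can also be kept out of diff_xml in a small helper. The following is only a sketch under assumptions: the RecyclingPool package and its get/discard methods are hypothetical names and not part of XML::Diff, and a stand-in factory is used so the example runs without the module installed. In real use the factory would presumably be sub { XML::Diff->new() }.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Diff): hands out an object built by
# a factory coderef and replaces it with a fresh one after max_uses calls.
package RecyclingPool;

sub new {
    my ( $class, %args ) = @_;
    my $self = {
        factory  => $args{factory},           # coderef that builds a new object
        max_uses => $args{max_uses} || 50,    # uses allowed per instance
        uses     => 0,
        object   => undef,
    };
    return bless $self, $class;
}

# Return the current instance, rebuilding it first if it is missing
# or has already been handed out max_uses times.
sub get {
    my ($self) = @_;
    if ( !defined $self->{object} || $self->{uses} >= $self->{max_uses} ) {
        $self->{object} = $self->{factory}->();
        $self->{uses}   = 0;
    }
    $self->{uses}++;
    return $self->{object};
}

# Drop the current instance, e.g. after a comparison died inside eval.
sub discard {
    my ($self) = @_;
    $self->{object} = undef;
    $self->{uses}   = 0;
    return;
}

package main;

# Stand-in factory (assumption) so the sketch runs without XML::Diff;
# it only counts how many instances were built.
my $built = 0;
my $pool  = RecyclingPool->new(
    factory  => sub { $built++; return { serial => $built } },
    max_uses => 50,
);

$pool->get() for 1 .. 120;
print "instances built: $built\n";    # prints "instances built: 3"
```

Inside diff_xml, the differ would then be fetched with $pool->get() before each compare, and $pool->discard() could be called whenever eval leaves $@ set, so that an instance that failed once is never reused. Whether discarding on error helps with the "previousSibling" failures is untested here.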