
This queue is for tickets about the XML-Twig CPAN distribution.

Report information
The Basics
Id: 41594
Status: resolved
Priority: 0/
Queue: XML-Twig

People
Owner: Nobody in particular
Requestors: henridamien.laurent [...] biblibre.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Segmentation fault at the very end of a script
Date: Wed, 10 Dec 2008 16:48:29 +0100
To: bug-XML-Twig [...] rt.cpan.org
From: "henridamien.laurent" <henridamien.laurent [...] biblibre.com>
Hi,

I am working on several Linux machines. XML::Twig is quite impressive, but I have a segmentation fault problem with one of my scripts. It occurs at the very end of the script, so the processing itself completes, but it is still uncomfortable, and I would like to find out why it happens.

The XML file is quite big (12 MB), but I think that should be OK.

Is there something I should know? Is there anything I should do to provide you with enough information?

#!/usr/bin/perl
use XML::Twig;
use XML::Twig::XPath;
use Data::Dumper;
use Getopt::Long;
use utf8;
use strict;

# buffers for holding text
my %concepthash;
my %mthash;
my %BT;
my %NT;
my ($filename, $force);
GetOptions(
    'file:s' => \$filename,
    'f'      => \$force,
);

# initialize parser with handlers for node processing
my $twig = new XML::Twig::XPath(
    TwigHandlers => {
        "/th/langue/record" => \&concepthash,
        "/th/langue/mt"     => \&mthash,
    }
);

# parse, handling nodes on the way
$twig->parsefile( $filename );

my %mtdiff;
my @nodeset = $twig->get_xpath('/th/langue[@lang-id="fre"]/record');
my (%modelem, %createelem, %newelem, %unmodifiedelem);
CONCEPT: foreach my $elem (@nodeset) {
    # Build the candidate record
    my $id = $elem->att( 'id' );
    # process process process .....
    # uses hashes
}

my %BT;
my %NT;
# initialize parser with handlers for node processing
my @nodeset = $twig->get_xpath('/th/langue[@lang-id!="fre"]/record');
foreach my $elem (@nodeset) {
    my $id = $elem->att( 'id' );
    # Process Process Process
    # uses hashes
}
$twig->dispose;

# handle a concept element to build the concepts hash.
sub concepthash {
    my( $tree, $elem ) = @_;
    # Process Process Process
    # creates a concept hash
}

sub mthash {
    my( $tree, $elem ) = @_;
    # Process Process Process
    # creates an mt hash
}

--
Henri-Damien LAURENT
BibLibre SARL
http://www.biblibre.com
Expert en Logiciels Libres pour l'info-doc
tel : +33 4 67 65 75 50
Subject: Re: [rt.cpan.org #41594] Segmentation fault at the very end of a script
Date: Wed, 10 Dec 2008 17:34:01 +0100
To: bug-XML-Twig [...] rt.cpan.org
From: mirod <xmltwig [...] gmail.com>
A few things you could try:

- not call dispose, it shouldn't be needed
- try not using XML::Twig::XPath, I believe the XPath expression you use is supported by the native XPath-lite engine

Which versions of perl and of XML::Twig are you using?

Note that in quite a few cases dying is much faster than exiting the program properly (you don't spend time freeing lots of small bits of memory in an orderly fashion... just to free all of the process space as soon as you're done).

Thanks

--
mirod
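The second suggestion could look roughly like the sketch below. It assumes the `/th/langue/record` layout from the report; the inline sample data is made up for illustration, and whether the native engine accepts every expression in the real script (for instance the `!=` predicate) would still need checking. The script uses plain XML::Twig and never calls dispose.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data standing in for the real thesaurus file.
my $xml = <<'XML';
<th>
  <langue lang-id="fre"><record id="r1"/><record id="r2"/></langue>
  <langue lang-id="eng"><record id="r3"/></langue>
</th>
XML

# Plain XML::Twig: no XML::Twig::XPath, no explicit dispose.
my $twig = XML::Twig->new;
$twig->parse($xml);

# With plain XML::Twig, get_xpath is handled by the module's own
# XPath-like engine rather than by XML::XPath.
my @records = $twig->get_xpath('/th/langue[@lang-id="fre"]/record');
my @ids     = map { $_->att('id') } @records;
print "French records: @ids\n";
```

If the script behaves the same way with this variant, XML::Twig::XPath is unlikely to be the culprit.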
Subject: Re: [rt.cpan.org #41594] Segmentation fault at the very end of a script
Date: Wed, 10 Dec 2008 17:50:18 +0100
To: bug-XML-Twig [...] rt.cpan.org
From: "henridamien.laurent" <henridamien.laurent [...] biblibre.com>
xmltwig@gmail.com via RT wrote:
> <URL: http://rt.cpan.org/Ticket/Display.html?id=41594 >
>
> A few things you could try:
>
> - not call dispose, it shouldn't be needed
> - try not using XML::Twig::XPath, I believe the XPath expression you use is
>   supported by the native XPath-lite engine
>
> Which versions of perl and of XML::Twig are you using?
>
> Note that in quite a few cases dying is much faster than exiting the program
> properly (you don't spend time freeing lots of small bits of memory in an
> orderly fashion... just to free all of the process space as soon as you're done).
>
> Thanks
>
XML::Twig version is 3.32. Perl version is 5.10, but it also occurred with 5.8.8.

I added the dispose call to try to get rid of this seg fault, without success.

Can XML::Twig::XPath be the source of the problem?

--
Henri-Damien LAURENT
BibLibre SARL
http://www.biblibre.com
Expert en Logiciels Libres pour l'info-doc
tel : +33 4 67 65 75 50
Subject: Re: [rt.cpan.org #41594] Segmentation fault at the very end of a script
Date: Wed, 10 Dec 2008 18:53:11 +0100
To: bug-XML-Twig [...] rt.cpan.org
From: mirod <xmltwig [...] gmail.com>
henridamien.laurent via RT wrote:
> XML::Twig version is 3.32
> Perl version is 5.10 but it occured with 5.8.8.
> I used dispose to get rid of this seg fault, without success.

I kinda suspected that ;--(

> Can XML::Twig::XPath be the source of the problem ?

I don't know, I am just trying to narrow down the problem.

OK, could you send me a (small!) file that shows the problem? That would help.

Thanks.

--
mirod
Subject: Re: [rt.cpan.org #41594] Segmentation fault at the very end of a script
Date: Fri, 16 Jan 2009 18:17:08 +0100
To: bug-XML-Twig [...] rt.cpan.org
From: LAURENT Henri-Damien <henridamien [...] koha-fr.org>
The data file is quite big, but here it is along with the script (see int_test.pl).

To run it: perl int_test.pl data.xml

Is there something I can do about this segmentation fault problem? It does not even process the whole data.xml.bz2 file, contrary to what I had supposed.

Thanks for your quick answer.

--
Henri-Damien LAURENT
Download data.xml.bz2
application/x-bzip 605.5k


#!/usr/bin/perl
use XML::Twig;
use XML::Twig::XPath;
# use Unicode::String qw(utf8 latin1);
use MARC::Record;
use Data::Dumper;
use Date::Manip;
use Getopt::Long;
use utf8;
use strict;

# buffers for holding text
my %concepthash;
my %mthash;
my %BT;
my %NT;
my ($filename, $force);
GetOptions(
    'file:s' => \$filename,
    'f'      => \$force,
);

# initialize parser with handlers for node processing
my $twig = new XML::Twig::XPath(
    TwigHandlers => {
        "/thesaurusMultilingue/langue/thesaurus/concept"        => \&concepthash,
        "/thesaurusMultilingue/langue/thesaurus/microthesaurus" => \&mthash,
    }
);

# parse, handling nodes on the way
$twig->parsefile( $filename );
# use Data::Dumper; warn "base : ".$dbh->{Name};

# Build the concept hash.
# This must be done BEFORE resolving the relations.
my %mtdiff;
my @nodeset = $twig->get_xpath('/thesaurusMultilingue/langue[@lang-id="fre"]/thesaurus/concept');
my (%modelem, %createelem, %newelem, %unmodifiedelem);
CONCEPT: foreach my $elem (@nodeset) {
    my $id     = $elem->att( 'id' );
    my $create = $elem->trimmed_field( 'dateCreation' );
    my $modify = $elem->trimmed_field( 'dateModification' );
    $create =~ s/-//g;
    $modify =~ s/-//g;
    my $form = $elem->trimmed_field( 'term' );
    my $note = $elem->trimmed_field( 'noteApplication' );

    # search for an existing record in thesaurus
    my @relations = $elem->descendants('relation');
    my $nok;
    my %tagfields = (
        "UF" => '450',
        "RT" => '550',
        "BT" => '550',
        "NT" => '550',
    );
    my %relations = (
        "BT" => "g",
        "NT" => "h",
    );
    my @fields;
    foreach my $relation (@relations) {
        if ($relation->att('type') =~ /MT/) {
            $concepthash{$id}->{'MT'} = $relation->att('ref');
        } elsif ($relation->att('type') =~ /UF|RT|BT|NT/) {
            push @fields, ($tagfields{$relation->att('type')},
                "2" => "".$mthash{$concepthash{$relation->att( 'ref' )}->{'MT'}}->{'fre'},
                "3" => $relation->att( 'ref' ),
                "a" => $concepthash{$relation->att( 'ref' )}->{'fre'});
        } elsif ($relation->att('type') =~ /USE/) {
            next CONCEPT;
        }
    }
    ################################### End of authority record construction

    # Add the tree here
    my $trees = GetParents($id);
    my $hierarchies;
    $hierarchies = join (';', map { join(',', @{$_}) } @$trees);
    $modelem{"$id"} = $form;
    my $hierarchies;
    $hierarchies = join (';', map { join(',', @$_) } @$trees);
    print "$id:$hierarchies", join("\n\t", @fields), "\n\n";
}

sub GetParents {
    my ($id) = @_;
    if (defined $concepthash{$id}->{'parents'}) {
        my @parents = map {
            my $ancestors = GetParents($_);
            map { [@{$_}, $id] } @$ancestors
        } keys %{$concepthash{$id}->{'parents'}};
        return \@parents;
    } else {
        return [[$id]];
    }
}

# handle a concept element to build the concepts hash.
sub concepthash {
    my( $tree, $elem ) = @_;
    # utf8::decode($elem->trimmed_field( 'term' ));
    $concepthash{$elem->att( 'id' )}->{$elem->parent('langue')->att( 'lang-id' )} = $elem->trimmed_field( 'term' );
    if ($elem->first_descendant('relation[@type="MT"]')) {
        $concepthash{$elem->att( 'id' )}->{'MT'} = $elem->first_descendant('relation[@type="MT"]')->att('ref');
    } else {
        warn $elem->att( 'id' )." Pas de microthésaurus attaché";
    }
    map { $concepthash{$elem->att( 'id' )}->{'parents'}->{$_->att('ref')} = 1 }
        $elem->descendants('relation[@type="BT"]')
        if ($elem->descendants('relation[@type="BT"]'));
}

sub mthash {
    my( $tree, $elem ) = @_;
    # utf8::decode($elem->trimmed_field( 'mt-name' ));
    $mthash{$elem->att( 'mt-id' )}->{$elem->parent('langue')->att( 'lang-id' )} = $elem->trimmed_field( 'mt-name' );
}
Subject: Re: [rt.cpan.org #41594] Segmentation fault at the very end of a script
Date: Sat, 17 Jan 2009 10:51:58 +0100
To: bug-XML-Twig [...] rt.cpan.org
From: mirod <xmltwig [...] gmail.com>
LAURENT Henri-Damien via RT wrote:
> Queue: XML-Twig
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=41594 >
>
> Data file is quite big.
> But otherwise : process as such :
> see int_test.pl
> I sent data along.
> launch perl int_test.pl data.xml
> Is there something I can doo about this segmentation fault problem ?
> It does not even process all the file data.xml.bz2 as I supposed.
> Thanks for you quick answer.

Well, the data is just too big. A simple test like

perl -MXML::Twig -e'XML::Twig->new->parsefile( "data.xml");'

would have shown it (and saved me some time, as I assumed you had tested that, and sent me the simplest test that generated the error).

Do you need to load the entire tree in memory, or could you purge it, maybe after each concept? I can't tell from a quick look at your code.

--
mirod
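Purging after each concept could look roughly like this. It is a minimal sketch, assuming the `concept`/`term` element names from the attached script; the inline sample data and the `@seen` array are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample shaped like the attached data file.
my $xml = <<'XML';
<thesaurusMultilingue>
  <langue lang-id="fre">
    <thesaurus>
      <concept id="c1"><term>chat</term></concept>
      <concept id="c2"><term>chien</term></concept>
    </thesaurus>
  </langue>
</thesaurusMultilingue>
XML

my @seen;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once each <concept> has been completely parsed.
        concept => sub {
            my ( $t, $concept ) = @_;
            push @seen, $concept->att('id') . '=' . $concept->trimmed_field('term');
            # Release everything parsed so far, including this concept,
            # so the whole tree is never held in memory at once.
            $t->purge;
        },
    },
);
$twig->parse($xml);
print "$_\n" for @seen;
```

The handler has to copy out whatever it needs before calling purge, because the element is gone afterwards.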
Subject: Re: [rt.cpan.org #41594] Segmentation fault at the very end of a script
Date: Mon, 19 Jan 2009 10:21:03 +0100
To: bug-XML-Twig [...] rt.cpan.org
From: LAURENT Henri-Damien <henridamien.laurent [...] biblibre.com>
xmltwig@gmail.com via RT wrote:
> <URL: http://rt.cpan.org/Ticket/Display.html?id=41594 >
>
> LAURENT Henri-Damien via RT wrote:
>
>> Queue: XML-Twig
>> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=41594 >
>>
>> Data file is quite big.
>> But otherwise : process as such :
>> see int_test.pl
>> I sent data along.
>> launch perl int_test.pl data.xml
>> Is there something I can doo about this segmentation fault problem ?
>> It does not even process all the file data.xml.bz2 as I supposed.
>> Thanks for you quick answer.
>>
>
> Well, the data is just too big. A simple test like perl -MXML::Twig
> -e'XML::Twig->new->parsefile( "data.xml");' would have shown it (and saved me
> some time, as I assumed you had tested that, and sent me the simplest test that
> generated the error).
>

Hmm... Sorry for failing to test this. I just could not imagine that parsing the file was what failed, since the script got part of the job done. And I believed XML::Twig was designed to process big files. It seems that XML::LibXML can parse this file.

> Do you need to load the entire tree in memory, or could you purge it, maybe
> after each concept, I can't tell from a quick look at your code?
>

In my opinion, while that file is big for an email, it is not so big for processing.

I first have to read the whole file to find the relations between nodes and to determine the type of each node. The problem is that for each node I need elements from related nodes, and those nodes can be further along in the data file, so not processed yet.

If I found a solution for this, would there be a simple way to adapt my code so that memory is freed after each concept?

--
Henri-Damien LAURENT
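One possible way to reconcile forward references with freeing memory is sketched below. It is not a tested fix for this ticket. The handler copies only plain strings (term and broader-term ids) into hashes and purges each concept, and relations are resolved once parsing is over, when every concept is known. The `%term` and `%parents` hashes and the sample data are hypothetical, and the per-language handling of the original script is left out for brevity.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample: c1 refers to c2, which appears later in the file.
my $xml = <<'XML';
<thesaurusMultilingue>
  <langue lang-id="fre">
    <thesaurus>
      <concept id="c1"><term>chat</term><relation type="BT" ref="c2"/></concept>
      <concept id="c2"><term>animal</term></concept>
    </thesaurus>
  </langue>
</thesaurusMultilingue>
XML

my ( %term, %parents );

my $twig = XML::Twig->new(
    twig_handlers => {
        concept => sub {
            my ( $t, $concept ) = @_;
            my $id = $concept->att('id');
            # Keep only plain strings, never references to twig elements.
            $term{$id}    = $concept->trimmed_field('term');
            $parents{$id} = [ map { $_->att('ref') }
                              $concept->descendants('relation[@type="BT"]') ];
            # The element itself is no longer needed.
            $t->purge;
        },
    },
);
$twig->parse($xml);

# Second phase: all concepts are in the hashes, so forward references resolve.
my @lines;
for my $id ( sort keys %term ) {
    my @broader = map { exists $term{$_} ? $term{$_} : "unknown ($_)" }
                  @{ $parents{$id} };
    push @lines, "$id $term{$id}: "
               . ( @broader ? join( ', ', @broader ) : 'no broader term' );
}
print "$_\n" for @lines;
```

The trade-off is that the hashes must hold everything the second phase needs, but they are usually far smaller than the full twig.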