
This queue is for tickets about the REST-Neo4p CPAN distribution.

Report information
The Basics
Id: 92797
Status: resolved
Priority: 0/
Queue: REST-Neo4p

People
Owner: maj.fortinbras [...] gmail.com
Requestors: stesin [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.2233
Fixed in: 0.2254



Subject: Failure after 3 * 681 queries in a row
Dear Mark,

the problem looks like this. I have a sample file with 10000 triplets of ( $lastname, $firstname, $middlename ), each with a number (say, a frequency). Obviously, none of the three is unique; the values are spread around the sample set. Now I want to make 3 sets (labels), FirstName, LastName and MiddleName, where each node is unique.

OK, I wrote a dumb and straightforward script which takes a triplet, pretty-prints the names, and for each name first asks "is a node with it here already?" and, if not, creates a node. Both queries return just COUNT(node), 1 row of output. I'll send you the sample data and the script by mail.

When issuing a query, I check for an error. No errors. But after some 681 triplets the script fails: the query returns '' rows instead of 1. ???

Diagnostics:

681 34 <Луценко><Володимир><Іванович>Qry <<MATCH (n:LastName { name_ua: 'Луценко' } ) RETURN COUNT(n)>> executed, expected 1 row got <<>>, stopped at Dict_BIG_loader.pl line 98, <> line 684.

This is case 1.

OK, I thought, and just replaced the die with printing $numrows of the result. This is case 2. At triplet 681, it starts returning zeros instead of 1's for $numrows. No error diagnostics! That's the problem.

OK, I thought, and surrounded the piece of junk code with a transaction (case 3). Got 2 rows of output (how? why?) and the message "Can't parse query response (unexpected token looking for next row)" when trying to fetch() that second row.

Would you mind looking at this, please?

With best regards,
Andrii

p.s. Email with sample data and scripts follows.
Hi Andrii-

OK, I fixed a couple of issues regarding the error reporting and the numrows of zero; these are in 0.2241 (just uploaded).

The queries stopping after some fixed, not very large number: this appears to be a server issue. The server actually stops responding (the new error handling should show that now). I got it with your data and also with data I generated. It's something worth raising with Neo4j; I think I will do that.

However, I think there is a very effective workaround. If you create your queries one time, but with parameters, and then execute the queries in the loop with parameter values, you can get the script to run to completion. Here you execute() the same query object many times. I think this will give you better per-query performance as well.

Attached is your script modified to do this. It ran well for me. Please let me know how it goes-
Mark

In reply to the message from stesin@gmail.com of Fri Feb 07 02:34:17 2014.
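The workaround described in the reply (build each query once with a parameter, then execute() it repeatedly) can be condensed into a minimal sketch. It uses only the REST::Neo4p::Query calls that appear in the attached script (new, execute, err, errstr, fetch). The helper names check_cypher, create_cypher, build_queries, run_count and store_name are hypothetical, chosen for illustration, and the {name} parameter syntax is the one used in the ticket. This is a sketch under those assumptions, not the code that shipped with the fix.

```perl
use strict;
use warnings;

# Cypher text per label. The name is a {name} parameter, never interpolated,
# so one Query object can serve every value. (Hypothetical helpers.)
sub check_cypher {
    my ($label) = @_;
    return "MATCH (n:$label) WHERE n.name_ua = {name} RETURN COUNT(n)";
}

sub create_cypher {
    my ($label) = @_;
    return "CREATE (n:$label) SET n.name_ua = {name} RETURN COUNT(n)";
}

# Build the Query objects once, before the input loop.
# Assumes REST::Neo4p is installed and REST::Neo4p->connect() was already called.
sub build_queries {
    my @labels = @_;
    require REST::Neo4p;
    my %queries;
    for my $label (@labels) {
        $queries{check}{$label}  = REST::Neo4p::Query->new( check_cypher($label) );
        $queries{create}{$label} = REST::Neo4p::Query->new( create_cypher($label) );
    }
    return \%queries;
}

# Execute a prepared COUNT query for one name and return the count.
sub run_count {
    my ( $query, $name ) = @_;
    $query->execute( name => $name );
    die "Query failed: " . $query->errstr if $query->err;
    my $count;
    while ( my $row = $query->fetch ) {    # drain all rows; one is expected
        $count = $row->[0];
    }
    die "Query returned no rows" unless defined $count;
    return $count;
}

# Make sure a node for $name exists under $label, reusing the prepared queries.
sub store_name {
    my ( $queries, $label, $name ) = @_;
    my $count = run_count( $queries->{check}{$label}, $name );
    $count = run_count( $queries->{create}{$label}, $name ) if $count == 0;
    die "More than one '$name' node with label $label" if $count > 1;
    return $count;
}

# Usage (needs a running Neo4j server; URL taken from the ticket's script):
#   REST::Neo4p->connect('http://127.0.0.1:7474');
#   my $queries = build_queries(qw/FirstName MiddleName LastName/);
#   store_name( $queries, 'LastName', $lastname ) for each input row
```

Keeping the Cypher text in small functions separates the fixed query shape from the values, which is what allows the same six objects to be reused for all 10000 input rows.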
Subject: Dict_BIG_loader.pl
$src_fullnames_file  = "Dict_FullNames_UA_Top2x10000.txt";
$src_FirstName1_file = "Dict_FirstNames_UA_2x100.txt";
$neo4j_url = 'http://127.0.0.1:7474';

use utf8;
use Encode;
use POSIX;
use IO::Handle;
use DateTime;
use DateTime::Locale;
use DateTime::Format::MSSQL;
use Text::Autoformat;
use Text::Wrap; $Text::Wrap::columns = 144;
use REST::Neo4p;

system( "chcp 1251" );
close STDERR; open STDERR, '>:encoding(cp1251)', 'CON:' or die "Open STDERR failed: " . $! . "\n";
close STDOUT; open STDOUT, '>:encoding(cp1251)', 'PostMortem_Queries.txt' or die "Open STDOUT failed: " . $! . "\n";

REST::Neo4p->connect($neo4j_url) or die "Neo4j connect failed: " . $! . "\n";
$version = REST::Neo4p->neo4j_version;
print STDERR "Neo4j v." . $version . " ready to serve.\n" ;

close STDIN; open STDIN, '<:encoding(utf8)', $src_fullnames_file or die "Open STDIN $src_fullnames_file failed: " . $! . "\n";
#
# skip 2 header rows
#
<>; <>;
$count = 0;

# create single instances of Query objects with parameters
my %queries;
for (qw/FirstName MiddleName LastName/) {
    $queries{check}{$_}  = REST::Neo4p::Query->new("MATCH (n:$_) WHERE n.name_ua = {name} RETURN COUNT(n)");
    $queries{create}{$_} = REST::Neo4p::Query->new("CREATE (n:$_) SET n.name_ua = {name} RETURN COUNT(n)");
    $queries{freq}{$_}   = REST::Neo4p::Query->new("MATCH (f:$_) WHERE f.name_ua = {name} SET f.rating = {rating} RETURN f.rating");
}
# only 9 total query objects

while( <> ) {
    $count++;
    s/^\s+//; chomp; s/\s+/ /g;
    ( $no, $fullname, $qq, $dummy0, $dummy1 ) = split "\\|" ;
    chomp $no, $fullname, $qq;
    $fullname =~ s/^\s+//; $fullname =~ s/\s+$//;
    ( $lastname, $firstname, $midname ) = split( /\s+/, $fullname );
    $lastname  = autoformat( $lastname,  { case => 'highlight' } ); $lastname  =~ s/\s+$//;
    $firstname = autoformat( $firstname, { case => 'highlight' } ); $firstname =~ s/\s+$//;
    $midname   = autoformat( $midname,   { case => 'highlight' } ); $midname   =~ s/\s+$//;
    printf STDERR "%8d %8d<%20s><%20s><%20s>", $count, $qq, $lastname, $firstname, $midname;
    store_name( $lastname,  'LastName' );
    store_name( $firstname, 'FirstName' );
    store_name( $midname,   'MiddleName' );
    print STDERR "\n";
}

close STDIN; open STDIN, '<:encoding(utf8)', $src_FirstName1_file or die "Open STDIN $src_FirstName1_file failed: " . $! . "\n";
#
# skip 2 header rows
#
<>; <>;
$count = 0;

while( <> ) {
    $count++;
    s/^\s+//; chomp; s/\s+/ /g;
    ( $no, $firstname, $qq, $dummy0, $dummy1 ) = split "\\|" ;
    chomp $no, $firstname, $qq;
    $firstname =~ s/^\s+//; $firstname =~ s/\s+$//;
    $firstname = autoformat( $firstname, { case => 'highlight' } ); $firstname =~ s/\s+$//;
    printf STDERR "%8d %8d<%20s>", $count, $qq, $firstname;
    store_name( $firstname, 'FirstName' );
    undef $numrows;
    # $store_freq_qry = "MATCH (f:FirstName { name_ua: '$firstname' } SET f.rating = $qq RETURN f.rating";
    # $qry_store_freq = REST::Neo4p::Query->new( $store_freq_qry ) or die "Failed to compile $store_freq_qry, stopped";
    # $numrows = $qry_store_freq->execute();
    $numrows = $queries{freq}{FirstName}->execute( name => $firstname, rating => $qq );
    if( $queries{freq}{FirstName}->err() ) {
        die "Qry <<".$queries{freq}{FirstName}->{Statement}.">> execution failed, code " . $queries{freq}{FirstName}->err() . ", errmsg " . $queries{freq}{FirstName}->errstr() . ", stopped";
    }
    if( $numrows != 1 ) {
        die "Qry <<".$queries{freq}{FirstName}->{Statement}.">> executed, expected 1 row got <<" . $numrows . ">>, stopped";
    };
    while( $numrows ) {
        $quant = $queries{freq}{FirstName}->fetch->[0];
        --$numrows;
    }
    print STDERR "\n";
}

exit 0;

sub store_name { # name, label
    my $name = shift;
    my $label = shift;
    undef $numrows; undef $quant;
    # $check_name_qry = "MATCH (n:$label { name_ua: '$name' } ) RETURN COUNT(n)";
    # $qry_check_name = REST::Neo4p::Query->new( $check_name_qry ) or die "Failed to compile $check_name_qry, stopped";
    # print wrap( '', '', $check_lastname_qry, "\n" ); print "\n";
    #-case 3- REST::Neo4p->begin_work();
    # $numrows = $qry_check_name->execute();
    $numrows = $queries{check}{$label}->execute( name => $name );
    if( $queries{check}{$label}->err() ) {
        die "Qry <<".$queries{check}{$label}->{Statement}.">> execution failed, code " . $queries{check}{$label}->err() . ", errmsg " . $queries{check}{$label}->errstr() . ", stopped";
    }
    #-case 1- #if( $numrows != 1 ) { die "Qry <<$check_name_qry>> executed, expected 1 row got <<" . $numrows . ">>, stopped"; };
    #-case 2-
    if( $numrows != 1 ) { printf STDERR "_ %3d", $numrows; }
    while( $numrows ) {
        $quant = $queries{check}{$label}->fetch->[0];
        --$numrows;
    }
    if( $quant == 0 ) {
        # $create_name_qry = "CREATE (n:$label { name_ua: '$name' } ) RETURN COUNT(n)";
        # $qry_create_name = REST::Neo4p::Query->new( $create_name_qry ) or die "Failed to compile $create_name_qry, stopped";
        $numrows = $queries{create}{$label}->execute(name => $name);
        if( $queries{create}{$label}->err() ) {
            die "Qry <<".$queries{create}{$label}->{Statement}.">> execution failed, code " . $queries{create}{$label}->err() . ", errmsg " . $queries{create}{$label}->errstr() . ", stopped";
        }
        #-case 1- #if( $numrows != 1 ) { die "Qry <<$create_name_qry>> executed, expected 1 row got " . $numrows . ", stopped"; };
        #-case 2-
        if( $numrows != 1 ) { printf STDERR "! %3d", $numrows; }
        while( $numrows ) {
            $quant = $queries{create}{$label}->fetch->[0];
            --$numrows;
        }
    }
    if ( $quant > 1 ) { die "More than 1 nodes '$name' :$label found, stopped"; }
    #-case 3- REST::Neo4p->commit();
    return $quant;
}
Andrii -- I believe the actual problem underlying this issue has been fixed as of v0.2254, and multiple individual queries should now work into the many thousands- MAJ