On Thu Dec 19 01:07:29 2013, symonsjo@gmail.com wrote:
> Hi Mark,
>
> In my latest code I'm using param queries to take advantage of neo4j
> query caching. That code is here: http://pastebin.com/mS2F5iJh
>
> I'll upgrade REST::Neo4p and integrate $query->finish() into my
> destroy_query method from the code above (and drop undef $query).
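>
> For reference, roughly what that combination will look like with the 0.2222 API described below (a minimal sketch only; the Cypher string and the binding value here are made up for illustration, not the pastebin code):
>
>     use strict;
>     use warnings;
>     use REST::Neo4p;
>     use REST::Neo4p::Query;
>
>     REST::Neo4p->connect('http://localhost:7474');
>
>     # One reusable query object; the { incident_id } placeholder lets
>     # neo4j cache the query across executions with different bindings.
>     my $query = REST::Neo4p::Query->new(
>         'MATCH (i:INCIDENT) WHERE i.incident_id = { incident_id } RETURN i'
>     );
>
>     $query->execute( incident_id => 42 );  # bindings passed at execute time
>     while (my $row = $query->fetch) {
>         # $row is an arrayref of the RETURNed columns
>     }
>     $query->finish;   # delete the tmp file, keep the query object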
>
> On Wed Dec 18 20:53:36 2013, MAJENSEN wrote:
> > oops - a bug in the sample below, should be
> >
> > while ( ($id,$lev) = @{pop @incident_and_category_params} )
> >
> > On Wed Dec 18 20:42:30 2013, MAJENSEN wrote:
> > > Ok, I will think a bit harder about this.
> > >
> > > In the latest version (0.2222), there are a couple of things that
> > > might help. If you have thousands of files open, then presumably
> > > there
> > > are thousands of live query objects (without destroy_query). That's
> > > probably my bad at some level. Not sure why the files are not
> > > cleaned
> > > up when $neo4p_query is reassigned to a new query. Old object
> > > should
> > > go away at that point.
> > >
> > > However, in the latest version, execute() can take parameter
> > > assignments as arguments, and there is now (as you suggested
> > > before) a
> > > finish() method that will delete the tmp file but leave the object
> > > intact. So you could prepare the queries before you start the loop,
> > > and use finish at the end of the loop:
> > >
> > > $incident_query =
> > > 'MATCH (i:INCIDENT), (ic:INCIDENT_CATEGORY)'
> > > . ' WHERE i.incident_id = { incident_id }'
> > > . ' AND ic.category_level_01 = { category_level_01 }'
> > > . ' CREATE UNIQUE'
> > > . ' ic<-[r:HAS_INCIDENT_CATEGORY]-i RETURN r';
> > > $incident_query_obj = REST::Neo4p::Query->new($incident_query);
> > > # suppose an array of 2-elt arrays with desired parm combinations:
> > > while ( ($id,$lev) = @{pop @{ @incident_and_category_params }} ) {
> > > $incident_query_obj->execute( incident_id => $id,
> > > category_level_01 => $lev );
> > > while (my $row = $incident_query_obj->fetch) {
> > > ...
> > > last if $GOT_WHAT_I_WANT;
> > > }
> > >
> > > $incident_query_obj->finish();
> > > }
> > >
> > > Each time you call finish, the tmp file is deleted, and each time
> > > you
> > > call execute a tmp file is created, but all within one query
> > > instance.
> > > You can set up multiple such query instances outside the loop and
> > > use
> > > the parameter bindings within the loop. Should make the code more
> > > maintainable too.
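> > >
> > > A minimal sketch of that multiple-instance variant (illustrative
> > > only; the second query string and the shape of the parameter list
> > > are assumptions, not from the thread):
> > >
> > >     # Prepare each query once, outside the loop:
> > >     my $incident_q = REST::Neo4p::Query->new($incident_query);
> > >     my $category_q = REST::Neo4p::Query->new($category_query); # hypothetical
> > >
> > >     for my $pair (@incident_and_category_params) {  # [ $id, $lev ] pairs
> > >         my ($id, $lev) = @$pair;
> > >         $incident_q->execute( incident_id => $id,
> > >                               category_level_01 => $lev );
> > >         while (my $row = $incident_q->fetch) {
> > >             # consume rows
> > >         }
> > >         $incident_q->finish;  # removes only this execute's tmp file
> > >         # ... same execute/fetch/finish cycle for $category_q ...
> > >     }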
> > >
> > > Please try and let me know. I REALLY appreciate this feedback!
> > > best MAJ
> > >
> > >
> > > On Wed Dec 18 16:23:59 2013, symonsjo@gmail.com wrote:
> > > > On Wed Dec 18 16:06:10 2013, symonsjo@gmail.com wrote:
> > > > > On Tue Nov 05 23:42:29 2013, MAJENSEN wrote:
> > > > > > Calling this resolved; please reopen if necessary. thanks
> > > > >
> > > > > I don't think this is resolved; in fact, without destroy_query
> > > > > there are a whole lot more files open at once. The result below
> > > > > is after loading only 2500 rows.
> > > > >
> > > > > [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say $REST::Neo4p::VERSION'
> > > > > 0.126
> > > > > [jsymons@larva nrls]$ ps -ef | grep extract
> > > > > jsymons 20538 19391 64 20:46 pts/11 00:06:49 /usr/bin/perl -w /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file RA3_All.xlsx --neo_uri=http://localhost:7474
> > > > > jsymons 21582 20114 0 20:56 pts/13 00:00:00 grep extract
> > > > > [jsymons@larva nrls]$ lsof -p 20538 | grep -c '/tmp/.\{10\}'
> > > > > 38071
> > > > >
> > > > > And once I killed the script, at between 3000 & 3500 rows, /tmp
> > > > > currently contained:
> > > > >
> > > > > [jsymons@larva nrls]$ ls -l /tmp | grep -c '.\{10\}$'
> > > > > 46498
> > > > >
> > > > > Latest code with destroy_query & updated for 2.0.0 stable at
> > > > > http://pastebin.com/v3xUJFeP
> > > >
> > > >
> > > > ^without destroy_query
> >
> >
Hello Mark,
I upgraded to 0.2222 as suggested:
[jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say $REST::Neo4p::VERSION'
0.2222
And I added $query->finish() to the destroy_query sub and removed the undef $query in the code in the pastebin here: http://pastebin.com/mS2F5iJh
But after loading 500 rows from the spreadsheet, without the undef in place, I end up with 7045 open files:
[jsymons@larva nrls]$ /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file RA3_All.xlsx --neo_uri=http://localhost:7474 2> RA3_All.xlsx.loading.error.log | tee RA3_All.xlsx.loading.log
Processed 500 rows. Thu Dec 19 06:18:46 2013
[jsymons@larva nrls]$ ps -ef | grep extract
jsymons 5268 19391 79 06:16 pts/11 00:00:05 /usr/bin/perl -w /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file RA3_All.xlsx --neo_uri=http://localhost:7474
jsymons 5271 20114 0 06:16 pts/13 00:00:00 grep extract
[jsymons@larva nrls]$ while true ; do echo "/tmp has " $(lsof -p 5268 | grep -c '/tmp/.\{10\}') " open files for pid" ; sleep 10 ; done
/tmp has 1086 open files for pid
/tmp has 1612 open files for pid
/tmp has 2122 open files for pid
/tmp has 2634 open files for pid
/tmp has 3201 open files for pid
/tmp has 3783 open files for pid
/tmp has 4317 open files for pid
/tmp has 4838 open files for pid
/tmp has 5387 open files for pid
/tmp has 5933 open files for pid
/tmp has 6479 open files for pid
/tmp has 7045 open files for pid
Actually, even with the undef in place the newer code is worse than 0.126 as far as open files go: I also tried with both $query->finish(); AND undef $query; and still had a huge number of open files. The sub is below:
sub destroy_query {
    my $query = shift;
    if (defined($query)) {
        while (my $response = $query->fetch()) {
            # Who cares, throw it out
        }
        $query->finish();    # Cleanup suggested by Mark
        undef $query;
    }
}
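
One caveat on that sub, for what it's worth: undef $query there only clears the sub's own lexical copy of the reference. The caller's variable still points at the query object, so the object (and anything it holds open) survives the call. A self-contained sketch of that effect, using a hypothetical Tracker class rather than REST::Neo4p:

    use strict;
    use warnings;

    package Tracker;
    sub new     { my $class = shift; print "created\n"; return bless {}, $class }
    sub DESTROY { print "destroyed\n" }

    package main;

    sub drop {
        my $obj = shift;   # copies the reference out of @_
        undef $obj;        # clears only the copy
    }

    my $t = Tracker->new;  # prints "created"
    drop($t);              # prints nothing; $t still holds the object
    print "still alive\n";
    undef $t;              # prints "destroyed"; last reference gone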