Bug #89478 for REST-Neo4p: Create final cleanup in REST::Neo4p->disconnect() to remove tmp files

Mon Oct 14 07:58:51 2013 symonsjo [...] gmail.com - Ticket created

Subject:

Create final cleanup in REST::Neo4p->disconnect() to remove tmp files

Currently files are created in /tmp when using REST::Neo4p::Query to parse JSON content. Although I can cleanup a lot of these files by exhausting the ->fetch on the query and undef'g the query, there are some files which still remain. It would be nice to implement a disconnect for REST::Neo4p which would be capable of cleaning up any remaining files.

Mon Oct 14 08:00:10 2013 symonsjo [...] gmail.com - Correspondence added

From:

symonsjo [...] gmail.com

On Mon Oct 14 07:58:51 2013, symonsjo@gmail.com wrote: Show quoted text

> Currently files are created in /tmp when using REST::Neo4p::Query to > parse JSON content. Although I can cleanup a lot of these files by > exhausting the ->fetch on the query and undef'g the query, there are > some files which still remain. It would be nice to implement a > disconnect for REST::Neo4p which would be capable of cleaning up any > remaining files.

An example of a current script I'm using can be found here: http://pastebin.com/CD9CEhnY

Mon Oct 14 08:11:48 2013 symonsjo [...] gmail.com - Correspondence added

From:

symonsjo [...] gmail.com

On Mon Oct 14 08:00:10 2013, symonsjo@gmail.com wrote: Show quoted text

> On Mon Oct 14 07:58:51 2013, symonsjo@gmail.com wrote:

> > Currently files are created in /tmp when using REST::Neo4p::Query to > > parse JSON content. Although I can cleanup a lot of these files by > > exhausting the ->fetch on the query and undef'g the query, there are > > some files which still remain. It would be nice to implement a > > disconnect for REST::Neo4p which would be capable of cleaning up any > > remaining files.

> > An example of a current script I'm using can be found here: > http://pastebin.com/CD9CEhnY

Actually, since letting the program finish naturally cleans up once REST::Neo4p is collected it would be nice just to set a max file limit on the connection to keep it from overflowing, and/or override /tmp as a location for storing the files.

Mon Oct 14 13:24:36 2013 maj.fortinbras [...] gmail.com - Taken

Wed Oct 16 23:40:13 2013 maj.fortinbras [...] gmail.com - Correspondence added

On Mon Oct 14 08:11:48 2013, symonsjo@gmail.com wrote: Show quoted text

> On Mon Oct 14 08:00:10 2013, symonsjo@gmail.com wrote:

> > On Mon Oct 14 07:58:51 2013, symonsjo@gmail.com wrote:

> > > Currently files are created in /tmp when using REST::Neo4p::Query > > > to > > > parse JSON content. Although I can cleanup a lot of these files by > > > exhausting the ->fetch on the query and undef'g the query, there > > > are > > > some files which still remain. It would be nice to implement a > > > disconnect for REST::Neo4p which would be capable of cleaning up > > > any > > > remaining files.

> > > > An example of a current script I'm using can be found here: > > http://pastebin.com/CD9CEhnY

> > Actually, since letting the program finish naturally cleans up once > REST::Neo4p is collected it would be nice just to set a max file limit > on the connection to keep it from overflowing, and/or override /tmp as > a location for storing the files.

Hi, thanks very much for pointing this issue out. You're having to do way too much; cleanup should just work. I've uploaded v0.2110; please give this a try. If you would, try this version with your script without invoking your destroy_query and see if the tmp files are removed. MAJ

Wed Oct 16 23:40:13 2013 The RT System itself - Status changed from 'new' to 'open'

Tue Nov 05 23:42:29 2013 maj.fortinbras [...] gmail.com - Correspondence added

Calling this resolved; please reopen if necessary. thanks

Tue Nov 05 23:42:29 2013 maj.fortinbras [...] gmail.com - Status changed from 'open' to 'resolved'

Tue Nov 05 23:42:29 2013 maj.fortinbras [...] gmail.com - Fixed in 0.2111 added

Wed Dec 18 15:57:01 2013 symonsjo [...] gmail.com - Status changed from 'resolved' to 'open'

Wed Dec 18 16:06:10 2013 symonsjo [...] gmail.com - Correspondence added

From:

symonsjo [...] gmail.com

On Tue Nov 05 23:42:29 2013, MAJENSEN wrote: Show quoted text

> Calling this resolved; please reopen if necessary. thanks

I don't think this is resolved, in fact without destroy_query there's a whole lot more files open at once, the below result is after loading only 2500 rows. [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say $REST::Neo4p::VERSION' 0.126 [jsymons@larva nrls]$ ps -ef | grep extract jsymons 20538 19391 64 20:46 pts/11 00:06:49 /usr/bin/perl -w /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file RA3_All.xlsx --neo_uri=http://localhost:7474 jsymons 21582 20114 0 20:56 pts/13 00:00:00 grep extract [jsymons@larva nrls]$ lsof -p 20538 | grep -c '/tmp/.\{10\}' 38071 And once I killed the script at between 3000 & 3500 rows, in tmp current there were: [jsymons@larva nrls]$ ls -l /tmp | grep -c '.\{10\}$' 46498 Latest code with destory_query & updated for 2.0.0 stable at http://pastebin.com/v3xUJFeP

Wed Dec 18 16:12:09 2013 symonsjo [...] gmail.com - Correspondence added

From:

symonsjo [...] gmail.com

On Wed Dec 18 16:06:10 2013, symonsjo@gmail.com wrote: Show quoted text

> On Tue Nov 05 23:42:29 2013, MAJENSEN wrote:

> > Calling this resolved; please reopen if necessary. thanks

> > I don't think this is resolved, in fact without destroy_query there's > a whole lot more files open at once, the below result is after loading > only 2500 rows. > > [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say > $REST::Neo4p::VERSION' > 0.126 > [jsymons@larva nrls]$ ps -ef | grep extract > jsymons 20538 19391 64 20:46 pts/11 00:06:49 /usr/bin/perl -w > /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file RA3_All.xlsx > --neo_uri=http://localhost:7474 > jsymons 21582 20114 0 20:56 pts/13 00:00:00 grep extract > [jsymons@larva nrls]$ lsof -p 20538 | grep -c '/tmp/.\{10\}' > 38071 > > And once I killed the script at between 3000 & 3500 rows, in tmp > current there were: > > [jsymons@larva nrls]$ ls -l /tmp | grep -c '.\{10\}$' > 46498 > > Latest code with destory_query & updated for 2.0.0 stable at > http://pastebin.com/v3xUJFeP

Just FYI, with this version info I call destroy_query the files are open only equal to the amount of rows loaded (this is using the same code with the destroy_query intact), they are also unlinked from /tmp. After 59000 rows loading: [jsymons@miyu nrls]$ ps -ef | grep extract jsymons 3356 26122 0 21:08 pts/5 00:00:00 grep extract jsymons 29042 963 73 13:47 pts/3 05:26:07 /usr/bin/perl -w /home/jsymons/extract_nrls.M06.pl --input_file RCB_All.xlsx --neo_uri=http://localhost:7474 [jsymons@miyu nrls]$ lsof -p 29042 | grep -c '/tmp/.\{10\}' 59245 [jsymons@miyu nrls]$ perl -MREST::Neo4p -E 'say $REST::Neo4p::VERSION' 0.2101 [jsymons@miyu nrls]$ ls /tmp | grep -c '.\{10\}$' 59404 P.S. sorry it took so long to get back to you I was busy performance tuning Neo4j and upgrading.

Wed Dec 18 16:13:43 2013 symonsjo [...] gmail.com - Correspondence added

From:

symonsjo [...] gmail.com

On Wed Dec 18 16:12:09 2013, symonsjo@gmail.com wrote: Show quoted text

> On Wed Dec 18 16:06:10 2013, symonsjo@gmail.com wrote:

> > On Tue Nov 05 23:42:29 2013, MAJENSEN wrote:

> > > Calling this resolved; please reopen if necessary. thanks

> > > > I don't think this is resolved, in fact without destroy_query there's > > a whole lot more files open at once, the below result is after > > loading > > only 2500 rows. > > > > [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say > > $REST::Neo4p::VERSION' > > 0.126 > > [jsymons@larva nrls]$ ps -ef | grep extract > > jsymons 20538 19391 64 20:46 pts/11 00:06:49 /usr/bin/perl -w > > /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file RA3_All.xlsx > > --neo_uri=http://localhost:7474 > > jsymons 21582 20114 0 20:56 pts/13 00:00:00 grep extract > > [jsymons@larva nrls]$ lsof -p 20538 | grep -c '/tmp/.\{10\}' > > 38071 > > > > And once I killed the script at between 3000 & 3500 rows, in tmp > > current there were: > > > > [jsymons@larva nrls]$ ls -l /tmp | grep -c '.\{10\}$' > > 46498 > > > > Latest code with destory_query & updated for 2.0.0 stable at > > http://pastebin.com/v3xUJFeP

> > > Just FYI, with this version info I call destroy_query the files are > open only equal to the amount of rows loaded (this is using the same > code with the destroy_query intact), they are also unlinked from /tmp. > After 59000 rows loading: > > [jsymons@miyu nrls]$ ps -ef | grep extract > jsymons 3356 26122 0 21:08 pts/5 00:00:00 grep extract > jsymons 29042 963 73 13:47 pts/3 05:26:07 /usr/bin/perl -w > /home/jsymons/extract_nrls.M06.pl --input_file RCB_All.xlsx > --neo_uri=http://localhost:7474 > [jsymons@miyu nrls]$ lsof -p 29042 | grep -c '/tmp/.\{10\}' > 59245 > [jsymons@miyu nrls]$ perl -MREST::Neo4p -E 'say $REST::Neo4p::VERSION' > 0.2101 > [jsymons@miyu nrls]$ ls /tmp | grep -c '.\{10\}$' > 59404 > > P.S. sorry it took so long to get back to you I was busy performance > tuning Neo4j and upgrading.

Above also on Neo4j 2.0.0 stable and code is here: http://pastebin.com/mS2F5iJh

Wed Dec 18 16:23:59 2013 symonsjo [...] gmail.com - Correspondence added

From:

symonsjo [...] gmail.com

On Wed Dec 18 16:06:10 2013, symonsjo@gmail.com wrote: Show quoted text

> On Tue Nov 05 23:42:29 2013, MAJENSEN wrote:

> > Calling this resolved; please reopen if necessary. thanks

> > I don't think this is resolved, in fact without destroy_query there's > a whole lot more files open at once, the below result is after loading > only 2500 rows. > > [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say > $REST::Neo4p::VERSION' > 0.126 > [jsymons@larva nrls]$ ps -ef | grep extract > jsymons 20538 19391 64 20:46 pts/11 00:06:49 /usr/bin/perl -w > /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file RA3_All.xlsx > --neo_uri=http://localhost:7474 > jsymons 21582 20114 0 20:56 pts/13 00:00:00 grep extract > [jsymons@larva nrls]$ lsof -p 20538 | grep -c '/tmp/.\{10\}' > 38071 > > And once I killed the script at between 3000 & 3500 rows, in tmp > current there were: > > [jsymons@larva nrls]$ ls -l /tmp | grep -c '.\{10\}$' > 46498 > > Latest code with destory_query & updated for 2.0.0 stable at > http://pastebin.com/v3xUJFeP

^without destroy_query

Wed Dec 18 20:42:30 2013 maj.fortinbras [...] gmail.com - Correspondence added

Ok, I will think a bit harder about this. In the latest version (0.2222), there are a couple of things that might help. If you have thousands of files open, then presumably there are thousands of live query objects (without destroy_query). That's probably my bad at some level. Not sure why the files are not cleaned up when $neo4p_query is reassigned to a new query. Old object should go away at that point. However, in the latest version, execute() can take parameter assignments as arguments, and there is now (as you suggested before) a finish() method that will delete the tmp file but leave the object intact. So you could prepare the queries before you start the loop, and use finish at the end of the loop: $incident_query = 'MATCH (i:INCIDENT), (ic:INCIDENT_CATEGORY)' . ' WHERE i.incident_id = { incident_id }' . ' AND ic.category_level_01 = { category_level_01 }' . ' CREATE UNIQUE' . ' ic<-[r:HAS_INCIDENT_CATEGORY]-i RETURN r'; $incident_query_obj = REST::Neo4p::Query->new($incident_query); # suppose an array of 2-elt arrays with desired parm combinations: while ( ($id,$lev) = @{pop @{ @incident_and_category_params }} ) { $incident_query_obj->execute( incident_id => $id, category_level_01 => $lev ); while (my $row = $incident_query_obj->fetch) { ... last if $GOT_WHAT_I_WANT; } $incident_query_obj->finish(); } Each time you call finish, the tmp file is deleted, and each time you call execute a tmp file is created, but all within one query instance. You can set up multiple such query instances outside the loop and use the parameter bindings within the loop. Should make the code more maintainable too. Please try and let me know. I REALLY appreciate this feedback! best MAJ On Wed Dec 18 16:23:59 2013, symonsjo@gmail.com wrote: Show quoted text

> On Wed Dec 18 16:06:10 2013, symonsjo@gmail.com wrote:

> > On Tue Nov 05 23:42:29 2013, MAJENSEN wrote:

> > > Calling this resolved; please reopen if necessary. thanks

> > > > I don't think this is resolved, in fact without destroy_query there's > > a whole lot more files open at once, the below result is after loading > > only 2500 rows. > > > > [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say > > $REST::Neo4p::VERSION' > > 0.126 > > [jsymons@larva nrls]$ ps -ef | grep extract > > jsymons 20538 19391 64 20:46 pts/11 00:06:49 /usr/bin/perl -w > > /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file RA3_All.xlsx > > --neo_uri=http://localhost:7474 > > jsymons 21582 20114 0 20:56 pts/13 00:00:00 grep extract > > [jsymons@larva nrls]$ lsof -p 20538 | grep -c '/tmp/.\{10\}' > > 38071 > > > > And once I killed the script at between 3000 & 3500 rows, in tmp > > current there were: > > > > [jsymons@larva nrls]$ ls -l /tmp | grep -c '.\{10\}$' > > 46498 > > > > Latest code with destory_query & updated for 2.0.0 stable at > > http://pastebin.com/v3xUJFeP

> > > ^without destroy_query

Wed Dec 18 20:53:36 2013 maj.fortinbras [...] gmail.com - Correspondence added

oops - a bug in the sample below, should be while ( ($id,$lev) = @{pop @incident_and_category_params} ) On Wed Dec 18 20:42:30 2013, MAJENSEN wrote: Show quoted text

> Ok, I will think a bit harder about this. > > In the latest version (0.2222), there are a couple of things that > might help. If you have thousands of files open, then presumably there > are thousands of live query objects (without destroy_query). That's > probably my bad at some level. Not sure why the files are not cleaned > up when $neo4p_query is reassigned to a new query. Old object should > go away at that point. > > However, in the latest version, execute() can take parameter > assignments as arguments, and there is now (as you suggested before) a > finish() method that will delete the tmp file but leave the object > intact. So you could prepare the queries before you start the loop, > and use finish at the end of the loop: > > $incident_query = > 'MATCH (i:INCIDENT), (ic:INCIDENT_CATEGORY)' > . ' WHERE i.incident_id = { incident_id }' > . ' AND ic.category_level_01 = { category_level_01 }' > . ' CREATE UNIQUE' > . ' ic<-[r:HAS_INCIDENT_CATEGORY]-i RETURN r'; > $incident_query_obj = REST::Neo4p::Query->new($incident_query); > # suppose an array of 2-elt arrays with desired parm combinations: > while ( ($id,$lev) = @{pop @{ @incident_and_category_params }} ) { > $incident_query_obj->execute( incident_id => $id, > category_level_01 => $lev ); > while (my $row = $incident_query_obj->fetch) { > ... > last if $GOT_WHAT_I_WANT; > } > > $incident_query_obj->finish(); > } > > Each time you call finish, the tmp file is deleted, and each time you > call execute a tmp file is created, but all within one query instance. > You can set up multiple such query instances outside the loop and use > the parameter bindings within the loop. Should make the code more > maintainable too. > > Please try and let me know. I REALLY appreciate this feedback! > best MAJ > > > On Wed Dec 18 16:23:59 2013, symonsjo@gmail.com wrote:

> > On Wed Dec 18 16:06:10 2013, symonsjo@gmail.com wrote:

> > > On Tue Nov 05 23:42:29 2013, MAJENSEN wrote:

> > > > Calling this resolved; please reopen if necessary. thanks

> > > > > > I don't think this is resolved, in fact without destroy_query > > > there's > > > a whole lot more files open at once, the below result is after > > > loading > > > only 2500 rows. > > > > > > [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say > > > $REST::Neo4p::VERSION' > > > 0.126 > > > [jsymons@larva nrls]$ ps -ef | grep extract > > > jsymons 20538 19391 64 20:46 pts/11 00:06:49 /usr/bin/perl -w > > > /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file > > > RA3_All.xlsx > > > --neo_uri=http://localhost:7474 > > > jsymons 21582 20114 0 20:56 pts/13 00:00:00 grep extract > > > [jsymons@larva nrls]$ lsof -p 20538 | grep -c '/tmp/.\{10\}' > > > 38071 > > > > > > And once I killed the script at between 3000 & 3500 rows, in tmp > > > current there were: > > > > > > [jsymons@larva nrls]$ ls -l /tmp | grep -c '.\{10\}$' > > > 46498 > > > > > > Latest code with destory_query & updated for 2.0.0 stable at > > > http://pastebin.com/v3xUJFeP

> > > > > > ^without destroy_query

Thu Dec 19 01:07:29 2013 symonsjo [...] gmail.com - Correspondence added

From:

symonsjo [...] gmail.com

Hi Mark, In my latest code I'm using param queries to take advantage of neo4j query caching. That code is here: http://pastebin.com/mS2F5iJh I'll upgrade REST::Neo4p and integrate $query->finish() into my destroy_query method from the code above (and drop undef $query). On Wed Dec 18 20:53:36 2013, MAJENSEN wrote: Show quoted text

> oops - a bug in the sample below, should be > > while ( ($id,$lev) = @{pop @incident_and_category_params} ) > > On Wed Dec 18 20:42:30 2013, MAJENSEN wrote:

> > Ok, I will think a bit harder about this. > > > > In the latest version (0.2222), there are a couple of things that > > might help. If you have thousands of files open, then presumably there > > are thousands of live query objects (without destroy_query). That's > > probably my bad at some level. Not sure why the files are not cleaned > > up when $neo4p_query is reassigned to a new query. Old object should > > go away at that point. > > > > However, in the latest version, execute() can take parameter > > assignments as arguments, and there is now (as you suggested before) a > > finish() method that will delete the tmp file but leave the object > > intact. So you could prepare the queries before you start the loop, > > and use finish at the end of the loop: > > > > $incident_query = > > 'MATCH (i:INCIDENT), (ic:INCIDENT_CATEGORY)' > > . ' WHERE i.incident_id = { incident_id }' > > . ' AND ic.category_level_01 = { category_level_01 }' > > . ' CREATE UNIQUE' > > . ' ic<-[r:HAS_INCIDENT_CATEGORY]-i RETURN r'; > > $incident_query_obj = REST::Neo4p::Query->new($incident_query); > > # suppose an array of 2-elt arrays with desired parm combinations: > > while ( ($id,$lev) = @{pop @{ @incident_and_category_params }} ) { > > $incident_query_obj->execute( incident_id => $id, > > category_level_01 => $lev ); > > while (my $row = $incident_query_obj->fetch) { > > ... > > last if $GOT_WHAT_I_WANT; > > } > > > > $incident_query_obj->finish(); > > } > > > > Each time you call finish, the tmp file is deleted, and each time you > > call execute a tmp file is created, but all within one query instance. > > You can set up multiple such query instances outside the loop and use > > the parameter bindings within the loop. Should make the code more > > maintainable too. > > > > Please try and let me know. I REALLY appreciate this feedback! > > best MAJ > > > > > > On Wed Dec 18 16:23:59 2013, symonsjo@gmail.com wrote:

> > > On Wed Dec 18 16:06:10 2013, symonsjo@gmail.com wrote:

> > > > On Tue Nov 05 23:42:29 2013, MAJENSEN wrote:

> > > > > Calling this resolved; please reopen if necessary. thanks

> > > > > > > > I don't think this is resolved, in fact without destroy_query > > > > there's > > > > a whole lot more files open at once, the below result is after > > > > loading > > > > only 2500 rows. > > > > > > > > [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say > > > > $REST::Neo4p::VERSION' > > > > 0.126 > > > > [jsymons@larva nrls]$ ps -ef | grep extract > > > > jsymons 20538 19391 64 20:46 pts/11 00:06:49 /usr/bin/perl -w > > > > /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file > > > > RA3_All.xlsx > > > > --neo_uri=http://localhost:7474 > > > > jsymons 21582 20114 0 20:56 pts/13 00:00:00 grep extract > > > > [jsymons@larva nrls]$ lsof -p 20538 | grep -c '/tmp/.\{10\}' > > > > 38071 > > > > > > > > And once I killed the script at between 3000 & 3500 rows, in tmp > > > > current there were: > > > > > > > > [jsymons@larva nrls]$ ls -l /tmp | grep -c '.\{10\}$' > > > > 46498 > > > > > > > > Latest code with destory_query & updated for 2.0.0 stable at > > > > http://pastebin.com/v3xUJFeP

> > > > > > > > > ^without destroy_query

> >

Thu Dec 19 01:34:09 2013 symonsjo [...] gmail.com - Correspondence added

From:

symonsjo [...] gmail.com

On Thu Dec 19 01:07:29 2013, symonsjo@gmail.com wrote: Show quoted text

> Hi Mark, > > In my latest code I'm using param queries to take advantage of neo4j > query caching. That code is here: http://pastebin.com/mS2F5iJh > > I'll upgrade REST::Neo4p and integrate $query->finish() into my > destroy_query method from the code above (and drop undef $query). > > On Wed Dec 18 20:53:36 2013, MAJENSEN wrote:

> > oops - a bug in the sample below, should be > > > > while ( ($id,$lev) = @{pop @incident_and_category_params} ) > > > > On Wed Dec 18 20:42:30 2013, MAJENSEN wrote:

> > > Ok, I will think a bit harder about this. > > > > > > In the latest version (0.2222), there are a couple of things that > > > might help. If you have thousands of files open, then presumably > > > there > > > are thousands of live query objects (without destroy_query). That's > > > probably my bad at some level. Not sure why the files are not > > > cleaned > > > up when $neo4p_query is reassigned to a new query. Old object > > > should > > > go away at that point. > > > > > > However, in the latest version, execute() can take parameter > > > assignments as arguments, and there is now (as you suggested > > > before) a > > > finish() method that will delete the tmp file but leave the object > > > intact. So you could prepare the queries before you start the loop, > > > and use finish at the end of the loop: > > > > > > $incident_query = > > > 'MATCH (i:INCIDENT), (ic:INCIDENT_CATEGORY)' > > > . ' WHERE i.incident_id = { incident_id }' > > > . ' AND ic.category_level_01 = { category_level_01 }' > > > . ' CREATE UNIQUE' > > > . ' ic<-[r:HAS_INCIDENT_CATEGORY]-i RETURN r'; > > > $incident_query_obj = REST::Neo4p::Query->new($incident_query); > > > # suppose an array of 2-elt arrays with desired parm combinations: > > > while ( ($id,$lev) = @{pop @{ @incident_and_category_params }} ) { > > > $incident_query_obj->execute( incident_id => $id, > > > category_level_01 => $lev ); > > > while (my $row = $incident_query_obj->fetch) { > > > ... > > > last if $GOT_WHAT_I_WANT; > > > } > > > > > > $incident_query_obj->finish(); > > > } > > > > > > Each time you call finish, the tmp file is deleted, and each time > > > you > > > call execute a tmp file is created, but all within one query > > > instance. > > > You can set up multiple such query instances outside the loop and > > > use > > > the parameter bindings within the loop. Should make the code more > > > maintainable too. > > > > > > Please try and let me know. I REALLY appreciate this feedback! > > > best MAJ > > > > > > > > > On Wed Dec 18 16:23:59 2013, symonsjo@gmail.com wrote:

> > > > On Wed Dec 18 16:06:10 2013, symonsjo@gmail.com wrote:

> > > > > On Tue Nov 05 23:42:29 2013, MAJENSEN wrote:

> > > > > > Calling this resolved; please reopen if necessary. thanks

> > > > > > > > > > I don't think this is resolved, in fact without destroy_query > > > > > there's > > > > > a whole lot more files open at once, the below result is after > > > > > loading > > > > > only 2500 rows. > > > > > > > > > > [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say > > > > > $REST::Neo4p::VERSION' > > > > > 0.126 > > > > > [jsymons@larva nrls]$ ps -ef | grep extract > > > > > jsymons 20538 19391 64 20:46 pts/11 00:06:49 /usr/bin/perl > > > > > -w > > > > > /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file > > > > > RA3_All.xlsx > > > > > --neo_uri=http://localhost:7474 > > > > > jsymons 21582 20114 0 20:56 pts/13 00:00:00 grep extract > > > > > [jsymons@larva nrls]$ lsof -p 20538 | grep -c '/tmp/.\{10\}' > > > > > 38071 > > > > > > > > > > And once I killed the script at between 3000 & 3500 rows, in > > > > > tmp > > > > > current there were: > > > > > > > > > > [jsymons@larva nrls]$ ls -l /tmp | grep -c '.\{10\}$' > > > > > 46498 > > > > > > > > > > Latest code with destory_query & updated for 2.0.0 stable at > > > > > http://pastebin.com/v3xUJFeP

> > > > > > > > > > > > ^without destroy_query

> > > >

Hello Mark, I upgraded to 0.2222 as suggested: [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say $REST::Neo4p::VERSION' 0.2222 And I added $query->finish() to the destroy_query sub and removed the undef $query in the code in pastebin here: http://pastebin.com/mS2F5iJh But after loading 500 rows from the spreadsheet without the undef in place I end up with 7045 open files: [jsymons@larva nrls]$ /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file RA3_All.xlsx --neo_uri=http://localhost:7474 2> RA3_All.xlsx.loading.error.log | tee RA3_All.xlsx.loading.log Processed 500 rows. Thu Dec 19 06:18:46 2013 [jsymons@larva nrls]$ ps -ef | grep extract jsymons 5268 19391 79 06:16 pts/11 00:00:05 /usr/bin/perl -w /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file RA3_All.xlsx --neo_uri=http://localhost:7474 jsymons 5271 20114 0 06:16 pts/13 00:00:00 grep extract [jsymons@larva nrls]$ while true ; do echo "/tmp has " $(lsof -p 5268 | grep -c '/tmp/.\{10\}') " open files for pid" ; sleep 10 ; done /tmp has 1086 open files for pid /tmp has 1612 open files for pid /tmp has 2122 open files for pid /tmp has 2634 open files for pid /tmp has 3201 open files for pid /tmp has 3783 open files for pid /tmp has 4317 open files for pid /tmp has 4838 open files for pid /tmp has 5387 open files for pid /tmp has 5933 open files for pid /tmp has 6479 open files for pid /tmp has 7045 open files for pid Actually even with undef on the newer code it's still worse than it was with 0.126 as far as open files go because I also tried with $query-finish(); AND undef $query; and still had a huge amount of open files. Sub is below: sub destroy_query { my $query = shift; if (defined($query)) { while (my $response = $query->fetch()) { # Who cares, throw it out } $query->finish(); # Cleanup suggested by Mark undef $query; } }

Thu Dec 26 20:22:11 2013 maj.fortinbras [...] gmail.com - Correspondence added

Hi Jo- I think you will get more mileage if you set the parameter values in the execute() method (I don't see that in the paste); then you don't have to create a separate query object for each set of parameter values: Rather than something like: $query = 'START n=...{my_parmA}...RETURN n'; $q1 = REST::Neo4p::Query->new($query, {my_parmA => 1}); $q1->execute(); $q2 = REST::Neo4p::Query->new($query, {my_parmA => 2}); $q2->execute(); ... try (with v0.2222) $query = 'START n=...{my_parmA}...RETURN n'; $q = REST::neo4p::Query->new($query); for (1..2) { $q->execute({my_parmA => $_}); } Here you're only creating one query object, but executing it multiple times with new parameter values. I think this will cut down the number of open files. Let me know- MAJ On Thu Dec 19 01:34:09 2013, symonsjo@gmail.com wrote: Show quoted text

> On Thu Dec 19 01:07:29 2013, symonsjo@gmail.com wrote:

> > Hi Mark, > > > > In my latest code I'm using param queries to take advantage of neo4j > > query caching. That code is here: http://pastebin.com/mS2F5iJh > > > > I'll upgrade REST::Neo4p and integrate $query->finish() into my > > destroy_query method from the code above (and drop undef $query). > > > > On Wed Dec 18 20:53:36 2013, MAJENSEN wrote:

> > > oops - a bug in the sample below, should be > > > > > > while ( ($id,$lev) = @{pop @incident_and_category_params} ) > > > > > > On Wed Dec 18 20:42:30 2013, MAJENSEN wrote:

> > > > Ok, I will think a bit harder about this. > > > > > > > > In the latest version (0.2222), there are a couple of things that > > > > might help. If you have thousands of files open, then presumably > > > > there > > > > are thousands of live query objects (without destroy_query). > > > > That's > > > > probably my bad at some level. Not sure why the files are not > > > > cleaned > > > > up when $neo4p_query is reassigned to a new query. Old object > > > > should > > > > go away at that point. > > > > > > > > However, in the latest version, execute() can take parameter > > > > assignments as arguments, and there is now (as you suggested > > > > before) a > > > > finish() method that will delete the tmp file but leave the > > > > object > > > > intact. So you could prepare the queries before you start the > > > > loop, > > > > and use finish at the end of the loop: > > > > > > > > $incident_query = > > > > 'MATCH (i:INCIDENT), (ic:INCIDENT_CATEGORY)' > > > > . ' WHERE i.incident_id = { incident_id }' > > > > . ' AND ic.category_level_01 = { category_level_01 }' > > > > . ' CREATE UNIQUE' > > > > . ' ic<-[r:HAS_INCIDENT_CATEGORY]-i RETURN r'; > > > > $incident_query_obj = REST::Neo4p::Query->new($incident_query); > > > > # suppose an array of 2-elt arrays with desired parm > > > > combinations: > > > > while ( ($id,$lev) = @{pop @{ @incident_and_category_params }} ) > > > > { > > > > $incident_query_obj->execute( incident_id => $id, > > > > category_level_01 => $lev ); > > > > while (my $row = $incident_query_obj->fetch) { > > > > ... > > > > last if $GOT_WHAT_I_WANT; > > > > } > > > > > > > > $incident_query_obj->finish(); > > > > } > > > > > > > > Each time you call finish, the tmp file is deleted, and each time > > > > you > > > > call execute a tmp file is created, but all within one query > > > > instance. > > > > You can set up multiple such query instances outside the loop and > > > > use > > > > the parameter bindings within the loop. Should make the code more > > > > maintainable too. > > > > > > > > Please try and let me know. I REALLY appreciate this feedback! > > > > best MAJ > > > > > > > > > > > > On Wed Dec 18 16:23:59 2013, symonsjo@gmail.com wrote:

> > > > > On Wed Dec 18 16:06:10 2013, symonsjo@gmail.com wrote:

> > > > > > On Tue Nov 05 23:42:29 2013, MAJENSEN wrote:

> > > > > > > Calling this resolved; please reopen if necessary. thanks

> > > > > > > > > > > > I don't think this is resolved, in fact without destroy_query > > > > > > there's > > > > > > a whole lot more files open at once, the below result is > > > > > > after > > > > > > loading > > > > > > only 2500 rows. > > > > > > > > > > > > [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say > > > > > > $REST::Neo4p::VERSION' > > > > > > 0.126 > > > > > > [jsymons@larva nrls]$ ps -ef | grep extract > > > > > > jsymons 20538 19391 64 20:46 pts/11 00:06:49 /usr/bin/perl > > > > > > -w > > > > > > /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file > > > > > > RA3_All.xlsx > > > > > > --neo_uri=http://localhost:7474 > > > > > > jsymons 21582 20114 0 20:56 pts/13 00:00:00 grep extract > > > > > > [jsymons@larva nrls]$ lsof -p 20538 | grep -c '/tmp/.\{10\}' > > > > > > 38071 > > > > > > > > > > > > And once I killed the script at between 3000 & 3500 rows, in > > > > > > tmp > > > > > > current there were: > > > > > > > > > > > > [jsymons@larva nrls]$ ls -l /tmp | grep -c '.\{10\}$' > > > > > > 46498 > > > > > > > > > > > > Latest code with destory_query & updated for 2.0.0 stable at > > > > > > http://pastebin.com/v3xUJFeP

> > > > > > > > > > > > > > > ^without destroy_query

> > > > > >

> > > Hello Mark, > > I upgraded to 0.2222 as suggested: > > [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say > $REST::Neo4p::VERSION' > 0.2222 > > And I added $query->finish() to the destroy_query sub and removed the > undef $query in the code in pastebin here: > http://pastebin.com/mS2F5iJh > > But after loading 500 rows from the spreadsheet without the undef in > place I end up with 7045 open files: > > [jsymons@larva nrls]$ /home/jsymons/extract_nrls.2.0.0_stable.pl > --input_file RA3_All.xlsx --neo_uri=http://localhost:7474 2> > RA3_All.xlsx.loading.error.log | tee RA3_All.xlsx.loading.log > Processed 500 rows. Thu Dec 19 06:18:46 2013 > > [jsymons@larva nrls]$ ps -ef | grep extract > jsymons 5268 19391 79 06:16 > pts/11 00:00:05 /usr/bin/perl -w > /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file RA3_All.xlsx > --neo_uri=http://localhost:7474 > jsymons 5271 20114 0 06:16 pts/13 00:00:00 grep extract > [jsymons@larva nrls]$ while true ; do echo "/tmp has " $(lsof -p 5268 > | grep -c '/tmp/.\{10\}') " open files for pid" ; sleep 10 ; done > /tmp has 1086 open files for pid > /tmp has 1612 open files for pid > /tmp has 2122 open files for pid > /tmp has 2634 open files for pid > /tmp has 3201 open files for pid > /tmp has 3783 open files for pid > /tmp has 4317 open files for pid > /tmp has 4838 open files for pid > /tmp has 5387 open files for pid > /tmp has 5933 open files for pid > /tmp has 6479 open files for pid > /tmp has 7045 open files for pid > > Actually even with undef on the newer code it's still worse than it > was with 0.126 as far as open files go because I also tried with > $query-finish(); AND undef $query; and still had a huge amount of open > files. Sub is below: > > sub destroy_query { > my $query = shift; > > if (defined($query)) { > while (my $response = $query->fetch()) { > # Who cares, throw it out > } > $query->finish(); # Cleanup suggested by Mark > undef $query; > } > }

Wed Jan 01 05:39:50 2014 symonsjo [...] gmail.com - Correspondence added

From:

symonsjo [...] gmail.com

Hi MAJ, So I've updated the code and I've confirmed that I now only have open files for the queries I've opened. I apologize that the last code I pasted you wasn't correct but I've updated the code even further and it's working beautifully. Thanks. The updated code is here: http://pastebin.com/GNaWKB48 ~ icenine On Thu Dec 26 20:22:11 2013, MAJENSEN wrote: Show quoted text

> Hi Jo- I think you will get more mileage if you set the parameter > values in the execute() method (I don't see that in the paste); then > you don't have to create a separate query object for each set of > parameter values: > Rather than something like: > > $query = 'START n=...{my_parmA}...RETURN n'; > $q1 = REST::Neo4p::Query->new($query, {my_parmA => 1}); > $q1->execute(); > $q2 = REST::Neo4p::Query->new($query, {my_parmA => 2}); > $q2->execute(); > ... > > try (with v0.2222) > > $query = 'START n=...{my_parmA}...RETURN n'; > $q = REST::neo4p::Query->new($query); > for (1..2) { > $q->execute({my_parmA => $_}); > } > > Here you're only creating one query object, but executing it multiple > times with new parameter values. I think this will cut down the number > of open files. > > Let me know- MAJ > > > On Thu Dec 19 01:34:09 2013, symonsjo@gmail.com wrote:

> > On Thu Dec 19 01:07:29 2013, symonsjo@gmail.com wrote:

> > > Hi Mark, > > > > > > In my latest code I'm using param queries to take advantage of > > > neo4j > > > query caching. That code is here: http://pastebin.com/mS2F5iJh > > > > > > I'll upgrade REST::Neo4p and integrate $query->finish() into my > > > destroy_query method from the code above (and drop undef $query). > > > > > > On Wed Dec 18 20:53:36 2013, MAJENSEN wrote:

> > > > oops - a bug in the sample below, should be > > > > > > > > while ( ($id,$lev) = @{pop @incident_and_category_params} ) > > > > > > > > On Wed Dec 18 20:42:30 2013, MAJENSEN wrote:

> > > > > Ok, I will think a bit harder about this. > > > > > > > > > > In the latest version (0.2222), there are a couple of things > > > > > that > > > > > might help. If you have thousands of files open, then > > > > > presumably > > > > > there > > > > > are thousands of live query objects (without destroy_query). > > > > > That's > > > > > probably my bad at some level. Not sure why the files are not > > > > > cleaned > > > > > up when $neo4p_query is reassigned to a new query. Old object > > > > > should > > > > > go away at that point. > > > > > > > > > > However, in the latest version, execute() can take parameter > > > > > assignments as arguments, and there is now (as you suggested > > > > > before) a > > > > > finish() method that will delete the tmp file but leave the > > > > > object > > > > > intact. So you could prepare the queries before you start the > > > > > loop, > > > > > and use finish at the end of the loop: > > > > > > > > > > $incident_query = > > > > > 'MATCH (i:INCIDENT), (ic:INCIDENT_CATEGORY)' > > > > > . ' WHERE i.incident_id = { incident_id }' > > > > > . ' AND ic.category_level_01 = { category_level_01 > > > > > }' > > > > > . ' CREATE UNIQUE' > > > > > . ' ic<-[r:HAS_INCIDENT_CATEGORY]-i RETURN r'; > > > > > $incident_query_obj = REST::Neo4p::Query->new($incident_query); > > > > > # suppose an array of 2-elt arrays with desired parm > > > > > combinations: > > > > > while ( ($id,$lev) = @{pop @{ @incident_and_category_params }} > > > > > ) > > > > > { > > > > > $incident_query_obj->execute( incident_id => $id, > > > > > category_level_01 => $lev ); > > > > > while (my $row = $incident_query_obj->fetch) { > > > > > ... > > > > > last if $GOT_WHAT_I_WANT; > > > > > } > > > > > > > > > > $incident_query_obj->finish(); > > > > > } > > > > > > > > > > Each time you call finish, the tmp file is deleted, and each > > > > > time > > > > > you > > > > > call execute a tmp file is created, but all within one query > > > > > instance. > > > > > You can set up multiple such query instances outside the loop > > > > > and > > > > > use > > > > > the parameter bindings within the loop. Should make the code > > > > > more > > > > > maintainable too. > > > > > > > > > > Please try and let me know. I REALLY appreciate this feedback! > > > > > best MAJ > > > > > > > > > > > > > > > On Wed Dec 18 16:23:59 2013, symonsjo@gmail.com wrote:

> > > > > > On Wed Dec 18 16:06:10 2013, symonsjo@gmail.com wrote:

> > > > > > > On Tue Nov 05 23:42:29 2013, MAJENSEN wrote:

> > > > > > > > Calling this resolved; please reopen if necessary. thanks

> > > > > > > > > > > > > > I don't think this is resolved, in fact without > > > > > > > destroy_query > > > > > > > there's > > > > > > > a whole lot more files open at once, the below result is > > > > > > > after > > > > > > > loading > > > > > > > only 2500 rows. > > > > > > > > > > > > > > [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say > > > > > > > $REST::Neo4p::VERSION' > > > > > > > 0.126 > > > > > > > [jsymons@larva nrls]$ ps -ef | grep extract > > > > > > > jsymons 20538 19391 64 20:46 pts/11 00:06:49 > > > > > > > /usr/bin/perl > > > > > > > -w > > > > > > > /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file > > > > > > > RA3_All.xlsx > > > > > > > --neo_uri=http://localhost:7474 > > > > > > > jsymons 21582 20114 0 20:56 pts/13 00:00:00 grep > > > > > > > extract > > > > > > > [jsymons@larva nrls]$ lsof -p 20538 | grep -c > > > > > > > '/tmp/.\{10\}' > > > > > > > 38071 > > > > > > > > > > > > > > And once I killed the script at between 3000 & 3500 rows, > > > > > > > in > > > > > > > tmp > > > > > > > current there were: > > > > > > > > > > > > > > [jsymons@larva nrls]$ ls -l /tmp | grep -c '.\{10\}$' > > > > > > > 46498 > > > > > > > > > > > > > > Latest code with destory_query & updated for 2.0.0 stable > > > > > > > at > > > > > > > http://pastebin.com/v3xUJFeP

> > > > > > > > > > > > > > > > > > ^without destroy_query

> > > > > > > >

> > > > > > Hello Mark, > > > > I upgraded to 0.2222 as suggested: > > > > [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say > > $REST::Neo4p::VERSION' > > 0.2222 > > > > And I added $query->finish() to the destroy_query sub and removed the > > undef $query in the code in pastebin here: > > http://pastebin.com/mS2F5iJh > > > > But after loading 500 rows from the spreadsheet without the undef in > > place I end up with 7045 open files: > > > > [jsymons@larva nrls]$ /home/jsymons/extract_nrls.2.0.0_stable.pl > > --input_file RA3_All.xlsx --neo_uri=http://localhost:7474 2> > > RA3_All.xlsx.loading.error.log | tee RA3_All.xlsx.loading.log > > Processed 500 rows. Thu Dec 19 06:18:46 2013 > > > > [jsymons@larva nrls]$ ps -ef | grep extract > > jsymons 5268 19391 79 06:16 > > pts/11 00:00:05 /usr/bin/perl -w > > /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file RA3_All.xlsx > > --neo_uri=http://localhost:7474 > > jsymons 5271 20114 0 06:16 pts/13 00:00:00 grep extract > > [jsymons@larva nrls]$ while true ; do echo "/tmp has " $(lsof -p 5268 > > | grep -c '/tmp/.\{10\}') " open files for pid" ; sleep 10 ; done > > /tmp has 1086 open files for pid > > /tmp has 1612 open files for pid > > /tmp has 2122 open files for pid > > /tmp has 2634 open files for pid > > /tmp has 3201 open files for pid > > /tmp has 3783 open files for pid > > /tmp has 4317 open files for pid > > /tmp has 4838 open files for pid > > /tmp has 5387 open files for pid > > /tmp has 5933 open files for pid > > /tmp has 6479 open files for pid > > /tmp has 7045 open files for pid > > > > Actually even with undef on the newer code it's still worse than it > > was with 0.126 as far as open files go because I also tried with > > $query-finish(); AND undef $query; and still had a huge amount of > > open > > files. Sub is below: > > > > sub destroy_query { > > my $query = shift; > > > > if (defined($query)) { > > while (my $response = $query->fetch()) { > > # Who cares, throw it out > > } > > $query->finish(); # Cleanup suggested by Mark > > undef $query; > > } > > }

Wed Jan 01 12:16:54 2014 maj.fortinbras [...] gmail.com - Correspondence added

Ice-9, that's awesome. Looks like a cool application too. Happy New Year! MAJ On Wed Jan 01 05:39:50 2014, symonsjo@gmail.com wrote: Show quoted text

> Hi MAJ, > > So I've updated the code and I've confirmed that I now only have open > files for the queries I've opened. I apologize that the last code I > pasted you wasn't correct but I've updated the code even further and > it's working beautifully. Thanks. The updated code is here: > > http://pastebin.com/GNaWKB48 > > ~ icenine > > On Thu Dec 26 20:22:11 2013, MAJENSEN wrote:

> > Hi Jo- I think you will get more mileage if you set the parameter > > values in the execute() method (I don't see that in the paste); then > > you don't have to create a separate query object for each set of > > parameter values: > > Rather than something like: > > > > $query = 'START n=...{my_parmA}...RETURN n'; > > $q1 = REST::Neo4p::Query->new($query, {my_parmA => 1}); > > $q1->execute(); > > $q2 = REST::Neo4p::Query->new($query, {my_parmA => 2}); > > $q2->execute(); > > ... > > > > try (with v0.2222) > > > > $query = 'START n=...{my_parmA}...RETURN n'; > > $q = REST::neo4p::Query->new($query); > > for (1..2) { > > $q->execute({my_parmA => $_}); > > } > > > > Here you're only creating one query object, but executing it multiple > > times with new parameter values. I think this will cut down the > > number > > of open files. > > > > Let me know- MAJ > > > > > > On Thu Dec 19 01:34:09 2013, symonsjo@gmail.com wrote:

> > > On Thu Dec 19 01:07:29 2013, symonsjo@gmail.com wrote:

> > > > Hi Mark, > > > > > > > > In my latest code I'm using param queries to take advantage of > > > > neo4j > > > > query caching. That code is here: http://pastebin.com/mS2F5iJh > > > > > > > > I'll upgrade REST::Neo4p and integrate $query->finish() into my > > > > destroy_query method from the code above (and drop undef $query). > > > > > > > > On Wed Dec 18 20:53:36 2013, MAJENSEN wrote:

> > > > > oops - a bug in the sample below, should be > > > > > > > > > > while ( ($id,$lev) = @{pop @incident_and_category_params} ) > > > > > > > > > > On Wed Dec 18 20:42:30 2013, MAJENSEN wrote:

> > > > > > Ok, I will think a bit harder about this. > > > > > > > > > > > > In the latest version (0.2222), there are a couple of things > > > > > > that > > > > > > might help. If you have thousands of files open, then > > > > > > presumably > > > > > > there > > > > > > are thousands of live query objects (without destroy_query). > > > > > > That's > > > > > > probably my bad at some level. Not sure why the files are not > > > > > > cleaned > > > > > > up when $neo4p_query is reassigned to a new query. Old object > > > > > > should > > > > > > go away at that point. > > > > > > > > > > > > However, in the latest version, execute() can take parameter > > > > > > assignments as arguments, and there is now (as you suggested > > > > > > before) a > > > > > > finish() method that will delete the tmp file but leave the > > > > > > object > > > > > > intact. So you could prepare the queries before you start the > > > > > > loop, > > > > > > and use finish at the end of the loop: > > > > > > > > > > > > $incident_query = > > > > > > 'MATCH (i:INCIDENT), (ic:INCIDENT_CATEGORY)' > > > > > > . ' WHERE i.incident_id = { incident_id }' > > > > > > . ' AND ic.category_level_01 = { > > > > > > category_level_01 > > > > > > }' > > > > > > . ' CREATE UNIQUE' > > > > > > . ' ic<-[r:HAS_INCIDENT_CATEGORY]-i RETURN r'; > > > > > > $incident_query_obj = REST::Neo4p::Query-

> > > > > > >new($incident_query);

> > > > > > # suppose an array of 2-elt arrays with desired parm > > > > > > combinations: > > > > > > while ( ($id,$lev) = @{pop @{ @incident_and_category_params > > > > > > }} > > > > > > ) > > > > > > { > > > > > > $incident_query_obj->execute( incident_id => $id, > > > > > > category_level_01 => $lev ); > > > > > > while (my $row = $incident_query_obj->fetch) { > > > > > > ... > > > > > > last if $GOT_WHAT_I_WANT; > > > > > > } > > > > > > > > > > > > $incident_query_obj->finish(); > > > > > > } > > > > > > > > > > > > Each time you call finish, the tmp file is deleted, and each > > > > > > time > > > > > > you > > > > > > call execute a tmp file is created, but all within one query > > > > > > instance. > > > > > > You can set up multiple such query instances outside the loop > > > > > > and > > > > > > use > > > > > > the parameter bindings within the loop. Should make the code > > > > > > more > > > > > > maintainable too. > > > > > > > > > > > > Please try and let me know. I REALLY appreciate this > > > > > > feedback! > > > > > > best MAJ > > > > > > > > > > > > > > > > > > On Wed Dec 18 16:23:59 2013, symonsjo@gmail.com wrote:

> > > > > > > On Wed Dec 18 16:06:10 2013, symonsjo@gmail.com wrote:

> > > > > > > > On Tue Nov 05 23:42:29 2013, MAJENSEN wrote:

> > > > > > > > > Calling this resolved; please reopen if necessary. > > > > > > > > > thanks

> > > > > > > > > > > > > > > > I don't think this is resolved, in fact without > > > > > > > > destroy_query > > > > > > > > there's > > > > > > > > a whole lot more files open at once, the below result is > > > > > > > > after > > > > > > > > loading > > > > > > > > only 2500 rows. > > > > > > > > > > > > > > > > [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say > > > > > > > > $REST::Neo4p::VERSION' > > > > > > > > 0.126 > > > > > > > > [jsymons@larva nrls]$ ps -ef | grep extract > > > > > > > > jsymons 20538 19391 64 20:46 pts/11 00:06:49 > > > > > > > > /usr/bin/perl > > > > > > > > -w > > > > > > > > /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file > > > > > > > > RA3_All.xlsx > > > > > > > > --neo_uri=http://localhost:7474 > > > > > > > > jsymons 21582 20114 0 20:56 pts/13 00:00:00 grep > > > > > > > > extract > > > > > > > > [jsymons@larva nrls]$ lsof -p 20538 | grep -c > > > > > > > > '/tmp/.\{10\}' > > > > > > > > 38071 > > > > > > > > > > > > > > > > And once I killed the script at between 3000 & 3500 rows, > > > > > > > > in > > > > > > > > tmp > > > > > > > > current there were: > > > > > > > > > > > > > > > > [jsymons@larva nrls]$ ls -l /tmp | grep -c '.\{10\}$' > > > > > > > > 46498 > > > > > > > > > > > > > > > > Latest code with destory_query & updated for 2.0.0 stable > > > > > > > > at > > > > > > > > http://pastebin.com/v3xUJFeP

> > > > > > > > > > > > > > > > > > > > > ^without destroy_query

> > > > > > > > > >

> > > > > > > > > Hello Mark, > > > > > > I upgraded to 0.2222 as suggested: > > > > > > [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say > > > $REST::Neo4p::VERSION' > > > 0.2222 > > > > > > And I added $query->finish() to the destroy_query sub and removed > > > the > > > undef $query in the code in pastebin here: > > > http://pastebin.com/mS2F5iJh > > > > > > But after loading 500 rows from the spreadsheet without the undef > > > in > > > place I end up with 7045 open files: > > > > > > [jsymons@larva nrls]$ /home/jsymons/extract_nrls.2.0.0_stable.pl > > > --input_file RA3_All.xlsx --neo_uri=http://localhost:7474 2> > > > RA3_All.xlsx.loading.error.log | tee RA3_All.xlsx.loading.log > > > Processed 500 rows. Thu Dec 19 06:18:46 2013 > > > > > > [jsymons@larva nrls]$ ps -ef | grep extract > > > jsymons 5268 19391 79 06:16 > > > pts/11 00:00:05 /usr/bin/perl -w > > > /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file > > > RA3_All.xlsx > > > --neo_uri=http://localhost:7474 > > > jsymons 5271 20114 0 06:16 pts/13 00:00:00 grep extract > > > [jsymons@larva nrls]$ while true ; do echo "/tmp has " $(lsof -p > > > 5268 > > > | grep -c '/tmp/.\{10\}') " open files for pid" ; sleep 10 ; done > > > /tmp has 1086 open files for pid > > > /tmp has 1612 open files for pid > > > /tmp has 2122 open files for pid > > > /tmp has 2634 open files for pid > > > /tmp has 3201 open files for pid > > > /tmp has 3783 open files for pid > > > /tmp has 4317 open files for pid > > > /tmp has 4838 open files for pid > > > /tmp has 5387 open files for pid > > > /tmp has 5933 open files for pid > > > /tmp has 6479 open files for pid > > > /tmp has 7045 open files for pid > > > > > > Actually even with undef on the newer code it's still worse than it > > > was with 0.126 as far as open files go because I also tried with > > > $query-finish(); AND undef $query; and still had a huge amount of > > > open > > > files. Sub is below: > > > > > > sub destroy_query { > > > my $query = shift; > > > > > > if (defined($query)) { > > > while (my $response = $query->fetch()) { > > > # Who cares, throw it out > > > } > > > $query->finish(); # Cleanup suggested by Mark > > > undef $query; > > > } > > > }

Wed Jan 01 12:17:17 2014 maj.fortinbras [...] gmail.com - Status changed from 'open' to 'resolved'

Wed Jan 01 12:17:17 2014 maj.fortinbras [...] gmail.com - Fixed in 0.2220 added

Wed Jan 01 12:17:17 2014 maj.fortinbras [...] gmail.com - Fixed in 0.2111 deleted

Wed Jan 01 12:44:38 2014 symonsjo [...] gmail.com - Correspondence added

From:

symonsjo [...] gmail.com

Thanks MAJ, happy new year as well. ~ icenine On Wed Jan 01 12:16:54 2014, MAJENSEN wrote: Show quoted text

> Ice-9, that's awesome. Looks like a cool application too. > Happy New Year! MAJ > On Wed Jan 01 05:39:50 2014, symonsjo@gmail.com wrote:

> > Hi MAJ, > > > > So I've updated the code and I've confirmed that I now only have open > > files for the queries I've opened. I apologize that the last code I > > pasted you wasn't correct but I've updated the code even further and > > it's working beautifully. Thanks. The updated code is here: > > > > http://pastebin.com/GNaWKB48 > > > > ~ icenine > > > > On Thu Dec 26 20:22:11 2013, MAJENSEN wrote:

> > > Hi Jo- I think you will get more mileage if you set the parameter > > > values in the execute() method (I don't see that in the paste); then > > > you don't have to create a separate query object for each set of > > > parameter values: > > > Rather than something like: > > > > > > $query = 'START n=...{my_parmA}...RETURN n'; > > > $q1 = REST::Neo4p::Query->new($query, {my_parmA => 1}); > > > $q1->execute(); > > > $q2 = REST::Neo4p::Query->new($query, {my_parmA => 2}); > > > $q2->execute(); > > > ... > > > > > > try (with v0.2222) > > > > > > $query = 'START n=...{my_parmA}...RETURN n'; > > > $q = REST::neo4p::Query->new($query); > > > for (1..2) { > > > $q->execute({my_parmA => $_}); > > > } > > > > > > Here you're only creating one query object, but executing it multiple > > > times with new parameter values. I think this will cut down the > > > number > > > of open files. > > > > > > Let me know- MAJ > > > > > > > > > On Thu Dec 19 01:34:09 2013, symonsjo@gmail.com wrote:

> > > > On Thu Dec 19 01:07:29 2013, symonsjo@gmail.com wrote:

> > > > > Hi Mark, > > > > > > > > > > In my latest code I'm using param queries to take advantage of > > > > > neo4j > > > > > query caching. That code is here: http://pastebin.com/mS2F5iJh > > > > > > > > > > I'll upgrade REST::Neo4p and integrate $query->finish() into my > > > > > destroy_query method from the code above (and drop undef $query). > > > > > > > > > > On Wed Dec 18 20:53:36 2013, MAJENSEN wrote:

> > > > > > oops - a bug in the sample below, should be > > > > > > > > > > > > while ( ($id,$lev) = @{pop @incident_and_category_params} ) > > > > > > > > > > > > On Wed Dec 18 20:42:30 2013, MAJENSEN wrote:

> > > > > > > Ok, I will think a bit harder about this. > > > > > > > > > > > > > > In the latest version (0.2222), there are a couple of things > > > > > > > that > > > > > > > might help. If you have thousands of files open, then > > > > > > > presumably > > > > > > > there > > > > > > > are thousands of live query objects (without destroy_query). > > > > > > > That's > > > > > > > probably my bad at some level. Not sure why the files are not > > > > > > > cleaned > > > > > > > up when $neo4p_query is reassigned to a new query. Old object > > > > > > > should > > > > > > > go away at that point. > > > > > > > > > > > > > > However, in the latest version, execute() can take parameter > > > > > > > assignments as arguments, and there is now (as you suggested > > > > > > > before) a > > > > > > > finish() method that will delete the tmp file but leave the > > > > > > > object > > > > > > > intact. So you could prepare the queries before you start the > > > > > > > loop, > > > > > > > and use finish at the end of the loop: > > > > > > > > > > > > > > $incident_query = > > > > > > > 'MATCH (i:INCIDENT), (ic:INCIDENT_CATEGORY)' > > > > > > > . ' WHERE i.incident_id = { incident_id }' > > > > > > > . ' AND ic.category_level_01 = { > > > > > > > category_level_01 > > > > > > > }' > > > > > > > . ' CREATE UNIQUE' > > > > > > > . ' ic<-[r:HAS_INCIDENT_CATEGORY]-i RETURN r'; > > > > > > > $incident_query_obj = REST::Neo4p::Query-

> > > > > > > >new($incident_query);

> > > > > > > # suppose an array of 2-elt arrays with desired parm > > > > > > > combinations: > > > > > > > while ( ($id,$lev) = @{pop @{ @incident_and_category_params > > > > > > > }} > > > > > > > ) > > > > > > > { > > > > > > > $incident_query_obj->execute( incident_id => $id, > > > > > > > category_level_01 => $lev ); > > > > > > > while (my $row = $incident_query_obj->fetch) { > > > > > > > ... > > > > > > > last if $GOT_WHAT_I_WANT; > > > > > > > } > > > > > > > > > > > > > > $incident_query_obj->finish(); > > > > > > > } > > > > > > > > > > > > > > Each time you call finish, the tmp file is deleted, and each > > > > > > > time > > > > > > > you > > > > > > > call execute a tmp file is created, but all within one query > > > > > > > instance. > > > > > > > You can set up multiple such query instances outside the loop > > > > > > > and > > > > > > > use > > > > > > > the parameter bindings within the loop. Should make the code > > > > > > > more > > > > > > > maintainable too. > > > > > > > > > > > > > > Please try and let me know. I REALLY appreciate this > > > > > > > feedback! > > > > > > > best MAJ > > > > > > > > > > > > > > > > > > > > > On Wed Dec 18 16:23:59 2013, symonsjo@gmail.com wrote:

> > > > > > > > On Wed Dec 18 16:06:10 2013, symonsjo@gmail.com wrote:

> > > > > > > > > On Tue Nov 05 23:42:29 2013, MAJENSEN wrote:

> > > > > > > > > > Calling this resolved; please reopen if necessary. > > > > > > > > > > thanks

> > > > > > > > > > > > > > > > > > I don't think this is resolved, in fact without > > > > > > > > > destroy_query > > > > > > > > > there's > > > > > > > > > a whole lot more files open at once, the below result is > > > > > > > > > after > > > > > > > > > loading > > > > > > > > > only 2500 rows. > > > > > > > > > > > > > > > > > > [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say > > > > > > > > > $REST::Neo4p::VERSION' > > > > > > > > > 0.126 > > > > > > > > > [jsymons@larva nrls]$ ps -ef | grep extract > > > > > > > > > jsymons 20538 19391 64 20:46 pts/11 00:06:49 > > > > > > > > > /usr/bin/perl > > > > > > > > > -w > > > > > > > > > /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file > > > > > > > > > RA3_All.xlsx > > > > > > > > > --neo_uri=http://localhost:7474 > > > > > > > > > jsymons 21582 20114 0 20:56 pts/13 00:00:00 grep > > > > > > > > > extract > > > > > > > > > [jsymons@larva nrls]$ lsof -p 20538 | grep -c > > > > > > > > > '/tmp/.\{10\}' > > > > > > > > > 38071 > > > > > > > > > > > > > > > > > > And once I killed the script at between 3000 & 3500 rows, > > > > > > > > > in > > > > > > > > > tmp > > > > > > > > > current there were: > > > > > > > > > > > > > > > > > > [jsymons@larva nrls]$ ls -l /tmp | grep -c '.\{10\}$' > > > > > > > > > 46498 > > > > > > > > > > > > > > > > > > Latest code with destory_query & updated for 2.0.0 stable > > > > > > > > > at > > > > > > > > > http://pastebin.com/v3xUJFeP

> > > > > > > > > > > > > > > > > > > > > > > > ^without destroy_query

> > > > > > > > > > > >

> > > > > > > > > > > > Hello Mark, > > > > > > > > I upgraded to 0.2222 as suggested: > > > > > > > > [jsymons@larva nrls]$ perl -MREST::Neo4p -E 'say > > > > $REST::Neo4p::VERSION' > > > > 0.2222 > > > > > > > > And I added $query->finish() to the destroy_query sub and removed > > > > the > > > > undef $query in the code in pastebin here: > > > > http://pastebin.com/mS2F5iJh > > > > > > > > But after loading 500 rows from the spreadsheet without the undef > > > > in > > > > place I end up with 7045 open files: > > > > > > > > [jsymons@larva nrls]$ /home/jsymons/extract_nrls.2.0.0_stable.pl > > > > --input_file RA3_All.xlsx --neo_uri=http://localhost:7474 2> > > > > RA3_All.xlsx.loading.error.log | tee RA3_All.xlsx.loading.log > > > > Processed 500 rows. Thu Dec 19 06:18:46 2013 > > > > > > > > [jsymons@larva nrls]$ ps -ef | grep extract > > > > jsymons 5268 19391 79 06:16 > > > > pts/11 00:00:05 /usr/bin/perl -w > > > > /home/jsymons/extract_nrls.2.0.0_stable.pl --input_file > > > > RA3_All.xlsx > > > > --neo_uri=http://localhost:7474 > > > > jsymons 5271 20114 0 06:16 pts/13 00:00:00 grep extract > > > > [jsymons@larva nrls]$ while true ; do echo "/tmp has " $(lsof -p > > > > 5268 > > > > | grep -c '/tmp/.\{10\}') " open files for pid" ; sleep 10 ; done > > > > /tmp has 1086 open files for pid > > > > /tmp has 1612 open files for pid > > > > /tmp has 2122 open files for pid > > > > /tmp has 2634 open files for pid > > > > /tmp has 3201 open files for pid > > > > /tmp has 3783 open files for pid > > > > /tmp has 4317 open files for pid > > > > /tmp has 4838 open files for pid > > > > /tmp has 5387 open files for pid > > > > /tmp has 5933 open files for pid > > > > /tmp has 6479 open files for pid > > > > /tmp has 7045 open files for pid > > > > > > > > Actually even with undef on the newer code it's still worse than it > > > > was with 0.126 as far as open files go because I also tried with > > > > $query-finish(); AND undef $query; and still had a huge amount of > > > > open > > > > files. Sub is below: > > > > > > > > sub destroy_query { > > > > my $query = shift; > > > > > > > > if (defined($query)) { > > > > while (my $response = $query->fetch()) { > > > > # Who cares, throw it out > > > > } > > > > $query->finish(); # Cleanup suggested by Mark > > > > undef $query; > > > > } > > > > }

> >