
This queue is for tickets about the Plucene CPAN distribution.

Report information
The Basics
Id: 5815
Status: new
Priority: 0/
Queue: Plucene

People
Owner: Nobody in particular
Requestors: adrianh [...] quietstars.com
andy [...] petdance.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Test failure in stress.t
Dist: Plucene 1.05
Perl: 5.8.3
Platform: Mac OS X 10.2.8 (Darwin davis.local. 6.8 Darwin Kernel Version 6.8: Wed Sep 10 15:20:55 PDT 2003; root:xnu/xnu-344.49.obj~2/RELEASE_PPC Power Macintosh powerpc)

stress.t gives me:

[ lots of oks ]
ok 48 - Right documents
Plucene::Store::InputStream, r: t/homer_index/1209/_21.frq
Too many open files at blib/lib/Plucene/Store/InputStream.pm line 41.
1..48
# Looks like your test died just after 48.

Sorry for the lack of detail. I haven't got the time to poke at it at the moment. If you need any more info, drop me a line.

Cheers,
Adrian
Date: Fri, 26 Mar 2004 11:46:31 -0600
From: Andy Lester <andy [...] petdance.com>
To: david [...] kineticode.com
CC: bug-Plucene [...] rt.cpan.org, STRYTOAST [...] cpan.org
Subject: Plucene runs afoul of limited file handles on OS X
David and I have both been seeing this test failure for the last few releases of Plucene. On Fri, Mar 26, 2004 at 09:38:17AM -0800, david@kineticode.com wrote:
> This distribution has been tested as part of the cpan-testers
> effort to test as many new uploads to CPAN as possible. See
> http://testers.cpan.org/
>
> Please cc any replies to cpan-testers@perl.org to keep other
> test volunteers informed and to prevent any duplicate effort.
>
> --
> This is an error report generated automatically by CPANPLUS,
> version 0.049.
>
> Below is the error stack during 'make test':
>
> t/analyzers..........ok
> t/datesearch.........ok
> t/dateserializer.....ok
> t/deletable..........ok
> t/indexsearcher......ok
> t/mergefactor........ok
> t/queryparser........ok
> t/regress-01.........ok
> t/regress-02.........ok
> t/regress-04.........ok
> t/search_hits........ok
> t/searchtest.........ok
> t/segments...........ok
> t/similarity.........ok
> t/sloppy_scorer......ok
> t/stress.............# Indexing the entire Odyssey. This may take some time
> # t/data/book1
> # t/data/book10
> # t/data/book11
> # t/data/book12
> # t/data/book13
> # t/data/book14
> # t/data/book15
> # t/data/book16
> # t/data/book17
> # t/data/book18
> # t/data/book19
> # t/data/book2
> # t/data/book20
> # t/data/book21
> # t/data/book22
> # t/data/book23
> # t/data/book24
> # t/data/book3
> # t/data/book4
> # t/data/book5
> # t/data/book6
> # t/data/book7
> # t/data/book8
> # t/data/book9
> # t/data/preface
> # Closing
> Plucene::Store::InputStream cannot open t/homer_index/10838/_10.f1 for reading: Too many open files at /Users/david/.cpanplus/5.8.3/build/Plucene-1.14/blib/lib/Plucene/Store/InputStream.pm line 37.
> # Looks like your test died just after 48.
> dubious
> Test returned status 255 (wstat 65280, 0xff00)
> after all the subtests completed successfully
> t/terminfostest......ok
> t/testbitvector......ok
> t/testindexwriter....ok
> t/tokenfilter........ok
> t/tokenizer..........ok
> t/utf8...............ok
> Failed Test Stat Wstat Total Fail Failed List of Failed
> -------------------------------------------------------------------------------
> t/stress.t 255 65280 48 0 0.00% ??
> Failed 1/22 test scripts, 95.45% okay. 0/6002 subtests failed, 100.00% okay.
>
> Additional comments:
>
> Is this because of some setting on my Mac? Should Plucene be able to keep
> track of this somehow?
> --
>
> Summary of my perl5 (revision 5.0 version 8 subversion 3) configuration:
>   Platform:
>     osname=darwin, osvers=7.2.0, archname=darwin-2level
>     uname='darwin geertz.kineticode.com 7.2.0 darwin kernel version 7.2.0: thu dec 11 16:20:23 pst 2003; root:xnuxnu-517.3.7.obj~1release_ppc power macintosh powerpc '
>     config_args='-des -Dperladmin=david@kineticode.com -Dcf_email=david@kineticode.com'
>     hint=recommended, useposix=true, d_sigaction=define
>     usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
>     useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
>     use64bitint=undef use64bitall=undef uselongdouble=undef
>     usemymalloc=n, bincompat5005=undef
>   Compiler:
>     cc='cc', ccflags ='-pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing',
>     optimize='-Os',
>     cppflags='-no-cpp-precomp -pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing'
>     ccversion='', gccversion='3.1 20021003 (prerelease)', gccosandvers=''
>     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
>     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
>     ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
>     alignbytes=8, prototype=define
>   Linker and Libraries:
>     ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags =' -L/usr/local/lib'
>     libpth=/usr/local/lib /usr/lib
>     libs=-ldbm -ldl -lm -lc
>     perllibs=-ldl -lm -lc
>     libc=/usr/lib/libc.dylib, so=dylib, useshrplib=false, libperl=libperl.a
>     gnulibc_version=''
>   Dynamic Linking:
>     dlsrc=dl_dyld.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
>     cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib'
--
Andy Lester => andy@petdance.com => www.petdance.com => AIM:petdance
[PETDANCE - Fri Mar 26 12:46:41 2004]:
> David and I have both been seeing this test failure for the last few
> releases of Plucene:
A few people have reported this. It will be investigated RSN :)
> > Additional comments:
> >
> > Is this because of some setting on my Mac? Should Plucene be able to
> > keep track of this somehow?
It probably is, but I don't think anyone should *have* to fiddle with machine settings just to make this work. Plucene should keep track of this, if indeed that is the problem. (I have some other ideas, which hopefully won't be dead ends!)
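(For context: the "setting" in question is the per-process open-file limit. A minimal sketch, using only the core POSIX module, of how to check it from Perl; the limit varies by OS and shell configuration, and older Mac OS X defaulted to a low soft limit of 256:)

    use strict;
    use warnings;
    use POSIX qw(sysconf _SC_OPEN_MAX);

    # Ask the OS how many files this process may hold open at once.
    my $max = sysconf(_SC_OPEN_MAX);
    print "Open-file limit for this process: $max\n";

An index with many unmerged segment files can exhaust such a limit quickly, which would explain why the stress test is the one that dies.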
Try the patch at http://rt.cpan.org/NoAuth/Bug.html?id=6453 to see if that fixes it.
From: andy [...] petdance.com
Version 1.18 does not fix this problem.

t/stress.............# Indexing the entire Odyssey. This may take some time
# t/data/book1
# t/data/book10
# t/data/book11
# t/data/book12
# t/data/book13
# t/data/book14
# t/data/book15
# t/data/book16
# t/data/book17
# t/data/book18
# t/data/book19
# t/data/book2
# t/data/book20
# t/data/book21
# t/data/book22
# t/data/book23
# t/data/book24
# t/data/book3
# t/data/book4
# t/data/book5
# t/data/book6
# t/data/book7
# t/data/book8
# t/data/book9
# t/data/preface
# Closing
t/stress.............ok 43/0Plucene::Store::InputStream cannot open t/homer_index/22793/_21.fdx for reading: Too many open files at /usr/src/Plucene-1.18/blib/lib/Plucene/Store/InputStream.pm line 37.
# Looks like your test died just after 48.
[PETDANCE - Tue Jul 20 10:49:14 2004]:
> Version 1.18 does not fix this problem.
Have you tried making the change mentioned in http://rt.cpan.org/NoAuth/Bug.html?id=6453 ?

Version 1.19 doesn't apply this fix either.
From: torben-spam-cpan [...] nehmer.net
[guest - Thu Jul 29 05:41:22 2004]:
> [PETDANCE - Tue Jul 20 10:49:14 2004]:
>
> > Version 1.18 does not fix this problem.
>
> Have you tried making the change mentioned in
> http://rt.cpan.org/NoAuth/Bug.html?id=6453 ?
>
> Version 1.19 doesn't apply this fix either.
I have this problem too, with 1.20 on Debian Linux; the fix mentioned above does not help me. If you need any example code, please contact me.

Torben
From: torben-spam-cpan [...] nehmer.net
Hi again,

[guest - Tue Feb 1 06:18:54 2005]:
> > Have you tried making the change mentioned in
> > http://rt.cpan.org/NoAuth/Bug.html?id=6453 ?
> >
> > Version 1.19 doesn't apply this fix either
>
> I have this problem too, with 1.20 on Debian Linux; the fix mentioned
> above does not help me.
By coincidence, I noticed that this is probably related to an IO::File problem. Its documentation states that upon destruction any open handle is automatically closed. As a matter of fact, this is not the case on my installation here: undef'ing the variable still leaves the file handle open, which is why Plucene runs into the limit. The only workaround I see right now is explicitly closing all file handles manually, unless someone works out why IO::File does not do it.
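(A minimal standalone sketch of the behaviour described above; not Plucene code, and the file name is illustrative. The handle closes only when the *last* reference to it disappears, so undef'ing one variable is not enough if another reference survives somewhere:)

    use strict;
    use warnings;
    use IO::File;

    my $fh   = IO::File->new('query.xml', 'r') or die "open: $!";
    my $copy = $fh;    # a second reference to the same handle

    undef $fh;         # file stays open: $copy still references the handle
    print defined fileno($copy) ? "still open\n" : "closed\n";

    undef $copy;       # last reference gone; Perl closes the file here

If Plucene, or code using it, keeps such an extra reference alive, the handle will never be closed no matter what the caller undefs.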
From: torben-spam-cpan [...] nehmer.net
One more note, from the IO::File source:

    # There is no need for DESTROY to do anything, because when the
    # last reference to an IO object is gone, Perl automatically
    # closes its associated files (if any).

So, if I get this right, as long as the associated IO handle is not yet out of scope, the file isn't closed. As a matter of fact, this seems to be the problem. I tried with this code:
> for (my $i = 0; $i < 50000; $i++)
> {
>     my $in  = IO::File->new('query.xml', 'r');
>     my $out = IO::File->new('/dev/null', 'w');
>
>     # my $proc = Midcom::Plucene::RequestProcessor->new($in, $out);
>     # $proc->Process();
> }
With the two commented-out lines left commented, everything works fine. If they are enabled, the references get lost somehow. The RequestProcessor class releases all Plucene handles upon completion, but when looking into /proc after the "too many open files" error, I see that not only are the query.xml and /dev/null files open multiple times, but so are the files from the Plucene index. I admit that I might have errors in my code; I'd appreciate any suggestions. The tarball attached to this comment holds the code I'm using at this time that triggers this error. Try executing ./bench.pl; you need LibXML and XML::Writer installed.
Download test.tar.gz
application/x-gzip 28.7k


From: torben-spam-cpan [...] nehmer.net
Hi, one more tarball, this time a bit more optimized. The bench script now only produces multiple handles on the Plucene index.

Torben
Download test.tar.gz
application/x-gzip 29.8k


From: torben-spam-cpan [...] nehmer.net
Hi, I dug into this a few more times, now tracking the open file handles through each iteration. What I found is interesting: after around 100 iterations my application had these file handles open:

      1 _8.f1
      1 _8.f3
      1 _8.f4
      1 _8.f5
      1 _8.f6
      1 _8.f7
      1 _8.fdt
      1 _8.fdx
    237 _8.frq
      1 _8.prx
      1 _8.tis

Any ideas?

Torben
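(A tally like the one above can be produced with a small sketch along these lines, assuming Linux with a /proc filesystem; the segment-file pattern is specific to Plucene index files:)

    use strict;
    use warnings;

    # Count how many descriptors of the current process point at each
    # Plucene segment file (e.g. _8.frq), via the /proc/<pid>/fd symlinks.
    my %count;
    for my $fd (glob "/proc/$$/fd/*") {
        my $target = readlink $fd or next;
        $count{$1}++ if $target =~ m{/(_\d+\.\w+)$};
    }
    printf "%5d %s\n", $count{$_}, $_ for sort keys %count;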
From: torben-spam-cpan [...] nehmer.net
Hi,
> 237 _8.frq
It seems the handle to this file gets duplicated. When tracking SegmentTermDocs "new" calls, I discovered that with each iteration four of these objects were created, while only two were actually destroyed, which of course leaves two handles open. As proof of this, I changed the constructor of termdocs so that the stream is no longer cloned but used directly, just as a proof of concept (I know that this could break other things, yes). When this was done, I definitely had no more dangling file handles. Now the main question, for which I won't have time today, is where these TermDocs get lost. Perhaps some cyclic references that prevent the GC from disposing of the objects are the reason for this. As a matter of fact, these dangling objects are causing a memory leak:

    Script start:          9.0 MB VM size
    After 3000 iterations: 34.5 MB VM size
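(If cyclic references are indeed the cause, the usual Perl remedy is Scalar::Util::weaken. A hedged sketch using a made-up two-object cycle, not actual Plucene internals:)

    use strict;
    use warnings;
    use Scalar::Util qw(weaken);

    # Hypothetical reader <-> termdocs cycle: neither refcount can reach
    # zero, so DESTROY never runs and any handle held inside stays open.
    my $reader   = { name => 'SegmentReader' };
    my $termdocs = { stream => 'cloned InputStream', reader => $reader };
    $reader->{termdocs} = $termdocs;

    # Weakening the back-reference lets both objects be collected (and
    # their file handles closed) once the external references are gone.
    weaken $termdocs->{reader};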
[guest - Wed Feb 2 03:27:29 2005]:
> As proof of this, I changed the constructor of termdocs so that the
> stream is no longer cloned but used directly, just as a proof of concept
> (I know that this could break other things, yes). When this was done, I
> definitely had no more dangling file handles.
Is there any kind of useful workaround? I see this hasn't been updated for 3 months... I've got a Plucene database with fewer than 500 records in it that's bombing out on my Linux box because of file handle issues.