CC: KinoSearch discussion forum <kinosearch [...] rectangular.com>
Subject: Possible Phrase Query Bug
Date: Fri, 7 Sep 2007 14:37:37 -0700
To: bug-kinosearch [...] rt.cpan.org
From: Matthew O'Connor <matthew.oconnor [...] socialtext.com>
I think I've found a bug with phrase queries. (The RT queue did not
look too active, so I am also mailing it to the list.)
Here's the reproduction strategy:
1) Create a PolyAnalyzer.
2) Create an indexer with the above analyzer.
3) Add a document with a text field of 'zzz xxx yyy zzz'.
4) Call finish() on the indexer.
5) Create a searcher with the above analyzer.
6) Run the following phrase search: "yyy zzz"
7) EXPECTED: 1 result. GOT: 0 results.
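To make the expected result concrete, here is a minimal, library-independent sketch of positional phrase matching. It does not use or reproduce KinoSearch code. The helper names (tokenize, build_positions, phrase_matches) are made up for illustration, and a lowercase whitespace split is assumed as a stand-in for the analyzer.

```python
def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for the real analyzer.
    return text.lower().split()


def build_positions(tokens):
    # Map each term to the list of ALL positions where it occurs.
    positions = {}
    for pos, term in enumerate(tokens):
        positions.setdefault(term, []).append(pos)
    return positions


def phrase_matches(positions, phrase_terms):
    # Count start positions where every phrase term appears at
    # consecutive offsets (start, start + 1, ...).
    if not phrase_terms:
        return 0
    count = 0
    for start in positions.get(phrase_terms[0], []):
        if all(start + offset in positions.get(term, ())
               for offset, term in enumerate(phrase_terms)):
            count += 1
    return count


doc_positions = build_positions(tokenize("zzz xxx yyy zzz"))
hits = phrase_matches(doc_positions, tokenize("yyy zzz"))
print(hits)
```

In this model "zzz" is recorded at positions 0 and 3, and "yyy" at position 2, so the phrase "yyy zzz" matches once (positions 2 and 3). That is the single result expected in step 7.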
I have attached a script which demonstrates this issue with
KinoSearch 0.15. I think the first "zzz" in the body of the text
field is tripping up the phrase scoring, but that's just a guess.
Is this behavior expected? If not, is it a known bug? I looked in
RT and searched the mailing lists but nothing obviously related came
back.
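The guess above about the first "zzz" can be illustrated with a deliberately flawed matcher. This is purely a hypothetical failure mode written for this report, not taken from the KinoSearch source, and the function names are invented. It keeps only the first position of each term, which is one way an earlier "zzz" could hide the later one.

```python
def first_position_only(tokens):
    # Hypothetical flaw: remember only the FIRST position of each term.
    first_seen = {}
    for pos, term in enumerate(tokens):
        first_seen.setdefault(term, pos)
    return first_seen


def flawed_phrase_match(first_seen, phrase_terms):
    # Require each phrase term's (single) recorded position to be consecutive.
    if not phrase_terms or any(t not in first_seen for t in phrase_terms):
        return False
    start = first_seen[phrase_terms[0]]
    return all(first_seen[t] == start + i for i, t in enumerate(phrase_terms))


phrase = ["yyy", "zzz"]

# Document from the report: the leading "zzz" is the only one recorded.
flawed_result = flawed_phrase_match(
    first_position_only("zzz xxx yyy zzz".split()), phrase)

# Control document without the leading "zzz".
control_result = flawed_phrase_match(
    first_position_only("xxx yyy zzz".split()), phrase)

print(flawed_result, control_result)
```

Under this assumed flaw the reported document fails to match while the control document matches. Comparing those two documents against the real index would be a quick way to check whether the leading "zzz" is actually involved.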
Here is my environment:
* OS: Ubuntu Dapper 6.06 GNU/Linux
* CPU: 2.0 GHz quad-core Intel Xeon (64-bit)
* Kernel: 2.6.15-28 (Debian package 2.6.15-28-amd64-xeon)
* Index Filesystem: ext3 (part of a logical volume in LVM)
* /tmp Filesystem: ext3
* KinoSearch: 0.15 (unpatched, from CPAN)
* Perl: 5.8.7 (stock Perl on Ubuntu Dapper 6.06)
* gcc: 4.0.3
* ld: 2.16.91
-matthew