Subject: KinoSearch 0.15 crash bug + weird feature req
Date: Tue, 13 Mar 2007 00:34:52 -0400 (EDT)
To: bug-KinoSearch [...] rt.cpan.org
From: Mike Andrews <mandrews [...] bit0.com>

Platform is FreeBSD 6.2-RELEASE, both the i386 and amd64 versions.

Using a fairly standard KinoSearch 0.15 setup (mostly boilerplate code),
entering a URL as a search term causes Perl, and thus mod_perl and its
Apache parent, to die with SIGSEGV. My guess is that it's treating 'http'
as a field name to search, and I don't have a field by that name. That's
odd, though, because entering other nonexistent field names just returns
0 results, as it should.

Just before it crashes, I get this:

    Undefined subroutine &KinoSearch::Search::PhraseScorer::kerror called at /usr/local/lib/perl5/site_perl/5.8.8/mach/KinoSearch/Search/PhraseScorer.pm line 21.

Loading the core file into gdb gives this backtrace (on amd64):

    #0 0x0000000801c22bcd in Kino_PhraseScorer_destroy () from /usr/local/lib/perl5/site_perl/5.8.8/mach/auto/KinoSearch/KinoSearch.so
    #1 0x0000000801c1eae7 in XS_KinoSearch__Search__PhraseScorer_DESTROY () from /usr/local/lib/perl5/site_perl/5.8.8/mach/auto/KinoSearch/KinoSearch.so
    #2 0x00000008006bd11c in Perl_pp_entersub () from /usr/local/lib/perl5/5.8.8/mach/CORE/libperl.so
    #3 0x00000008006620d7 in S_call_body () from /usr/local/lib/perl5/5.8.8/mach/CORE/libperl.so
    #4 0x0000000800666c1c in Perl_call_sv () from /usr/local/lib/perl5/5.8.8/mach/CORE/libperl.so
    #5 0x00000008006bfab5 in Perl_sv_clear () from /usr/local/lib/perl5/5.8.8/mach/CORE/libperl.so
    #6 0x00000008006c0161 in Perl_sv_free () from /usr/local/lib/perl5/5.8.8/mach/CORE/libperl.so
    #7 0x00000008006df2a7 in Perl_leave_scope () from /usr/local/lib/perl5/5.8.8/mach/CORE/libperl.so
    #8 0x0000000800663689 in S_my_exit_jump () from /usr/local/lib/perl5/5.8.8/mach/CORE/libperl.so
    #9 0x0000000800668c1b in Perl_my_failure_exit () from /usr/local/lib/perl5/5.8.8/mach/CORE/libperl.so
    #10 0x00000008006e16e1 in Perl_die_where () from /usr/local/lib/perl5/5.8.8/mach/CORE/libperl.so
    #11 0x00000008006a80af in Perl_vdie () from /usr/local/lib/perl5/5.8.8/mach/CORE/libperl.so
    #12 0x00000008006a81cd in Perl_die () from /usr/local/lib/perl5/5.8.8/mach/CORE/libperl.so
    #13 0x00000008006bd7df in Perl_pp_entersub () from /usr/local/lib/perl5/5.8.8/mach/CORE/libperl.so
    #14 0x00000008006b5dbe in Perl_runops_standard () from /usr/local/lib/perl5/5.8.8/mach/CORE/libperl.so
    #15 0x00000008006675f2 in perl_run () from /usr/local/lib/perl5/5.8.8/mach/CORE/libperl.so
    #16 0x000000000040156f in main ()

I haven't tried 0.20_2 to see if it's fixed there.
Now for the feature request. This is probably a bit out there, since I
doubt many people other than me would use it, but would it be possible to
separate the highlighter from the excerpter? For my application I want to
highlight all the search terms in a field without doing any excerpting of
it at all. Setting excerpt_size to a huge value keeps the full string, but
the excerpter still mangles punctuation at the end, adds an ellipsis if
there's no full stop, etc.

Here's why I want a weird un-excerpted highlight: the search app I'm
writing searches only news headlines, not the articles. There are some
other fields that can be searched to narrow results down (a topic, a date,
a partial URL), but basically each "document" is under 300 bytes and has
already had its HTML entities normalized, etc. So there's not much point in
excerpting something that's already that short. And since news headlines
tend not to have trailing punctuation, the full-stop check appends an
ellipsis by default...

I can also see some people wanting un-highlighted excerpts, for example
command-line searches that don't use HTML (or curses).

On the other hand, I might just be weird for having a search engine for
short strings :)
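The reporter's own Highlighter subclass isn't shown. As a rough illustration of the requested behavior (highlight every term, excerpt nothing), here is a standalone Perl sketch that does not touch the KinoSearch API at all. `highlight_all` is a hypothetical helper; it assumes the headline is already safe to emit as HTML and that the search terms are plain words.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every whole-word occurrence of any search term
# in <strong> tags and return the rest of the string unchanged (no
# truncation, no trailing ellipsis, no punctuation changes).
# Assumes $text is already HTML-safe and terms are plain words.
sub highlight_all {
    my ( $text, @terms ) = @_;
    return $text unless @terms;

    # Try longer terms first so a phrase like "new york" is matched in
    # preference to a shorter term like "new" that starts it.
    my $alt = join '|',
        map { quotemeta($_) } sort { length $b <=> length $a } @terms;

    # /i matches case-insensitively, but $1 keeps the original casing.
    $text =~ s{\b($alt)\b}{<strong>$1</strong>}gi;
    return $text;
}

print highlight_all( 'New York storm hits coast', 'storm', 'new', 'new york' ), "\n";
# prints: <strong>New York</strong> <strong>storm</strong> hits coast
```

Because the whole string is returned, a headline with no final full stop comes back exactly as stored, apart from the added tags.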
For now, I've written a slimmed-down generate_excerpt() in a subclass of
KinoSearch::Highlight::Highlighter. That works, and it looks like it will
work on 0.20 as well (I know a lot of other things will break in the
0.15-to-0.20 conversion, which is fine; I'm looking forward to trying the
secondary sort feature there). So in the short term I have a workable
solution. I also worked around the crash bug by discarding any search term
that starts with \w+: where the prefix doesn't match one of my known fields.
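The crash workaround described above can be sketched as a pre-filter on the raw query string before it reaches KinoSearch. This is a minimal sketch, not the reporter's code: `drop_unknown_field_terms` and the field list in `%known_field` are hypothetical, and it splits on whitespace only, so quoted phrases are not handled.

```perl
use strict;
use warnings;

# Hypothetical set of field names defined in this application's index.
my %known_field = map { $_ => 1 } qw( headline topic date url );

# Drop any whitespace-separated term that looks like "word:..." where "word"
# is not a known field (e.g. "http://example.com/..."). Terms on known
# fields, such as "topic:sports", and plain terms pass through untouched.
# Assumes simple whitespace tokenizing; quoted phrases are not handled.
sub drop_unknown_field_terms {
    my ( $query, $known ) = @_;
    my @kept = grep { !( /^(\w+):/ && !$known->{ lc $1 } ) } split ' ', $query;
    return join ' ', @kept;
}

print drop_unknown_field_terms( 'flood http://example.com/story topic:weather', \%known_field ), "\n";
# prints: flood topic:weather
```

Dropping the whole term is the blunt option the report describes. A gentler variant could strip only the unknown prefix so the URL text is still searched as plain words.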