Subject: KinoSearch feature suggestions
Date: Mon, 21 Jan 2008 14:16:58 -0800
To: bug-kinosearch [...] rt.cpan.org, marvin [...] rectangular.com
From: Father Chrysostomos <sprout [...] cpan.org>
Hello.
I’d like to request that a few features be added to KinoSearch. I need
these features myself, so I’m willing to contribute patches. Please
let me know what you think.
Father Chrysostomos
__DATA__
1. Wildcards in search queries
2. I’d like KinoSearch::Highlight::Highlighter to be able to create
non-contiguous excerpts (which I’m calling ‘summaries’; the contiguous
sub-parts of each summary I’m calling excerpts):
$highlighter->add_spec( excerpt_length => 50, summary_length => 200, ... );
The highlighter would find the most important word to highlight (as it
currently does), and create a 50-char excerpt. Then it would create an
excerpt for the second most important word and add that (removing
overlap if necessary), repeating this process until the summary is the
right length.
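The process described in item 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not KinoSearch code: build_summary is a hypothetical helper, the hit positions are assumed to arrive already ordered from most to least important, and offsets are counted in characters with no attention paid to word boundaries.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of KinoSearch): assemble a non-contiguous
# summary from hit positions ordered from most to least important.
sub build_summary {
    my ( $text, $hits, $excerpt_length, $summary_length, $ellipsis ) = @_;
    my @spans;        # chosen [start, end) character ranges
    my $total = 0;    # characters collected so far

    for my $pos (@$hits) {
        last if $total >= $summary_length;

        # Centre a window of $excerpt_length characters on the hit.
        my $start = $pos - int( $excerpt_length / 2 );
        $start = 0 if $start < 0;
        my $end = $start + $excerpt_length;
        $end = length $text if $end > length $text;

        # Remove any overlap with excerpts that were already chosen.
        for my $span (@spans) {
            next if $end <= $span->[0] || $start >= $span->[1];
            if   ( $start >= $span->[0] ) { $start = $span->[1] }
            else                          { $end   = $span->[0] }
        }
        next if $start >= $end;

        # Clip the final excerpt so the summary does not exceed its limit.
        my $room = $summary_length - $total;
        $end = $start + $room if $end - $start > $room;

        push @spans, [ $start, $end ];
        $total += $end - $start;
    }

    # Put the excerpts back in document order and merge adjacent ones,
    # so that an ellipsis appears only where text was really skipped.
    my @merged;
    for my $span ( sort { $a->[0] <=> $b->[0] } @spans ) {
        if ( @merged && $merged[-1][1] == $span->[0] ) {
            $merged[-1][1] = $span->[1];
        }
        else {
            push @merged, [@$span];
        }
    }
    return join $ellipsis,
        map { substr( $text, $_->[0], $_->[1] - $_->[0] ) } @merged;
}
```

Passing the ellipsis as an argument is one way the custom mark from item 3 could fit in; a real implementation would presumably also snap excerpt edges to word or sentence boundaries.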
3. Custom ellipsis marks:
$highlighter->add_spec( ellipsis_mark => "\x{2026}", ... )
4. Pagination (another highlighter feature): An index field could be
designated as the ‘page offset’ field, containing byte offsets of page
breaks.
$highlighter->add_spec(
    page_offset_field     => 'pageoffsets',
    page_offset_formatter => $object,
);
And $object would have to have a page_label method:
sub page_label { my ($self, $fields_hashref, $page_no) = @_; ... }
Though it might be more complicated, maybe we could have page breaks
(chr 12) recorded automatically when the index is created. Then
‘page_offset_field’ won’t be necessary.
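To make item 4 more concrete, here is a minimal sketch of how page-break offsets might be collected and then used to label an excerpt. Everything in it is an assumption for illustration: page_offsets, page_number and My::PageFormatter are hypothetical names, not existing KinoSearch API, and the offsets are character offsets (byte offsets would need the encoded length instead).

```perl
use strict;
use warnings;

# Hypothetical formatter object providing the page_label method from item 4.
{
    package My::PageFormatter;
    sub new { return bless {}, shift }

    sub page_label {
        my ( $self, $fields_hashref, $page_no ) = @_;
        return "p. $page_no";
    }
}

# Record where each page after the first begins, treating form feeds
# (chr 12, written \f) as page breaks.
sub page_offsets {
    my ($text) = @_;
    my @offsets;
    while ( $text =~ /\f/g ) {
        push @offsets, pos($text);    # position just after the form feed
    }
    return \@offsets;
}

# Map an offset in the text to a 1-based page number with a binary search
# over the sorted page-start offsets.
sub page_number {
    my ( $offsets, $offset ) = @_;
    my ( $lo, $hi ) = ( 0, scalar @$offsets );
    while ( $lo < $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        if   ( $offsets->[$mid] <= $offset ) { $lo = $mid + 1 }
        else                                 { $hi = $mid }
    }
    return $lo + 1;    # page starts at or before the offset, plus one
}
```

A highlighter could then call something like $formatter->page_label( \%fields, page_number( $offsets, $excerpt_start ) ) for each excerpt, where $excerpt_start is the excerpt's offset in the field.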
For examples of 2 and 4 in use, see
<http://synodinresistance.org/cgi-bin/anazetesis?all=1&and-glossa=&and-morphe=&g=en&q=thing>
(which I’d like to switch to using KinoSearch, because it’s currently too slow).