
This queue is for tickets about the Search-Tools CPAN distribution.

Report information
The Basics
Id: 83771
Status: resolved
Priority: 0/
Queue: Search-Tools

People
Owner: karman [...] cpan.org
Requestors: blinov.stanislav [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: HiLiter problem with case
Date: Tue, 5 Mar 2013 21:21:22 +0400
To: bug-Search-Tools [...] rt.cpan.org
From: Stanislav Blinov <blinov.stanislav [...] gmail.com>
It seems that Search::Tools::HiLiter doesn't highlight a word when it begins with a capital letter (I haven't tested other cases). This is true at least for Russian, but I think it applies to other languages too. For example, a user can search for the word "house" and it will highlight occurrences of "house", but it will not highlight "White House". Can you please check this?
The default behavior is case insensitive. Here's a one-liner to demonstrate:

    perl -e 'use Search::Tools; my $h = Search::Tools->hiliter(query => "house"); my $text = "White House"; print $h->light($text), $/;'

Can you provide a failing test case?
Subject: Re: [rt.cpan.org #83771] HiLiter problem with case
Date: Thu, 7 Mar 2013 21:47:50 +0400
To: bug-Search-Tools [...] rt.cpan.org
From: Stanislav Blinov <blinov.stanislav [...] gmail.com>
I have figured out the problem. It happens when I pass the query parsed by Search::Tools::QueryParser to HiLiter without quotes. Example of code that doesn't work:

    my $qparser = Search::Tools::QueryParser->new(stemmer => \&mystemfunc);
    my $search_tools_parsed_query = $qparser->parse($query);
    my $hiliter = Search::Tools::HiLiter->new(
        query => $search_tools_parsed_query
    );

However, if I use query => "$search_tools_parsed_query", it works. I don't know why.

Also, if I use Russian characters, the script prints console messages like:

    no entity defined for >и< !

Not really a bug, but annoying and useless.
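A possible explanation for why the quotes help, sketched in plain Perl: interpolating an object into a string calls its overloaded "" operator (if it has one) and yields an ordinary string, while passing the object itself hands over a blessed reference that the receiving code may treat differently, for example by checking ref. The class My::Query below is hypothetical and only assumes that the parsed query overloads stringification, which the working quoted form suggests; how HiLiter actually handles each case is not shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical query class: assumes the real parsed query object
# overloads "" in a similar way.
package My::Query;
use overload '""' => sub { $_[0]->{str} }, fallback => 1;
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package main;

my $q = My::Query->new(str => 'house');

my $as_object = $q;      # still a blessed reference
my $as_string = "$q";    # interpolation calls the overloaded "" operator

printf "ref of object: %s\n", ref $as_object;
printf "ref of string: '%s', value: %s\n", ref $as_string, $as_string;
```

Code that branches on ref $query would therefore see a My::Query object in the first case and a plain string in the second.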
On Thu Mar 07 12:48:23 2013, blinov.stanislav@gmail.com wrote:
> I have figured out the problem. It comes when I pass
> Search::Tools::QueryParser to HiLiter without quotes.
>
> Example of code that doesn't work:
>
> my $qparser = Search::Tools::QueryParser->new(stemmer =>
> \&mystemfunc);
> my $search_tools_parsed_query = $qparser->parse($query);
>
> my $hiliter = Search::Tools::HiLiter->new(
> query => $search_tools_parsed_query
> );
>
> However, if I use: query => "$search_tools_parsed_query" - it works.
>
> Don't know why.
I've added a test case for this in https://github.com/karpet/search-tools-perl/commit/7ba950d9ebc6d27cbba269d567db8b70abc3b653, but it passes for me. If you could fork search-tools on GitHub and extend that test until it fails, I can try to fix it.
> Also, if I use Russian characters, the script outputs in console
> messages like:
>
> no entity defined for >и< !
>
> Not really a bug, but annoying and useless.
Fixed in https://github.com/karpet/search-tools-perl/commit/6e2e03adb6c106fb4f86dd56fca1044a270bb2ec
Subject: Re: [rt.cpan.org #83771] HiLiter problem with case
Date: Sun, 17 Mar 2013 14:04:00 +0400
To: bug-Search-Tools [...] rt.cpan.org
From: Stanislav Blinov <blinov.stanislav [...] gmail.com>
Hi,

I have finally written a script that clearly shows the source of the bug. It appears only when I use the Xapian stemmer (I think you have Search::Xapian installed).

===
use Search::Tools;    # needed for Search::Tools->hiliter and ->snipper
use Search::Tools::Snipper;
use Search::Tools::HiLiter;
use Search::Tools::QueryParser;
use Search::Xapian;
use utf8;             # with this line the highlighting fails

my $query = "ИППО";

my $qparser = Search::Tools::QueryParser->new(stemmer => \&mystemfunc);
my $search_tools_parsed_query = $qparser->parse($query);

my $h = Search::Tools->hiliter(query => $search_tools_parsed_query);
my $s = Search::Tools->snipper(
    occur        => 3,
    context      => 8,
    max_chars    => 300,
    query        => $search_tools_parsed_query,
    strip_markup => 1,
    type         => 'token',
);

my $text = " Первая Конференция ИППО об итогах, приоритетах и перспективах 10 июня 2010 года в Москве ";

print $h->light($s->snip($text));

sub mystemfunc {
    my ($parser, $word) = @_;
    my $stemmer = Search::Xapian::Stem->new('russian');
    return $stemmer->stem_word($word);
}
===

When I comment out "use utf8;", the highlighter works; otherwise it doesn't. The solution I found is to change the last lines of mystemfunc to:

    use Search::Tools::UTF8;
    return to_utf8($stemmer->stem_word($word));

I think it is more a Xapian bug than yours, but I decided to demonstrate it anyway to make things clear.

Regards.
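The mismatch described above can be reproduced with core Perl alone, without Xapian. This is a minimal sketch assuming the stemmer returns a UTF-8 byte string: byte_stemmer is a hypothetical stand-in that lower-cases the word and then encodes it, so its result no longer matches the character text produced under use utf8, while Encode's decode_utf8 (presumably playing the same role here as the to_utf8 call in the workaround) restores the match.

```perl
use strict;
use warnings;
use utf8;                               # literals below are character strings
use Encode qw(encode_utf8 decode_utf8);

my $text  = "Первая Конференция ИППО";  # characters, as in the script above
my $query = "ИППО";

# Hypothetical stand-in for a stemmer that returns raw UTF-8 bytes:
# lower-case the word, then encode it into a byte string.
sub byte_stemmer { my ($word) = @_; return encode_utf8(lc $word) }

my $stem_bytes = byte_stemmer($query);      # 8 bytes, not 4 characters
my $stem_chars = decode_utf8($stem_bytes);  # decoded back to 4 characters

# Case-insensitive search for each form in the character text.
my $matches_bytes = $text =~ /\Q$stem_bytes\E/i ? 1 : 0;
my $matches_chars = $text =~ /\Q$stem_chars\E/i ? 1 : 0;

printf "bytes: length %d, match %d\n", length $stem_bytes, $matches_bytes;
printf "chars: length %d, match %d\n", length $stem_chars, $matches_chars;
```

The byte form is compared byte by byte against characters, so it never matches; only the decoded form is found case-insensitively.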
On Sun Mar 17 06:04:29 2013, blinov.stanislav@gmail.com wrote:
> Hi,
>
> I have finally written a script that clearly shows the source of the bug.
> It appears only when I use the Xapian stemmer (I think you have
> Search::Xapian installed).
Thanks, that's clear. I have committed https://github.com/karpet/search-tools-perl/commit/319d8cc70c1a36bceb7a7988415bbf0bec8735e4 to address this issue, which would affect any stemmer that returned UTF-8 bytes instead of UTF-8 characters. Please verify that the latest git master branch fixes your problem, and I will release 0.92. Cheers.
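The same idea can be applied on the caller's side to any stemmer callback. A minimal sketch, assuming the callback is invoked as ($parser, $word) like mystemfunc above: the hypothetical decoding_stemmer wrapper decodes any result that is not already flagged as a character string, and assumes such results are valid UTF-8 bytes (invalid sequences would be replaced with U+FFFD). It illustrates the principle only and is not the change made in the commit.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode_utf8 decode_utf8);

# Hypothetical helper: wrap a stemmer callback so that it always
# returns a Perl character string.
sub decoding_stemmer {
    my ($inner) = @_;
    return sub {
        my ($parser, $word) = @_;
        my $stem = $inner->($parser, $word);
        # Already characters (or undefined): pass through unchanged.
        return $stem if !defined $stem || utf8::is_utf8($stem);
        # Assumes un-flagged results are valid UTF-8 bytes;
        # pure ASCII is unchanged by decoding.
        return decode_utf8($stem);
    };
}

# Hypothetical inner stemmers: one returns bytes, one returns characters.
my $bytes_stemmer = sub { my ($p, $w) = @_; encode_utf8(lc $w) };
my $chars_stemmer = sub { my ($p, $w) = @_; lc $w };

my $fixed_bytes = decoding_stemmer($bytes_stemmer)->(undef, "ИППО");
my $fixed_chars = decoding_stemmer($chars_stemmer)->(undef, "ИППО");

printf "lengths: %d and %d characters\n", length $fixed_bytes, length $fixed_chars;
```

Both wrapped stemmers now yield the same 4-character string, so the highlighter would see identical terms either way.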
On Mon Mar 18 09:39:11 2013, KARMAN wrote:
> Please verify that the latest git master branch fixes your problem,
> and I will release 0.92.
I have uploaded 0.92 and 0.93 to CPAN to fix other issues, and included this change. Please re-open if your issue isn't addressed by 0.93.