Subject: Issue with Sphinx-Search-0.240.1
Date: Mon, 21 Feb 2011 23:11:07 -0500
To: <bug-Sphinx-Search [...] rt.cpan.org>
From: "Schultz, Len" <len [...] winequest.com>
Hello,
I am just getting started with Sphinx and have run into an issue. I'm setting up a charset_table with charset_type = sbcs, then searching for a word with an umlaut, specifically "Spätlese". This works just fine with the Sphinx command-line search tool. But Sphinx::Search converts the search term to UTF-8, and the term is then tokenized into "Sp" and "tlese".
I have the following index set up to test searching for words containing "a" with an umlaut, i.e. "ä":
index big
{
    source = srcbig
    path = C:/Sphinx/data/big
    docinfo = extern
    charset_type = sbcs
    charset_table = 0..9, A..Z->a..z, _, a..z, \
        U+E4->a
    min_prefix_len = 3
}
I can use the search command-line tool:
    C:\Sphinx>bin\search -e spatlese
    C:\Sphinx>bin\search -e spätlese
Both work as expected and return the same results.
But when I search from Perl, I get results only when the search word is "spatlese", not when it is "spätlese":
    $results = $sph->SetMatchMode(SPH_MATCH_ALL)
                   ->SetSortMode(SPH_SORT_RELEVANCE)
                   ->Query("spatlese");
This works.
    $results = $sph->SetMatchMode(SPH_MATCH_ALL)
                   ->SetSortMode(SPH_SORT_RELEVANCE)
                   ->Query("spätlese");
This does not work.
The $results returned are:
(
    {
        attrs => { gsifota => 1 },
        error => "",
        fields => ["fullstring"],
        matches => [],
        "time" => "0.000",
        total => 0,
        total_found => 0,
        warnings => "",
        words => {
            "" => { docs => 0, hits => 0 },
            sp => { docs => 172, hits => 244 },
            tlese => { docs => 0, hits => 0 },
        },
    },
    undef,
    undef,
)
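A minimal sketch of the suspected cause, using only the core Encode module. The tokens() routine is a hypothetical, simplified stand-in for the charset_table shown earlier, not Sphinx's actual tokenizer, and hex_bytes() is a helper introduced only for this illustration. It compares the single-byte (ISO-8859-1) and UTF-8 encodings of the same search term:

```perl
use strict;
use warnings;
use Encode qw(encode);

# "spätlese" written with an escape so this file's own encoding is irrelevant.
my $term = "sp\x{e4}tlese";

# Render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# ASSUMPTION: a toy single-byte tokenizer modelled on the charset_table
# above. Bytes 0-9, A-Z, a-z, _ and 0xE4 are word characters; every other
# byte separates words. A-Z folds to a-z and 0xE4 folds to "a".
sub tokens {
    my ($bytes) = @_;
    my @words = grep { length } split /[^0-9A-Za-z_\xE4]+/, $bytes;
    tr/A-Z\xE4/a-za/ for @words;
    return @words;
}

for my $encoding ('iso-8859-1', 'UTF-8') {
    my $bytes = encode($encoding, $term);
    printf "%-10s bytes: %s\n", $encoding, hex_bytes($bytes);
    printf "%-10s words: %s\n", $encoding, join ', ', tokens($bytes);
}
```

Under these assumptions, the single-byte form contains the byte E4 and yields the one word "spatlese", while the UTF-8 form contains the two bytes C3 A4, neither of which is in the table, so the term splits into "sp" and "tlese".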
So the question is how to get Sphinx::Search to pass the search query as SBCS rather than converting it to UTF-8.
--len