
This queue is for tickets about the Sphinx-Search CPAN distribution.

Report information
The Basics
Id: 66018
Status: resolved
Priority: 0/
Queue: Sphinx-Search

People
Owner: Nobody in particular
Requestors: len [...] winequest.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Issue with Sphinx-Search-0.240.1
Date: Mon, 21 Feb 2011 23:11:07 -0500
To: <bug-Sphinx-Search [...] rt.cpan.org>
From: "Schultz, Len" <len [...] winequest.com>
Hello,

I am just getting started with Sphinx, and have run into an issue. I'm setting up a charset_table with charset_type = sbcs. Then I'm searching for a word with umlauts, specifically "Spätlese". This works just fine on the Sphinx command line. But Sphinx::Search is converting the search term to UTF-8, and the term is then tokenized into "Sp" and "tlese".

I have the following index setup to test searching for words containing "a" with umlauts, i.e. "ä":

    index big
    {
        source = srcbig
        path = C:/Sphinx/data/big
        docinfo = extern
        charset_type = sbcs
        charset_table = 0..9, A..Z->a..z, _, a..z, \
            U+E4->a
        min_prefix_len = 3
    }

I can use the search command line:

    C:\Sphinx>bin\search -e spatlese
    C:\Sphinx>bin\search -e spätlese

Both work as expected and return the same results.

But when I search from Perl, I get results only when the search word is "spatlese" and not when the search word is "spätlese".

    $results = $sph->SetMatchMode(SPH_MATCH_ALL)
                   ->SetSortMode(SPH_SORT_RELEVANCE)
                   ->Query("spatlese");

This works.

    $results = $sph->SetMatchMode(SPH_MATCH_ALL)
                   ->SetSortMode(SPH_SORT_RELEVANCE)
                   ->Query("spätlese");

This does not work. The $results returned are:

    (
      {
        attrs => { gsifota => 1 },
        error => "",
        fields => ["fullstring"],
        matches => [],
        "time" => "0.000",
        total => 0,
        total_found => 0,
        warnings => "",
        words => {
          "" => { docs => 0, hits => 0 },
          sp => { docs => 172, hits => 244 },
          tlese => { docs => 0, hits => 0 },
        },
      },
      undef,
      undef,
    )

So the question is how to get Sphinx::Search to pass the search query as sbcs and not convert it to UTF-8...

--len
Subject: RE: [rt.cpan.org #66018] AutoReply: Issue with Sphinx-Search-0.240.1
Date: Mon, 21 Feb 2011 23:16:00 -0500
To: <bug-Sphinx-Search [...] rt.cpan.org>
From: "Schultz, Len" <len [...] winequest.com>
Never mind. Solution found.

    $sph->SetEncoders( sub { shift }, sub { shift });

--len
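The split into "sp" and "tlese" can be reproduced without a running searchd. The following is a minimal sketch, not Sphinx's actual tokenizer: tokenize_sbcs and hex_bytes are hypothetical helpers that only model the charset_table from the ticket (0..9, A..Z->a..z, _, a..z, U+E4->a) and treat every other byte as a separator. It assumes the script file itself is saved as UTF-8. It does not model min_prefix_len or any other index setting.

```perl
use strict;
use warnings;
use utf8;                  # assumption: this file is saved as UTF-8
use Encode qw(encode);

# Hypothetical helper: show a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# Hypothetical, simplified model of an sbcs tokenizer using the ticket's
# charset_table. Bytes not listed in the table act as word separators.
sub tokenize_sbcs {
    my ($bytes) = @_;
    my @tokens;
    my $current = '';
    for my $byte (split //, $bytes) {
        if ($byte =~ /\A[0-9A-Za-z_]\z/) {
            $current .= lc $byte;          # A..Z->a..z, everything else as-is
        }
        elsif (ord($byte) == 0xE4) {
            $current .= 'a';               # U+E4->a
        }
        else {
            push @tokens, $current if length $current;
            $current = '';
        }
    }
    push @tokens, $current if length $current;
    return @tokens;
}

my $term = "spätlese";

# UTF-8 encodes "ä" as the two bytes C3 A4; ISO-8859-1 uses the single byte E4.
my $utf8_bytes   = encode('UTF-8', $term);
my $latin1_bytes = encode('ISO-8859-1', $term);

print "UTF-8 bytes:   ", hex_bytes($utf8_bytes), "\n";
print "UTF-8 tokens:  ", join(', ', tokenize_sbcs($utf8_bytes)), "\n";
print "Latin-1 bytes: ", hex_bytes($latin1_bytes), "\n";
print "Latin-1 tokens: ", join(', ', tokenize_sbcs($latin1_bytes)), "\n";
```

In this model the UTF-8 form breaks at the two bytes that are not in the table, while the single-byte form maps cleanly to "spatlese". With identity encoders, as in the SetEncoders call above, the query string is sent as given, so the caller is responsible for making sure it already holds single-byte data in the index's encoding.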
Easiest fix I've ever had to do.