Hello Anatoly,

Sphinx::Search doesn't know what character set the searchd server is
using. That is set in sphinx.conf, and the Perl module passes back
whatever bytes it receives from searchd unchanged. You will need to run
decode_utf8() over the returned bytes to convert them into Perl
character strings.
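
As a rough illustration (a sketch only, not part of Sphinx::Search itself),
assume searchd is set up for UTF-8 and BuildExcerpts returns an array
reference of raw UTF-8 byte strings. The decoding step could then look like
the snippet below. decode_excerpts() is a hypothetical helper name, and the
hard-coded bytes stand in for a real BuildExcerpts() call so the snippet runs
without a searchd server:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper: decode every excerpt returned by BuildExcerpts.
# Assumption: searchd uses charset_type = utf-8, so each element is a
# string of raw UTF-8 bytes rather than Perl characters.
sub decode_excerpts {
    my ($excerpts) = @_;    # array ref, as BuildExcerpts is assumed to return
    return [ map { decode_utf8($_) } @$excerpts ];
}

# Stand-in for $sph->BuildExcerpts(...): the UTF-8 bytes of "Привет"
my $raw = ["\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82"];
my $decoded = decode_excerpts($raw);

# Compare byte length on the wire with character length after decoding
printf "%d bytes -> %d characters\n", length $raw->[0], length $decoded->[0];
```

It should print "12 bytes -> 6 characters": six Cyrillic characters that
took two bytes each on the wire.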

If you find you need Encode::_utf8_on(), there's probably something
more fundamental going wrong with your UTF-8 handling.
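
To show why flipping the flag is fragile, here is a small self-contained
sketch. It assumes the query words should reach searchd as UTF-8 bytes,
which is an assumption for illustration rather than something I've checked
for every setup. Encode::_utf8_off() only clears Perl's internal flag, so
the bytes you get depend on how Perl happened to store the string, whereas
encode_utf8() (and decode_utf8() in the other direction) always performs a
real conversion:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# "café" as a Perl character string. Built from \xE9, so Perl stores it
# internally as Latin-1 with the UTF8 flag off.
my $words = "caf\xE9";

# Flag-flipping: _utf8_off only clears the flag; the internal bytes are
# whatever Perl happened to use, here the single Latin-1 byte 0xE9.
my $flipped = $words;
Encode::_utf8_off($flipped);

# Explicit conversion (assumption: searchd wants UTF-8 bytes for the
# query words): encode_utf8 always yields UTF-8, i.e. 0xC3 0xA9 for "é".
my $encoded = encode_utf8($words);

printf "flipped: %d bytes, encoded: %d bytes\n", length $flipped, length $encoded;
```

This prints "flipped: 4 bytes, encoded: 5 bytes": the flag flip leaves a
lone 0xE9 byte that is not valid UTF-8, while encode_utf8() produces the
expected 0xC3 0xA9.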

Do you have the following in your 'source' spec?

    sql_query_pre = SET NAMES utf8

And in your 'index' spec:

    charset_type = utf-8

You will need to re-index if you change either of these. Also, do your
MySQL tables use a UTF-8 character set?

If you can provide a failing test case, I'm happy to look at it further,
but as it doesn't appear to be a Sphinx::Search issue, I'm going to
resolve this bug report for now.

Regards,
--
Jon Schutz
Chief Technology Officer, YourAmigo
http://www.youramigo.com
My tech notes: http://notes.jschutz.net
On Wed, 2009-01-28 at 03:46 -0500, Anatoly K. Sharifulin via RT wrote:
> Wed Jan 28 03:46:40 2009: Request 42850 was acted upon.
> Transaction: Ticket created by sharifulin
> Queue: Sphinx-Search
> Subject: BuildExcerpts and UTF-8
> Broken in: (no value)
> Severity: Important
> Owner: Nobody
> Requestors: tollik@mail.ru
> Status: new
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=42850 >
>
>
> Hi!
>
> I use BuildExcerpts with UTF-8, and the result of this command is
> double UTF-8 encoded. All data and scripts are UTF-8.
> Workaround:
> 1. For $words: Encode::_utf8_off($words)
> 2. For results of BuildExcerpts: [grep { Encode::_utf8_on($_);1 }
> @{$sph->BuildExcerpts(...)}];
>
> Please check and fix it. Thanks.
>