
This queue is for tickets about the Sphinx-Search CPAN distribution.

Report information
The Basics
Id: 42850
Status: resolved
Priority: 0/
Queue: Sphinx-Search

People
Owner: Nobody in particular
Requestors: sharifulin [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: 0.21



Subject: BuildExcerpts and UTF-8
Hi!

I use BuildExcerpts with UTF-8, and the result of this call comes back UTF-8 encoded twice. All data and scripts are UTF-8.

Workaround:
1. For $words: Encode::_utf8_off($words)
2. For the results of BuildExcerpts: [grep { Encode::_utf8_on($_);1 } @{$sph->BuildExcerpts(...)}];

Please check and fix it. Thanks.
Subject: Re: [rt.cpan.org #42850] BuildExcerpts and UTF-8
Date: Fri, 06 Feb 2009 13:18:47 +1030
To: bug-Sphinx-Search [...] rt.cpan.org
From: Jon Schutz <jon [...] jschutz.net>
Hello Anatoly,

Sphinx::Search doesn't know what character set the searchd server is using. That is set in sphinx.conf and the perl module passes back whatever it receives from searchd. You will need to run decode_utf8() over the returned bytes to convert to perl UTF8 strings.

If you're finding the need to use _utf8_on(), then there's probably something more fundamental going wrong with your UTF8 handling.

Do you have the following in your 'source' spec?

    sql_query_pre = SET NAMES utf8

And in your 'index' spec:

    charset_type = utf-8

You will need to re-index if you change either of these.

Do your MySQL tables use UTF8 settings?

If you can provide a failing test case, I'm happy to look at it further, but as it doesn't appear to be a Sphinx::Search issue, I'm going to resolve this bug report for now.

Regards,

--
Jon Schutz
My tech notes http://notes.jschutz.net
Chief Technology Officer, YourAmigo http://www.youramigo.com

On Wed, 2009-01-28 at 03:46 -0500, Anatoly K. Sharifulin via RT wrote:
> Wed Jan 28 03:46:40 2009: Request 42850 was acted upon.
> Transaction: Ticket created by sharifulin
> Queue: Sphinx-Search
> Subject: BuildExcerpts and UTF-8
> Broken in: (no value)
> Severity: Important
> Owner: Nobody
> Requestors: tollik@mail.ru
> Status: new
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=42850 >
>
> Hi!
>
> I use BuildExcerpts and UTF-8. And result of this command is twice UTF-8
> encode. All data and scripts are UTF-8.
> Solve:
> 1. For $words: Encode::_utf8_off($words)
> 2. For results of BuildExcerpts: [grep { Encode::_utf8_on($_);1 }
> @{$sph->BuildExcerpts(...)}];
>
> Please, check and fix it. Thanks.
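To make the exchange above concrete, here is a minimal sketch of the symptom and of the suggested decode_utf8() handling. It uses only the core Encode module and does not talk to searchd; the sample string and variable names are illustrative assumptions, not taken from the reporter's code.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# A Perl character string: two Cyrillic characters (illustrative sample).
my $chars = "\x{43f}\x{440}";

# What a UTF-8 configured server would put on the wire: four UTF-8 bytes.
my $bytes = encode_utf8($chars);

# The reported symptom: bytes that are already UTF-8 get encoded again,
# so every byte >= 0x80 becomes two bytes.
my $double = encode_utf8($bytes);

# The suggested handling: decode the returned bytes exactly once.
my $decoded = decode_utf8($bytes);

printf "chars=%d bytes=%d double=%d decoded=%d\n",
    length($chars), length($bytes), length($double), length($decoded);
# prints "chars=2 bytes=4 double=8 decoded=2"
```

Decoding once at the boundary avoids toggling the internal flag with _utf8_on()/_utf8_off(), which changes how Perl interprets the buffer without validating it.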
On Thu Feb 05 21:50:30 2009, jon@jschutz.net wrote:
> Hello Anatoly,
>
> Sphinx::Search doesn't know what character set the searchd server is
> using. That is set in sphinx.conf and the perl module passes back
> whatever it receives from searchd. You will need to run decode_utf8()
> over the returned bytes to convert to perl UTF8 strings.
I have revisited this and added support for a string encoder/decoder in Sphinx::Search. The default is encode_utf8/decode_utf8, so this case with BuildExcerpts should now work. If an alternative character set is specified in the Sphinx configuration file, different encoders/decoders need to be specified.

--
Jon Schutz
My tech notes http://notes.jschutz.net
Chief Technology Officer, YourAmigo http://www.youramigo.com
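As a rough illustration of what an alternative encoder/decoder pair might look like, the sketch below builds one for ISO-8859-1 with the core Encode module and checks that it round-trips. The character set, the sample string, and the variable names are assumptions for the example. The commented-out SetEncoders call reflects my understanding of the Sphinx::Search interface and is not confirmed by this ticket, so check the module's documentation for the exact method.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical pair for an index configured with a single-byte character
# set (ISO-8859-1 is used purely as an example).
my $encoder = sub { my ($s) = @_; return encode('iso-8859-1', $s) };
my $decoder = sub { my ($s) = @_; return decode('iso-8859-1', $s) };

my $query = "caf\x{e9}";          # character string with e-acute
my $wire  = $encoder->($query);   # bytes as they would be sent to searchd
my $back  = $decoder->($wire);    # character string as the caller sees it

print $back eq $query ? "round trip ok\n" : "round trip failed\n";

# Assumed usage, not verified against this ticket:
# $sph->SetEncoders($encoder, $decoder);
```

Whatever pair is used, it has to match the character set configured for the index, otherwise the same kind of mis-encoding reported here can reappear.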