
This queue is for tickets about the DBD-SQLite CPAN distribution.

Report information
The Basics
Id: 54628
Status: resolved
Priority: 0
Queue: DBD-SQLite

People
Owner: Nobody in particular
Requestors: avar [...] cpan.org
Cc: hinrik [...] cpan.org
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 1.29
Fixed in: (no value)



CC: hinrik [...] cpan.org
Subject: Enable Custom (User Implemented) Tokenizers with FTS3 tables
We (Hinrik and I) have an application (Hailo on CPAN) that could use SQLite FTS3 tables. The problem is that the default tokenizer SQLite provides is too naïve for the sort of text we're processing, i.e. it's ASCII-only.

SQLite supports custom tokenizers: you write the tokenizer in C and pass a pointer to it as a BLOB via fts3_tokenizer():

http://www.sqlite.org/fts3.html#section_5_1

SQLite's default tokenizer is defined in its fts3_tokenizer1.c.

I haven't tested it yet, but it should be possible to do this with the current DBD::SQLite interface by creating an XS module which includes the sqlite headers, creates a sqlite3_tokenizer_module, and returns a pointer to its struct to Perl as an IV. That IV could then be passed to DBD::SQLite by converting it to a SQLite BLOB:

http://search.cpan.org/~adamk/DBD-SQLite-1.29/lib/DBD/SQLite.pm#Blobs

But it would be much simpler if DBD::SQLite did all the heavy lifting, so you could simply pass Perl subroutine callbacks, similar to how 'sqlite_create_function' works now.

Have the maintainers looked into this, and do they perhaps have some idea about how best to do this?
Hi. It'd be nice if you sent us a patch, but I think this should rather be an extension like DBD::SQLite::Extension::CustomFTS3Tokenizer, as the sqlite3 documentation says the following (i.e. you can do it yourself, without changing the internals of sqlite3/DBD::SQLite):

> FTS3 does not expose a C-function that users call to register new
> tokenizer types with a database handle. Instead, the pointer must be
> encoded as an SQL blob value and passed to FTS3 through the SQL engine
> by evaluating a special scalar function, "fts3_tokenizer()"

- Kenichi

On 2010-2-15 Mon 11:26:53, AVAR wrote:
> We (Hinrik and I) have an application (Hailo on CPAN) that could use
> SQLite FTS3 tables. The problem is that the default tokenizer SQLite
> provides is too naïve for the sort of text we're processing. I.e. it's
> ASCII-only.
>
> SQLite supports custom tokenizers by creating a C function and then
> passing a pointer to that function as a BLOB via fts3_tokenizer():
>
> http://www.sqlite.org/fts3.html#section_5_1
>
> SQLite's default tokenizer is defined in its fts3_tokenizer1.c.
>
> I haven't tested it yet but it should be possible to do this with the
> current DBD::SQLite interface by creating an XS module which includes
> the sqlite headers and creates a sqlite3_tokenizer_module and returns a
> pointer to its struct to Perl as a IV, then that could be passed to
> DBD::SQLite by converting the IV to a SQLite BLOB:
>
> http://search.cpan.org/~adamk/DBD-SQLite-1.29/lib/DBD/SQLite.pm#Blobs
>
> But it would be much simpler if DBD::SQLite did all the hard lifting so
> you could simply pass Perl subroutine callbacks similar to how
> 'sqlite_create_function' works now.
>
> Have the maintainers looked into this and perhaps have some idea about
> how best to do this?
CC: hinrik [...] cpan.org
Subject: Re: [rt.cpan.org #54628] Enable Custom (User Implemented) Tokenizers with FTS3 tables
Date: Mon, 15 Feb 2010 21:02:26 +0000
To: bug-DBD-SQLite [...] rt.cpan.org
From: Ævar Arnfjörð Bjarmason <avar [...] cpan.org>
On Mon, Feb 15, 2010 at 20:29, Kenichi Ishigaki via RT <bug-DBD-SQLite@rt.cpan.org> wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=54628 >
>
> Hi. It'd be nice if you send us a patch, but I think this should rather
> be an extension like DBD::SQLite::Extension::CustomFTS3Tokenizer, as
> the sqlite3 documentation says as follows (i.e. you can do it by
> yourself, without changing the internal of sqlite3/DBD::SQLite):
>
>> FTS3 does not expose a C-function that users call to register new
>> tokenizer types with a database handle. Instead, the pointer must be
>> encoded as an SQL blob value and passed to FTS3 through the SQL engine
>> by evaluating a special scalar function, "fts3_tokenizer()"
I've already implemented this for my own use, though only with a pure-C callback rather than something that can call into Perl space (that would be easy to add):

http://github.com/hinrik/hailo/blob/master/lib/Hailo/Storage/DBD/SQLite/Tokenizer.pm

Then I use it like this:

    sub inject_tokenizer {
        my ($self) = @_;

        my $ptr = Hailo::Storage::DBD::SQLite::Tokenizer::get_tokenizer_ptr();

        # HACK. Doing this because using '?' and
        # $sth->bind_param(2, $ptr, SQL_BLOB); ends up passing nothing
        # to sqlite. I.e. sqlite3_value_bytes(argv[1]); will be 0
        my $pptr = pack "P", $ptr;
        my $sth = $self->dbh->prepare("SELECT fts3_tokenizer(?, '$pptr')");
        $sth->bind_param(1, "Hailo_tokenizer");
        $sth->execute();
    }

That /should/ work, but we haven't yet done the part which actually uses FTS3 for anything.

The problem with keeping this outside DBD::SQLite, however, is that you have to include sqlite3.h. I'm currently including it from the filesystem, which is obviously prone to breakage if DBD::SQLite's bundled version doesn't match the one on my system, and it also makes the extension harder to install.
CC: hinrik [...] cpan.org
Subject: Re: [rt.cpan.org #54628] Enable Custom (User Implemented) Tokenizers with FTS3 tables
Date: Sun, 28 Feb 2010 06:34:15 +0000
To: bug-DBD-SQLite [...] rt.cpan.org
From: Ævar Arnfjörð Bjarmason <avar [...] cpan.org>
FWIW we got this to work, but ended up not using FTS3 for other reasons. Anyone interested in how to do this can look at the Git history for Hailo.

I think if someone implements this properly it needs to be in DBD::SQLite core; I can't see how else you could sanely guarantee that your extension and DBD::SQLite use the same SQLite version.
FYI, as of DBD::SQLite 1.30_01, sqlite3.[ch] are installed into a distribution share directory accessible via File::ShareDir::dist_dir('DBD-SQLite'). I don't think this is enough for you, but it should help a bit. I'll keep this ticket open so we can do more work on this.

On 2010-2-28 Sun 01:34:41, AVAR wrote:
> FWIW we got this to work, but ended up not using FTS3 for other
> reasons. Anyone interested in how to do this can look at the Git
> history for Hailo.
>
> I think if someone implements this properly it needs to be in
> DBD::SQLite core, I can't see how else you could sanely guarantee that
> your extension and DBD::SQLite use the same SQLite version.
Hi. For your information, custom FTS3 tokenizer support has been part of the DBD::SQLite core as of version 1.30_04 (thanks to Laurent Dami). I know you ended up not using FTS3 for Hailo, but it'd be great if you could test the latest DBD::SQLite when you have some spare time. Thanks.

Kenichi

On 2010-2-28 Sun 01:34:41, AVAR wrote:
> FWIW we got this to work, but ended up not using FTS3 for other
> reasons. Anyone interested in how to do this can look at the Git
> history for Hailo.
>
> I think if someone implements this properly it needs to be in
> DBD::SQLite core, I can't see how else you could sanely guarantee that
> your extension and DBD::SQLite use the same SQLite version.
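
For readers arriving at this ticket later, here is a minimal sketch of the kind of Perl-level tokenizer the core support mentioned above is meant to accept. It is not taken from the thread: the name unicode_tokenizer is hypothetical, and the iterator contract shown (a factory returning a sub that takes the string and returns a closure yielding term, length, start, end and index) is my reading of the DBD::SQLite documentation, so check it against the version you have installed. Offsets here are character offsets; whether the driver expects bytes or characters should also be verified there.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical tokenizer factory (name is an assumption, not from the ticket).
# Returns a sub that, given a string, returns an iterator closure.
sub unicode_tokenizer {
    return sub {
        my ($string) = @_;
        my $index = 0;    # running token number

        return sub {
            # Advance to the next run of word characters (Unicode rules),
            # or return an empty list when the string is exhausted.
            $string =~ /\w+/gu or return;
            my ($start, $end) = ($-[0], $+[0]);
            my $term = substr($string, $start, $end - $start);
            return ($term, $end - $start, $start, $end, $index++);
        };
    };
}

# Stand-alone demonstration of the iterator, with no database involved.
binmode STDOUT, ':encoding(UTF-8)';
my $next = unicode_tokenizer()->("Ævar's naïve text");
while (my @tok = $next->()) {
    printf "%d: %s [%d,%d)\n", $tok[4], $tok[0], $tok[2], $tok[3];
}
```

As far as I understand the DBD::SQLite documentation, such a sub is then referenced by name when creating the table, along the lines of CREATE VIRTUAL TABLE t USING fts3(content, tokenize=perl 'main::unicode_tokenizer'), which avoids both the XS module and the pointer-as-BLOB hack discussed earlier in the thread.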