CC: hinrik [...] cpan.org
Subject: Enable Custom (User Implemented) Tokenizers with FTS3 tables
We (Hinrik and I) have an application (Hailo on CPAN) that could use
SQLite FTS3 tables. The problem is that SQLite's default tokenizer is
too naïve for the sort of text we're processing: it is ASCII-only. It
treats only ASCII punctuation and whitespace as separators and only
lowercases ASCII letters.
SQLite supports custom tokenizers: you implement the tokenizer callbacks
in C, collect them in a sqlite3_tokenizer_module struct, and pass a
pointer to that struct as a BLOB to fts3_tokenizer():
http://www.sqlite.org/fts3.html#section_5_1
SQLite's default ("simple") tokenizer is defined in its source file
fts3_tokenizer1.c.
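To show concretely why this is a problem for us, here is a small C sketch that approximates the default rules of the simple tokenizer as we read fts3_tokenizer1.c. It is a hypothetical re-implementation, not SQLite's code: only ASCII non-alphanumeric bytes separate tokens, only ASCII letters are lowercased, and bytes >= 0x80 are copied unchanged. The names simple_like_tokenize and MAX_TOKEN_LEN, and the fixed-size token buffers, are our own assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN_LEN 64

/* Nonzero if byte c separates tokens: only ASCII non-alphanumerics do. */
static int is_ascii_delim(unsigned char c) {
    int alnum = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
                (c >= 'a' && c <= 'z');
    return c < 0x80 && !alnum;
}

/* Splits `in` into at most `max` tokens stored in `out`, lowercasing ASCII
 * letters only. Bytes >= 0x80 (UTF-8 for letters such as Þ or ó) are copied
 * unchanged, so non-ASCII case differences survive. Tokens longer than
 * MAX_TOKEN_LEN - 1 bytes are truncated. Returns the number of tokens. */
static size_t simple_like_tokenize(const char *in, char out[][MAX_TOKEN_LEN],
                                   size_t max) {
    assert(in != NULL && out != NULL);
    const unsigned char *p = (const unsigned char *)in;
    size_t count = 0;
    while (*p != '\0' && count < max) {
        while (*p != '\0' && is_ascii_delim(*p)) p++;  /* skip separators */
        if (*p == '\0') break;
        size_t len = 0;
        while (*p != '\0' && !is_ascii_delim(*p)) {
            unsigned char ch = *p++;
            if (ch >= 'A' && ch <= 'Z') ch = (unsigned char)(ch - 'A' + 'a');
            if (len < MAX_TOKEN_LEN - 1) out[count][len++] = (char)ch;
        }
        out[count][len] = '\0';
        count++;
    }
    return count;
}
```

With Icelandic input, "ÞÓR" comes out as "ÞÓr" while "þór" stays "þór", so a case-insensitive search for one never matches the other.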
I haven't tested it yet, but it should be possible to do this with the
current DBD::SQLite interface: write an XS module that includes the
SQLite headers, defines a sqlite3_tokenizer_module, and returns a
pointer to that struct to Perl as an IV. The IV could then be converted
to a SQLite BLOB and passed to fts3_tokenizer() through DBD::SQLite:
http://search.cpan.org/~adamk/DBD-SQLite-1.29/lib/DBD/SQLite.pm#Blobs
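The pointer-to-IV-to-BLOB round trip we have in mind looks roughly like this in plain C. It is a self-contained sketch that does not link against SQLite; demo_tokenizer_module, demo_module_as_iv, iv_to_blob and blob_to_module are hypothetical names. It assumes, as we read the fts3_tokenizer() documentation, that the BLOB must hold the raw bytes of the module pointer in native byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for sqlite3_tokenizer_module. The real struct (see
 * SQLite's fts3_tokenizer.h) has iVersion plus xCreate, xDestroy, xOpen,
 * xClose and xNext; one callback is enough to show the pointer handling. */
typedef struct demo_tokenizer_module {
    int iVersion;
    int (*xCountTokens)(const char *text);
} demo_tokenizer_module;

/* Toy callback: counts runs of non-space characters. */
static int demo_count_tokens(const char *text) {
    int n = 0, in_token = 0;
    for (; *text != '\0'; text++) {
        if (*text == ' ') {
            in_token = 0;
        } else if (!in_token) {
            in_token = 1;
            n++;
        }
    }
    return n;
}

static const demo_tokenizer_module demo_module = { 0, demo_count_tokens };

/* What the XS side would hand to Perl: the struct's address as an integer. */
static intptr_t demo_module_as_iv(void) {
    return (intptr_t)(const void *)&demo_module;
}

/* Converts that integer into the BLOB layout fts3_tokenizer() expects: the
 * raw bytes of a pointer, exactly sizeof(void *) long, in native byte order.
 * Returns the number of bytes written. */
static size_t iv_to_blob(intptr_t iv, unsigned char *blob, size_t cap) {
    const void *p = (const void *)iv;
    assert(cap >= sizeof p);
    memcpy(blob, &p, sizeof p);
    return sizeof p;
}

/* Roughly the receiving side: reject a BLOB of the wrong length, otherwise
 * copy its bytes back into a pointer. */
static const demo_tokenizer_module *blob_to_module(const unsigned char *blob,
                                                   size_t len) {
    const void *p = NULL;
    if (len != sizeof p) return NULL;
    memcpy(&p, blob, sizeof p);
    return (const demo_tokenizer_module *)p;
}
```

The detail to get right is the length: a BLOB that isn't exactly pointer-sized is rejected, so on the Perl side the IV would presumably have to be packed into exactly sizeof(void *) native-order bytes. We haven't tested that part.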
But it would be much simpler if DBD::SQLite did the heavy lifting, so
that you could pass Perl subroutines as callbacks, similar to how
'sqlite_create_function' works now.
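To make the callback idea concrete, here is a rough C sketch of the kind of adapter DBD::SQLite might own. All names (user_tokenize_fn, token_cursor, cursor_open, cursor_next, split_on_spaces) are hypothetical. A C function pointer plus an opaque context stands in for a Perl sub, and cursor_next is a simplified version of the real xNext callback, which also returns the token text and its position.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 32

/* The user's tokenizer reports each token via emit() as byte offsets. */
typedef void (*emit_fn)(void *sink, int start, int end);
typedef void (*user_tokenize_fn)(void *user_ctx, const char *text, int len,
                                 emit_fn emit, void *sink);

/* Driver-owned cursor: buffers offsets so an xNext-style step can return
 * them one at a time. */
typedef struct {
    const char *text;
    int starts[MAX_TOKENS], ends[MAX_TOKENS];
    int count, next;
} token_cursor;

/* Records one token; tokens beyond MAX_TOKENS are dropped in this sketch. */
static void cursor_emit(void *sink, int start, int end) {
    token_cursor *c = sink;
    if (c->count < MAX_TOKENS) {
        c->starts[c->count] = start;
        c->ends[c->count] = end;
        c->count++;
    }
}

/* xOpen-style step: calls the user callback once for the whole input. */
static void cursor_open(token_cursor *c, const char *text,
                        user_tokenize_fn fn, void *user_ctx) {
    assert(c != NULL && text != NULL && fn != NULL);
    c->text = text;
    c->count = 0;
    c->next = 0;
    fn(user_ctx, text, (int)strlen(text), cursor_emit, c);
}

/* xNext-style step: returns 1 and fills start/end while tokens remain. */
static int cursor_next(token_cursor *c, int *start, int *end) {
    if (c->next >= c->count) return 0;
    *start = c->starts[c->next];
    *end = c->ends[c->next];
    c->next++;
    return 1;
}

/* Example user callback: splits on spaces only, ignoring user_ctx. */
static void split_on_spaces(void *user_ctx, const char *text, int len,
                            emit_fn emit, void *sink) {
    (void)user_ctx;
    int i = 0;
    while (i < len) {
        while (i < len && text[i] == ' ') i++;
        int start = i;
        while (i < len && text[i] != ' ') i++;
        if (i > start) emit(sink, start, i);
    }
}
```

Calling the user callback once per document and buffering the offsets, rather than calling it once per token, keeps the number of C-to-Perl transitions low; whether that tradeoff suits DBD::SQLite is an open question.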
Have the maintainers looked into this, and do you have any ideas about
how best to do it?