
This queue is for tickets about the DBD-SQLite CPAN distribution.

Report information
The Basics
Id: 53235
Status: resolved
Priority: 0/
Queue: DBD-SQLite

People
Owner: Nobody in particular
Requestors: bch-4439 [...] gmx.net
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 1.27
Fixed in: (no value)



Subject: Compatibility with SQLite ICU extension
Hi. This is a feature request.

SQLite can be compiled with Unicode support via the ICU library:

http://www.sqlite.org/cvstrac/fileview?f=sqlite/ext/icu/README.txt
http://site.icu-project.org/

It would be extremely helpful if DBD-SQLite were compatible with this extension. I think there is only a minor step missing. Here is what I tried.

For ICU support, you configure SQLite 3.6.20 with CFLAGS=-DSQLITE_ENABLE_ICU and LIBS=-licuio. I tried to see whether DBD-SQLite 1.27 works with the ICU extension and configured it with

perl Makefile.PL LIBS="-licuio" CCFLAGS="-DSQLITE_ENABLE_ICU" PREFIX=/usr/local

Compilation works, and the resulting Perl module knows the ICU collations, for example,

SELECT icu_load_collation('de_DE','custom');

does not give an error. But when you try to use this collation, say,

CREATE TABLE something ( column VARCHAR(20) COLLATE custom );

DBD-SQLite says it does not know the collation:

can't install, unknown collation : custom at /usr/local/lib/perl5/site_perl/5.10.0/i586-linux-thread-multi/DBD/SQLite.pm line 141.

So it seems that, although the ICU extension is there and the ICU collation 'de_DE' is known, DBD-SQLite takes control of all requested collations and raises an error because it does not know about the ICU collation.

I would be most grateful if you could support ICU in a future version. I am also happy to invest some time and try out patches.

Finally, why might somebody want ICU support? Out of the box, SQLite only sorts ASCII strings correctly: accented characters are sorted wrongly, and there is no Unicode or locale support whatsoever (an SQLite design choice, to keep the library small). Of course, every client is free to add its own collations. You, for instance, added support for Perl's 'cmp'; in Qt it is easy to use their Unicode-aware localized QString comparison, and so on. But as of today this is done differently in each client (an SQLite design drawback, because there is no central server). So your database, or at least your indices, become inconsistent if you access the same database file in different ways, say a Perl DBD-SQLite script for data import versus a Qt frontend for querying the database. Since SQLite already has the ICU extension, it would be the natural standard for Unicode and locale support.

Thanks for your attention and your help.

Bernhard

Hi. I haven't fully investigated yet, but a quick hack to
sqlite3.c seems to work for me, at least for your sample.

--- sqlite3.c (revision 10704)
+++ sqlite3.c (working copy)
@@ -109521,7 +109521,7 @@
  }
  assert(p);

- rc = sqlite3_create_collation_v2(db, zName, SQLITE_UTF16, (void *)pUCollator,
+ rc = sqlite3_create_collation_v2(db, zName, SQLITE_UTF8, (void *)pUCollator,
  icuCollationColl, icuCollationDel
  );
  if( rc!=SQLITE_OK ){

I don't think this is the best patch (the database encoding
should be taken from somewhere, and this patches the
amalgamated source when it should patch
ext/icu/icu.c), but anyway, if this works for you, or
at least helps you create a better patch, please
report this to the SQLite team and ask them to fix it.

Kenichi

On 2009-Dec-30 Wed 12:36:45, bhell wrote:
> Hi. This is a feature request.




From: bch-4439 [...] gmx.net
Hi,

thanks for the patch.

I don't understand enough about the SQLite internals to follow what the patch does, but I have tried it, and it does not do the job. Here is what happens:

1) There is no error message anymore when I use "COLLATE custom" in my SQL code.

2) Sorting is nevertheless wrong when I use the Perl client: it simply does not follow the selected ICU collation.

With (and without) the suggested patch, the Unicode characters are transferred correctly from Perl DBD::SQLite into the SQLite3 database, and when I access that database with the ICU-enabled sqlite3 client and run "SELECT * from something ORDER BY somecolumn COLLATE custom", sorting is indeed correct.

So I conclude: the patch avoids the error message, but it breaks the ICU collation.

Please get back to me if you think I am making a mistake.

Regards,

Bernhard

On Fri Jan 01 13:02:35 2010, ISHIGAKI wrote:
> Hi. I haven't fully investigated yet, but a quick hack to
> sqlite3.c seems to work for me, at least for your sample.
Hi. It turned out that the SQLite library and DBD::SQLite work fine once we remove an overly strict collation_needed check (which didn't take into account that collations may be added by something other than DBD::SQLite's own method). I fixed this and added a test in the trunk. Please check it if you still have some tuits and time to spare. Thanks.

Kenichi

On 2009-12-30 Wed 12:36:45, bhell wrote:
> Hi. This is a feature request.
FYI, DBD::SQLite 1.30_03 is out. I'm closing this ticket. If you find anything, please reopen it. Thanks.

Kenichi

On 2010-5-30 Sun 21:44:55, ISHIGAKI wrote:
> Hi. It turned out that the sqlite library and DBD::SQLite worked fine
> when we removed a too strict collation_needed check