Subject: Asymmetric UTF-8 support causes malformed data
Date: Sun, 11 Mar 2007 13:31:24 +0100
To: bug-DBD-SQLite [...] rt.cpan.org
From: Juerd Waalboer <juerd [...] convolution.nl>

DBD::SQLite has Unicode support, in the sense that if $dbh->{unicode} is
true, it sets the UTF8 flag on (almost) all data coming from the
database. This feature is incredibly useful, but only if everything in
the database really is UTF-8 encoded.
However, DBD::SQLite does not ensure that data going into the database
really is encoded as UTF-8. Under Perl's Unicode model, the internal
encoding of a string is either UTF-8 or ISO-8859-1, and this is
supposed to be transparent to the user. DBD::SQLite stores whatever the
internal encoding happens to be. When the data is later read back from
the database, it gets the UTF8 flag set, even though the stored bytes
may have been ISO-8859-1.
The result is programs crashing on malformed UTF-8 characters.
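
The mismatch can be reproduced without a database. The following is a
minimal sketch, not DBD::SQLite's actual code: internal_bytes and
read_back_flagged are hypothetical stand-ins for "store the internal
buffer as-is" and "set the UTF8 flag on fetched data".

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for the write path: expose the string's
# internal buffer as plain octets, whatever its encoding is.
sub internal_bytes {
    my ($string) = @_;              # the copy keeps flag and buffer
    Encode::_utf8_off($string);
    return $string;
}

# Hypothetical stand-in for the read path with $dbh->{unicode} true:
# the UTF8 flag is switched on without validating the octets.
sub read_back_flagged {
    my ($octets) = @_;
    Encode::_utf8_on($octets);
    return $octets;
}

my $text = "caf\xe9";               # "café"
utf8::downgrade($text);             # make sure it is held as ISO-8859-1

my $stored  = internal_bytes($text);        # 4 octets, last one is 0xE9
my $fetched = read_back_flagged($stored);

print "UTF8 flag: ", (utf8::is_utf8($fetched) ? "on" : "off"), "\n";
print "well-formed: ", (utf8::valid($fetched) ? "yes" : "no"), "\n";
```

The fetched string carries the UTF8 flag, but its buffer ends in a lone
0xE9 byte, which is not a complete UTF-8 sequence.
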
A workaround for users of the current DBD::SQLite is to manually call
utf8::upgrade on every string sent to the database.
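
A sketch of that workaround, assuming a hypothetical helper named
upgraded that returns upgraded copies of the bind values and leaves the
caller's variables alone. internal_bytes is again only a stand-in for
what ends up being stored.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: upgraded copies of the given values.
sub upgraded {
    my @copies = @_;
    for my $value (@copies) {
        # utf8::upgrade leaves already-upgraded strings unchanged
        utf8::upgrade($value) if defined $value;
    }
    return @copies;
}

# Hypothetical stand-in: the string's internal buffer as octets.
sub internal_bytes {
    my ($string) = @_;
    Encode::_utf8_off($string);
    return $string;
}

my $latin1 = "caf\xe9";             # "café"
utf8::downgrade($latin1);           # held internally as ISO-8859-1

my ($param) = upgraded($latin1);
my $octets  = internal_bytes($param);

# The buffer is now valid UTF-8 and decodes back to the same text.
my $decoded = Encode::decode('UTF-8', $octets,
    Encode::FB_CROAK() | Encode::LEAVE_SRC());
print $decoded eq $latin1 ? "round trip ok\n" : "mismatch\n";
```

With DBI this would be used as $sth->execute(upgraded(@values)).
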
A more permanent fix could be implemented by DBD::SQLite's author, by
upgrading all strings internally, ensuring that their encoding is indeed
UTF-8. This can be done in a mutating way with sv_utf8_upgrade or in a
copying way with bytes_to_utf8; either way, no encoding needs to take
place if the UTF8 flag is already on. See also L<perlguts/"How do I
convert a string to UTF-8?"> in current bleadperl.
The query itself, and all placeholder values, should get this treatment.
Since UTF-8 is ASCII-compatible, this has no effect on the SQL syntax.
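
A quick check of the ASCII-compatibility claim, as a sketch: the
statement text and the internal_bytes helper are made up for
illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in: the string's internal buffer as octets.
sub internal_bytes {
    my ($string) = @_;
    Encode::_utf8_off($string);
    return $string;
}

my $sql = 'SELECT name FROM users WHERE id = ?';   # pure ASCII

my $upgraded_sql = $sql;
utf8::upgrade($upgraded_sql);

# Upgrading changes the flag, not the bytes, for ASCII-only text.
my $same = internal_bytes($sql) eq internal_bytes($upgraded_sql);
print $same ? "bytes unchanged\n" : "bytes differ\n";
```
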