This queue is for tickets about the DBD-SQLite CPAN distribution.

Report information
The Basics
Id: 25371
Status: resolved
Priority: 0/
Queue: DBD-SQLite

People
Owner: Nobody in particular
Requestors: juerd [...] convolution.nl
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: (no value)



Subject: Asymmetric UTF-8 support causes malformed data
Date: Sun, 11 Mar 2007 13:31:24 +0100
To: bug-DBD-SQLite [...] rt.cpan.org
From: Juerd Waalboer <juerd [...] convolution.nl>
DBD::SQLite has Unicode support, in the sense that if $dbh->{unicode} is true, it will set the UTF8 flag on (almost) all data coming from the database. This feature is incredibly useful, but only if all of your database actually does contain UTF-8 encoded strings.

However, DBD::SQLite does not ensure that data going into the database really is encoded as UTF-8. Because of Perl's Unicode model, the internal encoding for strings is either UTF-8 or ISO-8859-1. This is supposed to be transparent to the user. DBD::SQLite uses whatever the internal encoding was.

When later this data is pulled from the database, it gets the UTF8 flag enabled, while it might have been ISO-8859-1. The result is crashing programs because of malformed UTF-8 characters.

A workaround for users of the current DBD::SQLite is to manually Encode::encode_utf8 utf8::upgrade every string sent to the database.

A more permanent fix could be implemented by DBD::SQLite's author, by upgrading all strings internally, ensuring that their encoding is indeed UTF-8. This can be done in a mutating or a copying way, and no encoding takes place if the UTF8 flag was already on. They are sv_utf8_upgrade and bytes_to_utf8 respectively. See also L<perlguts/"How do I convert a string to UTF-8?"> in current bleadperl.

The query itself, and all placeholder values, should get this treatment. Since UTF-8 is ASCII-compatible, this has no effect on the SQL syntax.
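The asymmetry described above can be illustrated without a database. The following is only a sketch: naive_round_trip is a hypothetical helper that imitates a driver which stores whatever bytes Perl holds internally and treats them as UTF-8 on the way back. It is not DBD::SQLite's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not driver code):
# keep the string's internal bytes as the "stored" value, then try to
# interpret those bytes as UTF-8 when "fetching" them again.
sub naive_round_trip {
    my ($value) = @_;
    my $stored = $value;
    # Expose the internal representation as plain bytes.
    utf8::encode($stored) if utf8::is_utf8($stored);
    my $fetched = $stored;
    # utf8::decode returns false if the bytes are not well-formed UTF-8.
    my $ok = utf8::decode($fetched);
    return ($ok, $fetched);
}

my $latin1   = "\xe9";        # held internally as the single byte 0xE9
my $upgraded = "\xe9";
utf8::upgrade($upgraded);     # same character, held internally as 0xC3 0xA9

my ($ok_before, $got_before) = naive_round_trip($latin1);
my ($ok_after,  $got_after)  = naive_round_trip($upgraded);

print "equal as strings: ", ($latin1 eq $upgraded ? "yes" : "no"), "\n";
print "without upgrade:  ", ($ok_before ? "well-formed" : "malformed"), "\n";
print "with upgrade:     ",
    ($ok_after && $got_after eq "\xe9" ? "round-trips" : "broken"), "\n";
```

The two strings compare equal in Perl, yet only the upgraded one survives the byte-level round trip, which is why the problem is invisible until the data is read back.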
On Sun Mar 11 08:34:35 2007, juerd@convolution.nl wrote:
> A workaround for users of the current DBD::SQLite is to manually
> Encode::encode_utf8 utf8::upgrade every string sent to the database.
While encode_utf8 works now, it would break as soon as DBD::SQLite is fixed. So utf8::upgrade all data going to the database.
--
Juerd
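One possible shape for this workaround is to upgrade copies of the bind values just before handing them to DBI. In the sketch below, upgraded is a hypothetical helper name and is not part of DBI or DBD::SQLite; the $dbh call appears only in a comment and assumes an existing handle.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DBI or DBD::SQLite): return copies of the
# given values with their internal representation upgraded to UTF-8.
sub upgraded {
    my @copies = @_;
    for my $v (@copies) {
        next unless defined $v;    # leave undef (NULL) untouched
        utf8::upgrade($v);         # changes the representation, not the value
    }
    return @copies;
}

my @bind = upgraded("\0", "A", "\xe9", "\x{20ac}", undef);

# Assumed usage with an existing database handle:
#   $dbh->do("INSERT INTO foo VALUES (?)", undef, upgraded($value));

printf "%d of %d defined values carry the UTF8 flag\n",
    scalar(grep { defined($_) && utf8::is_utf8($_) } @bind),
    scalar(grep { defined($_) } @bind);
```

Because utf8::upgrade leaves the string's value unchanged, this should keep working after the driver is fixed, whereas data passed through encode_utf8 would then be encoded twice.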
From: spamcollector_cpan [...] juerd.nl
Any news? I think it is important to fix this bug, because malformed UTF-8 data can crash programs, even at a distance.
--
Juerd
Are you able to put together a test script for us that demonstrates this bug? This would make it easier to replicate.
On Sun Apr 05 15:47:05 2009, ADAMK wrote:
> Are you able to put together a test script for us that demonstrates this
> bug? This would make it easier to replicate.
3;0 juerd@feather:~$ cat foo.t
use strict;
use warnings;
use Test::More;
use File::Temp qw(tempfile);
use DBI;

my @strings = ("\0", "A", "\xe9", "\x{20ac}");

plan tests => scalar @strings;

my ($fh, $fn) = tempfile;
my $dbh = DBI->connect("dbi:SQLite:$fn");
$dbh->{unicode} = 1;

$dbh->do("CREATE TABLE foo (foo)");

for (@strings) {
    $dbh->do("INSERT INTO foo VALUES (?)", undef, $_);
    my $foo = $dbh->selectall_arrayref("SELECT foo FROM foo");
    is $foo->[0][0], $_;
    $dbh->do("DELETE FROM foo");
}

3;0 juerd@feather:~$ perl foo.t
1..4
ok 1
ok 2
not ok 3
#   Failed test at foo.t line 20.
Wide character in print at /usr/share/perl5/Test/Builder.pm line 1351.
#          got: '�'
#     expected: 'é'
ok 4
# Looks like you failed 1 test of 4.
--
Juerd
> use File::Temp qw(tempfile);
Unlinking the temporary files is left as an exercise :)
--
Juerd
I've converted this to use our test shortcuts and committed it as t/rt_25371_asymmetric_unicode.t. It still fails, but now at least it's officially failing.
Resolved in 1.22