Skip Menu |

This queue is for tickets about the DBD-mysql CPAN distribution.

Report information
The Basics
Id: 25590
Status: resolved
Worked: 30 min
Priority: 0/
Queue: DBD-mysql

People
Owner: CAPTTOFU [...] cpan.org
Requestors: JRED [...] cpan.org
Cc: pali [...] cpan.org
AdminCc:

Bug Information
Severity: Important
Broken in: 4.003
Fixed in: 4.041_01



Subject: mysql_enable_utf8 misses utf8 upgrading of non utf8 strings
When using mysql_enable_utf8 DBD::mysql misses upgrading non utf8 strings. Instead it assumes that all data passed to DBD::mysql is valid utf8 already. Non ASCII characters are simply removed from strings passed to DBD::mysql. Attached you find a patch against t/utf8.t (DBD-mysql-4.003 release) which adds a correspondent regression test. With this patch test #19 fails with the following message: not ok 19 at line 185 got back 'umlauts: ' instead of 'umlauts: äüö'. I'm using DBI 1.54 and DBD::mysql 4.003 compiled with MySQL 5.0.27 client libraries.
Subject: DBD-mysql-4.003-missing-utf8-upgrade-test.patch.txt
--- t/utf8.t 2007-03-21 11:59:04.954355876 +0100 +++ t/utf8.t.patched 2007-03-21 11:58:42.653080899 +0100 @@ -102,7 +102,7 @@ # Create a new table; In an ideal world, it'd be more sensible to # make the whole database UTF8... # - $query = "CREATE TABLE $table (name VARCHAR(64) CHARACTER SET utf8, bincol BLOB, shape GEOMETRY)"; + $query = "CREATE TABLE $table (name VARCHAR(64) CHARACTER SET utf8, bincol BLOB, shape GEOMETRY, test INTEGER)"; Test($state or $dbh->do($query)) or ErrMsgF("Cannot create table: Error %s.\n", $dbh->errstr); @@ -113,24 +113,34 @@ my $utf8_str = "\x{0100}dam"; # "Adam" with a macron. my $quoted_utf8_str = "'\x{0100}dam'"; - + my $blob = "\x{c4}\x{80}dam"; # same as utf8_str but not utf8 encoded my $quoted_blob = "'\x{c4}\x{80}dam'"; + my $umlaut_str = "umlauts: äüö"; + my $quoted_umlaut_str = "'umlauts: äüö'"; + Test( $state or ( $dbh->quote( $utf8_str ) eq $quoted_utf8_str ) ) or ErrMsg( "Failed to retain UTF-8 flag when quoting.\n" ); Test( $state or ( $dbh->quote( $blob ) eq $quoted_blob ) ) or ErrMsg( "UTF-8 flag was set when quoting.\n" ); + Test( $state or ( $dbh->quote( $umlaut_str ) eq $quoted_umlaut_str ) ) + or ErrMsg( "Failed to retain UTF-8 flag when quoting non-utf8 string.\n" ); + Test( $state or ( $dbh->{ mysql_enable_utf8 } ) ) or ErrMsg( "mysql_enable_utf8 didn't survive connect()\n" ); - $query = qq{INSERT INTO $table (name, bincol, shape) VALUES (?,?, GeomFromText('Point(132865 501937)'))}; + $query = qq{INSERT INTO $table (name, bincol, shape, test) VALUES (?,?, GeomFromText('Point(132865 501937)'),1)}; Test( $state or $dbh->do( $query, {}, $utf8_str,$blob ) ) or ErrMsgF( "INSERT failed: query $query, error %s.\n", $dbh->errstr ); - $query = "SELECT name,bincol,asbinary(shape) FROM $table LIMIT 1"; + $query = qq{INSERT INTO $table (name,test) values (?,2)}; + Test( $state or $dbh->do( $query, {}, $umlaut_str ) ) + or ErrMsgF( "INSERT failed: query $query, error %s.\n", $dbh->errstr ); + + $query = "SELECT name,bincol,asbinary(shape) FROM $table WHERE test=1 LIMIT 1"; Test( $state or ($sth = $dbh->prepare( $query ) ) ) or ErrMsgF( "prepare failed: query $query, error %s.\n", $dbh->errstr ); @@ -158,6 +168,22 @@ Test( $state or ($ref->[1] eq $blob) ) or ErrMsgF( "got back '$ref->[1]' instead of '$blob'.\n" ); + Test( $state or $sth->finish ) + or ErrMsgF( "Cannot finish: %s.\n", $sth->errstr ); + + # test if non-utf8 tagged data was inserted correctly + $query = "SELECT name FROM $table WHERE test=2 LIMIT 1"; + Test( $state or ($sth = $dbh->prepare( $query ) ) ) + or ErrMsgF( "prepare failed: query $query, error %s.\n", $dbh->errstr ); + + Test($state or $sth->execute) + or ErrMsgF( "execute failed: query $query, error %s.\n", $dbh->errstr ); + + Test( $state or defined( $ref = $sth->fetchrow_arrayref ) ) + or ErrMsgF( "fetch failed: query $query, error %s.\n", $sth->errstr ); + + Test( $state or ($ref->[0] eq $umlaut_str) ) + or ErrMsgF( "got back '$ref->[0]' instead of '$umlaut_str'.\n" ); Test( $state or $sth->finish ) or ErrMsgF( "Cannot finish: %s.\n", $sth->errstr );
Joern, Thank you very much for this patch. One issue though, I had applied another patch to this test from Joost Diepenmaat that might address what your patch intends to address. Could you please check out the latest DBD::mysql and verify if you still need to make a patch against that test? If you would like, I can even give you commit access to the repository. I'm hoping to do a release either later today or tomorrow, but will wait upon your response before I do. Kind regards, Patrick On Wed Mar 21 07:17:54 2007, JRED wrote: Show quoted text
> > When using mysql_enable_utf8 DBD::mysql misses upgrading non utf8 > strings. Instead it assumes that all data passed to DBD::mysql is valid > utf8 already. Non ASCII characters are simply removed from strings > passed to DBD::mysql. > > Attached you find a patch against t/utf8.t (DBD-mysql-4.003 release) > which adds a correspondent regression test. > > With this patch test #19 fails with the following message: > > not ok 19 at line 185 > got back 'umlauts: ' instead of 'umlauts: äüö'. > > I'm using DBI 1.54 and DBD::mysql 4.003 compiled with MySQL 5.0.27 > client libraries.
On Mi. 21. Mär. 2007, 08:42:14, CAPTTOFU wrote: Show quoted text
> Thank you very much for this patch. One issue though, I had applied > another patch to this test from Joost Diepenmaat that might address what > your patch intends to address. Could you please check out the latest > DBD::mysql and verify if you still need to make a patch against that > test? If you would like, I can even give you commit access to the > repository.
Thanks for the quick reply. I just checked out svn trunk and my patched utf8.t still fails. As far as I understand the Changes file correctly, Joost's patch addresses a missing utf8 flag when selecting utf8_bin columns from the database. My issue is that DBD::mysql passes all data as-is to the database even when the connection is in utf8 mode. This way all non ASCII characters of non-utf8-tagged strings gets lost in the database. But passing non-utf8-tagged strings to DBD::mysql should be absolutely valid, since they're valid for Perl they should be valid for DBD::mysql as well ;) A fix would be to utf8::upgrade all binded parameters and the SQL statement itself before sending them to the server when mysql_enable_utf8 is active. Regards, Jörn
From: JRED [...] cpan.org
On Mi. 21. Mär. 2007, 16:43:05, JRED wrote: Show quoted text
> A fix would be to utf8::upgrade all binded parameters and the SQL > statement itself before sending them to the server when > mysql_enable_utf8 is active.
More exactly: of course not all strings need to be upgraded, just those targeted to character columns. In particular blobs wouldn't survive an utf8::upgrade ;) If you're interested I can provide more test scripts for these issues. Regards, Jörn
From: CAPTTOFU [...] cpan.org
Hi, This is on my radar, but I admit, I'm not a guru on UTF8 - probably an American issue. I have recently started to have to deal with UTF8 a bit on grazr, particularly with Chinese feeds, and now understand some of the issues with UTF8. What does utf8::upgrade do exactly? I would like to fix this issue! On Thu Mar 22 14:11:50 2007, JRED wrote: Show quoted text
> On Mi. 21. Mär. 2007, 16:43:05, JRED wrote: >
> > A fix would be to utf8::upgrade all binded parameters and the SQL > > statement itself before sending them to the server when > > mysql_enable_utf8 is active.
> > More exactly: of course not all strings need to be upgraded, just those > targeted to character columns. In particular blobs wouldn't survive an > utf8::upgrade ;) > > If you're interested I can provide more test scripts for these issues. > > Regards, > > Jörn
From: JRED [...] cpan.org
On Sat May 26 21:56:33 2007, CAPTTOFU wrote: Show quoted text
> What does utf8::upgrade do exactly?
When the variable has the utf-8 flag set, it does nothing, because it assumes that this variables carries valid utf-8 already. If the utf-8 flag isn't set, the variable is _converted_ to utf-8 and the utf-8 flag is set as well. The latter case is interesting in conjunction with DBD::mysql when the database connection is utf-8, but the application passes non utf-8 data. Then the data has to be "upgraded" to utf-8 and must not be passed as-is, because MySQL then receives illegal data which does not conform to utf-8. So all parameters binded to SQL statements (and the SQL statements themself) need to go through a utf8::upgrade() and everything will work as expected. We do this in a database layer we wrote for our applications, but we want to get rid of this layer ;) Of course it would be better if this is handled on DBI/DBD level, not on application level. I dunno the DBI/DBD architecture well, but probably this could be done even at DBI level, so all drivers would benefit. Regards, Jörn
From: JDIEPEN [...] cpan.org
Sorry for jumping in, but... On Sun May 27 05:24:09 2007, JRED wrote: Show quoted text
> So all parameters binded to SQL statements (and the SQL statements > themself) need to go through a utf8::upgrade() and everything will work > as expected.
Note that any bounded binary columns/values must NOT be upgraded. This means you need to know for each bounded parameter if it's a text value or binary. I've looked at this issue myself, but I couldn't figure out how to do it. My experience with dbd::mysql & libmysqlclient is pretty limited though. Also note that it's NOT enough to check the sql type of the column. See also RT #24738 (fixed that for selecting binary & utf-8 values)
Fix for UTF-8 support in DBD::mysql is in my pull request: https://github.com/perl5-dbi/DBD-mysql/pull/67 I would like if more people affected by UTF-8 bugs in DBD::mysql could test my changes...
Reopening, fix was reverted in 4.043.