
This queue is for tickets about the DBD-mysql CPAN distribution.

Report information
The Basics
Id: 53130
Status: resolved
Priority: 0/
Queue: DBD-mysql

People
Owner: Nobody in particular
Requestors: dragon31337 [...] gmail.com
Cc: pali [...] cpan.org
AdminCc:

Bug Information
Severity: Important
Broken in: 4.011
Fixed in: 4.041_01



Subject: UTF8 strings have no utf8 flag set
I can't tell for certain whether this is a duplicate; it seems not. The other related tickets have not been touched for years.
Unicode strings have no utf8 flag set: the strings are encoded in UTF-8, but the utf8 flag is off.
I've created a Unicode database and a table in it; see dbinit.sql for more detail.
The repro script is attached.

All strings are valid UTF-8 strings but lack the utf8 flag. They are output incorrectly unless I set the flag myself.

Note: the :utf8 layer is set on all outputs, etc.

perl version:  v5.10.1, build 1006 [291086]
DBD-mysql: 4.011
DBI: 1.609
OS: Windows 7 32bit [Version 6.1.7600]
MySQL: 5.1.37 win32 on localhost
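As described above, the fetched values are valid UTF-8 byte strings without the utf8 flag. Until the driver decodes them itself, one common workaround is to decode each column after fetching. The sketch below is a standalone illustration, not DBD::mysql code. `decode_row` is a hypothetical helper, and a literal array reference stands in for a row returned by `fetchrow_arrayref`. Unlike `Encode::_utf8_on`, `Encode::decode` validates the bytes.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of DBI or DBD::mysql: decode every defined
# column of a fetched row from UTF-8 bytes into Perl characters.
# FB_CROAK dies on malformed UTF-8 instead of substituting U+FFFD;
# LEAVE_SRC keeps the caller's byte strings unmodified.
sub decode_row {
    my ($row) = @_;
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    return [ map { defined $_ ? Encode::decode('UTF-8', $_, $check) : undef } @$row ];
}

# Stand-in for a row as the report describes it: UTF-8 bytes with the
# utf8 flag off. The second column represents an SQL NULL.
my $raw = [ "caf\xC3\xA9", undef, "\xE2\x82\xAC" ];
my $row = decode_row($raw);

for my $i (0 .. $#$raw) {
    next unless defined $raw->[$i];
    printf "col %d: %d bytes, is_utf8=%d -> %d chars, is_utf8=%d\n",
        $i, length $raw->[$i], utf8::is_utf8($raw->[$i]) ? 1 : 0,
        length $row->[$i], utf8::is_utf8($row->[$i]) ? 1 : 0;
}
```

With a real statement handle, the helper would be applied per row, for example `while (my $r = $sth->fetchrow_arrayref) { my $chars = decode_row($r); ... }`.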



Subject: dbinit.sql
application/octet-stream 868b

Message body not shown because it is not plain text.

Subject: repro.pl
use Data::Dumper;
use strict;
use DBI;
use utf8;
use Encode;

binmode STDOUT, ":utf8";
binmode STDIN, ":utf8";

my ($host, $port, $database, $user, $password, $rise) =
    ('localhost', '3306', 'flights', 'root', 'itsme');
my $dsn = "DBI:mysql:host=$host;port=$port;".
          "database=$database;".
          "mysql_compression=1;".
          "mysql_client_found_rows=1;".
          "mysql_auto_reconnect=1;".
          "mysql_enable_utf8=1;";
my $dbh = DBI->connect( $dsn, $user, $password, { RaiseError => 1, } );

$dbh->do("SET character_set_client = utf8;");
$dbh->do("SET character_set_connection = utf8;");
$dbh->do("SET character_set_results = utf8;");

my $query = $dbh->prepare("SELECT * FROM countries");
$query->execute();
my $data = $query->fetchall_arrayref();

foreach my $str (@$data) {          # $str is one row (an array reference)
    foreach my $val (@$str) {       # $val is one column value
        # Flag state as returned by the driver. Note that utf8::valid()
        # is applied to $str (the row reference), not to $val.
        print ("Value: $val, it is ".(utf8::is_utf8($val)?"":"non-")."utf8 string, and ".(utf8::valid($str)?"":"not")." valid<br>\n");
        # Force the utf8 flag on, without validating the bytes.
        Encode::_utf8_on($val);
        print ("Value: $val, it is ".(utf8::is_utf8($val)?"":"non-")."utf8 string, and ".(utf8::valid($str)?"":"not")." valid<br>\n");
    }
}
$query->finish();
$dbh->disconnect();
return $data;
From: peter [...] vereshagin.org
I have a different kind of problem here. What do you think about the patch I've attached? I tried your repro.pl to find out whether your problem is the same as mine (http://lists.mysql.com/perl/4382), but it looks like it is not.
Subject: repro-utf8-2011-01-27.diff
--- repro.pl.old 2011-01-27 17:31:29.000000000 +0300
+++ repro.pl 2011-01-27 17:33:19.000000000 +0300
@@ -1,9 +1,7 @@
 use Data::Dumper;
 use strict;
 use DBI;
-use utf8;
 use Encode;
-binmode STDOUT, ":utf8";
 binmode STDIN, ":utf8";
@@ -33,11 +31,10 @@
     foreach my $val (@$str) {
         print ("Value: $val, it is ".(utf8::is_utf8($val)?"":"non-")."utf8 string, and ".(utf8::valid($str)?"":"not")." valid<br>\n");
-        Encode::_utf8_on($val);
+        utf8::upgrade( $val );
         print ("Value: $val, it is ".(utf8::is_utf8($val)?"":"non-")."utf8 string, and ".(utf8::valid($str)?"":"not")." valid<br>\n");
     }
 }
 $query->finish();
 $dbh->disconnect();
-
 return $data;
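The patch above replaces `Encode::_utf8_on($val)` with `utf8::upgrade( $val )`. These two calls are not interchangeable. The following standalone sketch makes no database connection; its input is the two UTF-8 bytes of "é". `utf8::upgrade` changes only the internal representation, so each byte remains a separate Latin-1 character. `Encode::_utf8_on` reinterprets the bytes as UTF-8 without validating them. `Encode::decode` is included for comparison as the validating alternative.

```perl
use strict;
use warnings;
use Encode ();

# UTF-8 bytes for "é" (U+00E9) with the utf8 flag off, as the driver returns them.
my $bytes = "\xC3\xA9";

# utf8::upgrade keeps the same *characters*: each byte is taken as a Latin-1
# character, so the result is still two characters (U+00C3, U+00A9) that
# merely gain the utf8 flag.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

# Encode::_utf8_on reinterprets the buffer as UTF-8 without any check,
# giving the single character U+00E9. Encode documents it as internal.
my $flagged = $bytes;
Encode::_utf8_on($flagged);

# Encode::decode performs the same conversion but validates the input.
my $decoded = Encode::decode('UTF-8', $bytes);

printf "%-8s length=%d first=U+%04X\n", $_->[0], length $_->[1], ord $_->[1]
    for [ upgrade => $upgraded ], [ _utf8_on => $flagged ], [ decode => $decoded ];
```

The `upgrade` line reports two characters starting at U+00C3. The other two lines each report one character, U+00E9.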
Subject: MySQL driver does not handle UTF strings properly
From: starrychloe [...] oliveyou.net
Yes, this is a bug. I mistakenly reported it against MSSQL before. The driver does not read back the same values it writes. Here is a test case:

use DBI;
use Data::Dumper;
use strict;
use utf8;
binmode STDOUT, ":utf8";

my $h = DBI->connect('dbi:mysql:database=xxx;host=server99', 'xxx', 'xxxxx')
    or die("Cannot connect to MySQL database: ", $DBI::errstr);
$h->do('SET NAMES utf8');
eval { $h->do(q/drop table mje/); };
$h->do(q/create table mje (a nvarchar(20))/);

my $unicode = "\x{e9} é \x{20ac}";
print $unicode, ', ', utf8::is_utf8($unicode), ', ', Dumper($unicode);

$h->do(q/insert into mje values(?)/, undef, $unicode);
my $s = $h->prepare(q/select * from mje/);
$s->execute;
my $f = $s->fetchall_arrayref;
my $x = $f->[0]->[0];
# utf8::decode($x);
print $x, ', ', utf8::is_utf8($x), ', ',
    (map { sprintf('%02X ', ord($_)) } split (//, $x)), ', ', Dumper($f), "\n";
exit;

-----------------------
This is the output:
├⌐ ├⌐ Γé¼, 1, $VAR1 = "\x{e9} \x{e9} \x{20ac}";
├â┬⌐ ├â┬⌐ ├ó┬é┬¼, , C3 A9 20 C3 A9 20 E2 82 AC , $VAR1 = [
          [
            '├â┬⌐ ├â┬⌐ ├ó┬é┬¼'
          ]
        ];
------------------------

Please forgive the line-drawing characters. I was using ActiveState Perl on Windows, which cannot print even hard-coded Unicode to the console; neither can Strawberry Perl. Only Cygwin Perl seems to display Unicode properly, but I cannot get DBD::mysql to compile under Cygwin.

The important points are the '1' value returned by is_utf8() and the hexadecimal representation of é, which is U+00E9 in Unicode and C3 A9 in UTF-8. The first Dumper output is correct. On the second line, is_utf8() does not return 1, and the hex dump shows the string as a sequence of raw UTF-8 bytes. In other words, the value was not decoded. If I call utf8::decode() on the value returned from the database, everything works.

Without UTF-8 decoding, Perl treats the two bytes C3 A9, which together encode é and should be combined into U+00E9, as two separate characters, U+00C3 and U+00A9. That is not what was expected.

Here is additional information (the terminal is set to the UTF-8 character set):
mysql> select a,hex(a) from mje;
+-----------+--------------------+
| a         | hex(a)             |
+-----------+--------------------+
| é é €     | C3A920C3A920E282AC |
+-----------+--------------------+

mysql> status
--------------
mysql  Ver 14.12 Distrib 5.0.60, for pc-linux-gnu (i686) using readline 5.2

Server version:         5.0.60-log Gentoo Linux mysql-5.0.60-r1
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8

List of Unicode characters and their UTF-8 hex values: http://www.utf8-chartable.de/
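You can reproduce the hex values in the report without a database. This sketch starts from the nine bytes shown by `hex(a)` and dumps them one code per character, as the test case does. It then applies `utf8::decode()`, which is the workaround the report mentions, and dumps the result again. The `hexdump` helper is illustrative only.

```perl
use strict;
use warnings;

# The nine bytes MySQL holds for "é é €" (hex(a) = C3A920C3A920E282AC).
my $raw = "\xC3\xA9\x20\xC3\xA9\x20\xE2\x82\xAC";

# One hex code per character, like the test case's sprintf('%02X ', ord($_)).
sub hexdump { return join ' ', map { sprintf '%02X', ord($_) } split //, $_[0] }

# utf8::decode() works in place and returns false on malformed UTF-8.
my $chars = $raw;
utf8::decode($chars) or die "not valid UTF-8\n";

print "raw:     ", hexdump($raw),   "\n";   # C3 A9 20 C3 A9 20 E2 82 AC
print "decoded: ", hexdump($chars), "\n";   # E9 20 E9 20 20AC
```

Before decoding, Perl sees nine characters (one per byte). After decoding, it sees five: é, space, é, space, €.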
From: starrychloe [...] oliveyou.net
Additional information:

C:\Users\xxxxxxx\Documents\xxxx-serverscripts>perl -MDBI -e "DBI->installed_versions"
  Perl            : 5.010001    (MSWin32-x64-multi-thread)
  OS              : MSWin32     (5.2)
  DBI             : 1.615
  DBD::mysql      : 4.018

Similar but misfiled bug: https://rt.cpan.org/Public/Bug/Display.html?id=69362
A fix for UTF-8 support in DBD::mysql is in my pull request: https://github.com/perl5-dbi/DBD-mysql/pull/67. I would appreciate it if more people affected by UTF-8 bugs in DBD::mysql could test my changes.
Reopening: the fix was reverted in 4.043.