Subject: inconsistencies with UTF-8 handling
Date: Fri, 27 Mar 2015 11:41:34 +0700
To: bug-DBD-Pg [...] rt.cpan.org
From: Пушкин Сергей <pushkinsv [...] gmail.com>
Hello!
There are some inconsistencies in the UTF-8 implementation that show up
when the pg_enable_utf8 attribute is not set.
The old UTF-8 implementation did not have these problems; they were
introduced in commit a59bf0de40 and are still present.
The last good commit is 2f8f77aa97, at version 3.2.1.
1) Statement text and parameter values for prepared statements are always
required to have the utf8 flag set, even when pg_enable_utf8 is not set.
If the statement text is UTF-8 encoded but does not have the utf8 flag set,
it is double encoded by sv_utf8_upgrade and thus broken. For example, the
Cyrillic characters 'тест' turn into 'ÑеÑÑ' before being sent to the database.
Example:
Assuming the database has a table "test" with a column "тест" and a row
whose text value is 'тест':
no utf8;
$dbh->{pg_enable_utf8}=0;
$r=$dbh->selectall_arrayref(qq\select * from test where
тест='тест'\,{Slice=>{}});
# DBD::Pg::db selectall_arrayref failed: ERROR: column "ÑеÑÑ" does not exist
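The double encoding described in item 1 can be reproduced without a database. This is only a minimal sketch, assuming that the Perl-level utf8::upgrade behaves like the C-level sv_utf8_upgrade named above; the variable names are illustrative and do not come from DBD::Pg.

```perl
use strict;
use warnings;

# UTF-8 octets of the Cyrillic word used in the report, utf8 flag off.
my $bytes = "\xd1\x82\xd0\xb5\xd1\x81\xd1\x82";

# Upgrading treats every octet as one Latin-1 character and re-encodes it
# internally, so each high-bit octet becomes two octets.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

# Expose the internal buffer, i.e. what code reading the raw string
# buffer would put on the wire.
my $wire = $upgraded;
utf8::encode($wire);

printf "original octets: %d, after upgrade: %d\n",
    length($bytes), length($wire);
```

The second count is twice the first, which is the byte-level picture of a column name that no longer matches anything in the database.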
2) Statement text for unprepared statements ("do" without bind arguments)
is not altered and is sent to the database as is.
This is not a problem in itself, but it differs from the behavior of "do"
with bind arguments and of the other ways of running prepared statements
(prepare/execute/fetch, selectrow_, selectall_, etc.).
example:
no utf8;
$dbh->{pg_enable_utf8}=0;
$r=$dbh->do("select *,тест from test where тест='тест'");
# will return 1 as it should be
$r=$dbh->do("select *,тест from test where тест=?",undef,'тест');
# DBD::Pg::db do failed: ERROR: column "ÑеÑÑ" does not exist
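Until the driver is fixed, a caller could work around items 1 and 2 by decoding octet strings into character strings before handing them to the driver, so that the utf8 flag is already set and the upgrade has nothing left to do. The helper below is hypothetical (the name "flagged" is not part of DBD::Pg), and the approach is an assumption drawn from the behavior described above, not something verified against the affected versions.

```perl
use strict;
use warnings;

# Hypothetical helper: return copies of the arguments with valid UTF-8
# octet strings decoded into character strings (utf8 flag set).
sub flagged {
    my @out = @_;
    for my $s (@out) {
        next unless defined $s;
        next if utf8::is_utf8($s);   # already a character string
        utf8::decode($s);            # leaves invalid UTF-8 unchanged
    }
    return wantarray ? @out : $out[0];
}

# "\xd1\x82\xd0\xb5\xd1\x81\xd1\x82" is the UTF-8 octet form of the
# column name and value used in the examples above.
my ($sql, $arg) = flagged(
    "select * from test where \xd1\x82\xd0\xb5\xd1\x81\xd1\x82=?",
    "\xd1\x82\xd0\xb5\xd1\x81\xd1\x82",
);

# With a live connection one would then call:
# $r = $dbh->do($sql, undef, $arg);
```

This only hides the inconsistency on the caller's side; a caller that deliberately works with octets and pg_enable_utf8=0 should not have to do it.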
3) The returned column names always have the utf8 flag set, regardless of
the pg_enable_utf8 state.
example:
no utf8;
$dbh->{pg_enable_utf8}=0;
$r=$dbh->selectall_arrayref(qq\select * from test\,{Slice=>{}});
use Devel::Peek;
Dump $_ foreach keys %{$r->[0]};
outputs:
SV = PV(0x1861120) at 0x185fb80
REFCNT = 2
FLAGS = (POK,IsCOW,pPOK,UTF8)
PV = 0x1910da0 "\321\202\320\265\321\201\321\202" [UTF8 "\x{442}\x{435}\x{441}\x{442}"]
CUR = 8
LEN = 0
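The same flag can be checked without Devel::Peek by calling utf8::is_utf8 on each key. As a sketch only, the snippet below simulates such a row (no database involved) and shows a hypothetical helper, "octet_keys", that a caller could use to get keys back as plain octets; neither the helper nor the simulated row comes from DBD::Pg.

```perl
use strict;
use warnings;

# Stand-in for one row returned with {Slice=>{}}: the key is a character
# string with the utf8 flag set, as in the Devel::Peek dump above.
my $key = "\xd1\x82\xd0\xb5\xd1\x81\xd1\x82";
utf8::decode($key);
my $row = { $key => 'value' };

# Hypothetical helper: copy a row, turning every key into UTF-8 octets.
sub octet_keys {
    my ($in) = @_;
    my %out;
    for my $k (keys %$in) {
        my $octets = $k;
        utf8::encode($octets);       # characters -> octets, flag off
        $out{$octets} = $in->{$k};
    }
    return \%out;
}

my $fixed = octet_keys($row);
for my $k (keys %$fixed) {
    printf "%d octets, utf8 flag %s\n",
        length($k), utf8::is_utf8($k) ? "on" : "off";
}
```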
I suggest the attached patch, dbdimp-utf8-f075d2a9be.diff, which fixes these
problems.
The patch should be applied to dbdimp.c at revision f075d2a9be.
I have also created a pull request on GitHub:
https://github.com/bucardo/dbdpg/pull/17
Thanks for reading!
--
Serge Pushkin