
This queue is for tickets about the DBD-Pg CPAN distribution.

Report information
The Basics
Id: 103137
Status: open
Priority: 0/
Queue: DBD-Pg

People
Owner: Nobody in particular
Requestors: pushkinsv [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: inconsistencies with UTF-8 handling
Date: Fri, 27 Mar 2015 11:41:34 +0700
To: bug-DBD-Pg [...] rt.cpan.org
From: Пушкин Сергей <pushkinsv [...] gmail.com>
Hello!

There are some inconsistencies in the UTF-8 implementation that show up when the pg_enable_utf8 attribute is not set. The old UTF-8 implementation did not have these problems; they were introduced in commit a59bf0de40 and are still present. The last good commit is 2f8f77aa97, at version 3.2.1.

1) Statement text and parameter values for prepared statements are always required to have the utf8 flag set, even when pg_enable_utf8 is not set. If the statement text is UTF-8 encoded but does not have the utf8 flag set, it is double-encoded by sv_utf8_upgrade and thus broken. For example, the Cyrillic characters 'тест' turn into 'ÑеÑÑ' before being sent to the database.

Example, assuming the database has a table "test" with a column "тест" and a row whose text value is 'тест':

    no utf8;
    $dbh->{pg_enable_utf8}=0;
    $r=$dbh->selectall_arrayref(qq\select * from test where тест='тест'\,{Slice=>{}});
    # DBD::Pg::db selectall_arrayref failed: ERROR: column "ÑеÑÑ" does not exist

2) Statement text for unprepared statements ("do" without bound arguments) is not altered and is sent to the database as is. This is not a problem in itself, but the behavior differs from "do" with bound arguments and from the other ways of running prepared statements (prepare/execute/fetch, selectrow_*, selectall_*, etc.).

Example:

    no utf8;
    $dbh->{pg_enable_utf8}=0;
    $r=$dbh->do("select *,тест from test where тест='тест'");
    # returns 1, as it should
    $r=$dbh->do("select *,тест from test where тест=?",undef,'тест');
    # DBD::Pg::db do failed: ERROR: column "ÑеÑÑ" does not exist

3) The returned column names always have the utf8 flag set, regardless of the pg_enable_utf8 setting.

Example:

    no utf8;
    $dbh->{pg_enable_utf8}=0;
    $r=$dbh->selectall_arrayref(qq\select * from test\,{Slice=>{}});
    use Devel::Peek;
    Dump $_ foreach keys %{ $r->[0] };

Output:

    SV = PV(0x1861120) at 0x185fb80
      REFCNT = 2
      FLAGS = (POK,IsCOW,pPOK,UTF8)
      PV = 0x1910da0 "\321\202\320\265\321\201\321\202" [UTF8 "\x{442}\x{435}\x{441}\x{442}"]
      CUR = 8
      LEN = 0

I suggest the attached patch dbdimp-utf8-f075d2a9be.diff, which fixes these problems. The patch should be applied to dbdimp.c at revision f075d2a9be. I have also created a pull request on GitHub: https://github.com/bucardo/dbdpg/pull/17

Thanks for reading!

--
Serge Pushkin
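
The double encoding described in point 1 can be reproduced without a database. The following is a minimal sketch in pure Perl. It assumes that utf8::upgrade, the Perl-level counterpart of sv_utf8_upgrade, treats an unflagged byte string the same way the call inside dbdimp.c does. The variable names are illustrative and are not taken from DBD::Pg.

```perl
use strict;
use warnings;

# UTF-8 bytes of the Cyrillic word used in the report, written as hex
# escapes so the result does not depend on the encoding of this file.
# The string has no utf8 flag, like a literal under "no utf8".
my $bytes = "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82";

# Upgrading an unflagged string treats every byte as a Latin-1 character.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

# Bytes that would go out on the wire: each original byte >= 0x80
# has become a two-byte UTF-8 sequence.
my $wire = $upgraded;
utf8::encode($wire);

# For comparison: decoding the bytes as UTF-8 yields the intended
# 4-character string with the utf8 flag set.
my $decoded = $bytes;
utf8::decode($decoded);

# Encoding the decoded string gives back the original bytes.
my $roundtrip = $decoded;
utf8::encode($roundtrip);

printf "original bytes:   %d\n", length $bytes;
printf "after upgrade:    %d bytes on the wire\n", length $wire;
printf "wire bytes (hex): %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $wire;
printf "decoded length:   %d characters, utf8 flag %s\n",
    length $decoded, utf8::is_utf8($decoded) ? 'on' : 'off';
```

The 8 input bytes become 16 bytes after the upgrade, which is the 'ÑеÑÑ' form seen in the error messages above (the bytes 0x81 and 0x82 map to invisible control characters). Decoding the bytes first, as in the last step, is one way a caller could avoid the problem, although whether callers should have to do that when pg_enable_utf8 is 0 is exactly what this report questions.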


RT-Send-CC: ilmari+cpan [...] ilmari.org
Can you try this on the most recent version? We've changed a good bit of the UTF-8 related code lately. Also wondering if Dagfinn Ilmari Mannsåker has any input on these complaints?