
This queue is for tickets about the DBD-Pg CPAN distribution.

Report information
The Basics
Id: 91655
Status: resolved
Priority: 0/
Queue: DBD-Pg

People
Owner: Nobody in particular
Requestors: OSCHWALD [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 3.3.0



Subject: UTF-8 support appears broken in git master
I recently ran into #40199 and have been testing the version from source control to see if it would work for us. However, it appears that the UTF-8 support is now broken for all data types. In a round-trip test, the strings come back with the UTF-8 flag on, but they appear to have been doubly encoded. For instance, "\u263a" comes back as "\u00e2\u0098\u00ba" (with the UTF-8 flag on). The character is correctly encoded in Pg. The client_encoding is set to UTF-8. Previously, this sort of test worked fine for text columns, but not for citext columns.
It appears this worked in 2.20.1_4. Reverting 972e25c fixed the issue.
Here is a test case for the issue:

    my $before = "\N{WHITE SMILING FACE}";
    my ($after) = $dbh->selectrow_array('SELECT ?::text', {}, $before);
    is($after, $before, 'string is the same after round trip');
    ok(utf8::is_utf8($after), 'string has utf8 flag set');
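The reported symptom can be reproduced without a database, using only the core Encode module. This is a minimal sketch of what "doubly encoded" means here, assuming the UTF-8 bytes coming back from the server get treated as one character per byte at some point; it illustrates the symptom and is not a claim about what the driver code in 972e25c actually does.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# One character: U+263A WHITE SMILING FACE.
my $before = "\x{263a}";

# Its UTF-8 encoding is three bytes: e2 98 ba.
my $bytes = encode('UTF-8', $before);

# The mistake being illustrated: reading each byte as a Latin-1 character
# gives three characters, U+00E2 U+0098 U+00BA, instead of one.
my $mangled = decode('ISO-8859-1', $bytes);

# Decoding the same bytes as UTF-8 restores the original character.
my $after = decode('UTF-8', $bytes);

printf "mangled: %s\n",
    join ' ', map { sprintf 'U+%04X', ord($_) } split //, $mangled;
printf "round trip ok: %s\n", $after eq $before ? 'yes' : 'no';
```

The three characters in `$mangled` match the "\u00e2\u0098\u00ba" value quoted in the report.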
Thanks: this is a known issue under active development :)
I wasn't sure as the unicode tests in the repo passed and the breakage seemed to be recent.
On Thu Dec 26 14:29:13 2013, OSCHWALD wrote:
> I wasn't sure as the unicode tests in the repo passed and the breakage seemed to be recent.
Ha! Fair enough. Thanks for the report. Hopefully everything will work soon. I'll make sure to integrate your now-failing tests in as well.
Added the tests here to t/30-unicode.t as of 7da29e33e7e1950202516b43c2fd8009ff7b5637
See if git head looks a little saner now please
Going to label this as patched for now.
Subject: Re: [rt.cpan.org #91655] UTF-8 support appears broken in git master
Date: Wed, 22 Jan 2014 06:44:06 -0800
To: bug-DBD-Pg [...] rt.cpan.org
From: Gregory Oschwald <oschwald [...] gmail.com>
Thanks! It is in my queue of things to do, but I haven't gotten around to it. I'll let you know if it doesn't work. Greg
The UTF-8 handling on input is still broken in 3.0.0. Specifically, the UTF-8 flag is not checked on input, so if client_encoding=UTF8 and the string contains codepoints above 127 and happens to be downgraded (UTF8 flag off, codepoints stored as bytes), Postgres will reject it.

    ilmari@zarquon:~/src/DBD-Pg$ perl -Mblib -MDBD::Pg -E 'say DBD::Pg->VERSION'
    3.0.0
    ilmari@zarquon:~/src/DBD-Pg$ git diff
    diff --git a/t/30unicode.t b/t/30unicode.t
    index 7c4da06..674bca6 100644
    --- a/t/30unicode.t
    +++ b/t/30unicode.t
    @@ -28,6 +28,7 @@ my $pgversion = $dbh->{pg_server_version};
     my $t;
     my $name = "\N{LATIN CAPITAL LETTER E WITH ACUTE}milie du Ch\N{LATIN SMALL LETTER A WITH CIRCUMFLEX}telet";
    +utf8::downgrade($name);
     my $SQL = 'SELECT ?::text';
     my $sth = $dbh->prepare($SQL);
    ilmari@zarquon:~/src/DBD-Pg$ prove -bv t/30unicode.t
    t/30unicode.t ..
    ok 1 - Connect to database for unicode testing
    DBD::Pg::st execute failed: ERROR: invalid byte sequence for encoding "UTF8": 0xc9 0x6d at t/30unicode.t line 35.
    Issuing rollback() due to DESTROY without explicit disconnect() of DBD::Pg::db handle at t/30unicode.t line 35.
    # Tests were run but no plan was declared and done_testing() was not seen.
    # Looks like your test exited with 255 just after 1.
    Dubious, test returned 255 (wstat 65280, 0xff00)
    All 1 subtests passed

    Test Summary Report
    -------------------
    t/30unicode.t (Wstat: 65280 Tests: 1 Failed: 0)
      Non-zero exit status: 255
      Parse errors: No plan found in TAP output
    Files=1, Tests=1, 0 wallclock secs ( 0.02 usr 0.01 sys + 0.09 cusr 0.01 csys = 0.13 CPU)
    Result: FAIL
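To see why the server complains about "0xc9 0x6d", it may help to check the two possible byte sequences for the start of that name with the core Encode module. This is a rough sketch; `is_valid_utf8` is a hypothetical helper written for this illustration, not part of DBD::Pg.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;    # copy; decode may modify its argument when CHECK is set
    # FB_CROAK makes decode die on malformed input instead of substituting.
    return eval { Encode::decode('UTF-8', $bytes, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

my $latin1_bytes = "\xc9milie";        # E-acute as the single byte 0xc9
my $utf8_bytes   = "\xc3\x89milie";    # E-acute as the UTF-8 pair 0xc3 0x89

printf "%s: %s\n", unpack('H*', $_), is_valid_utf8($_) ? 'valid' : 'invalid'
    for $latin1_bytes, $utf8_bytes;
```

In UTF-8, 0xc9 is the lead byte of a two-byte sequence and must be followed by a continuation byte in the range 0x80 to 0xbf. The next byte here is 0x6d ("m"), so the first string is rejected.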
> and happens to be downgraded (UTF8 flag off, codepoints stored as bytes), Postgres will reject it.
I'm not convinced this is the driver's responsibility.
On 2014-02-04 22:02:14, TURNSTEP wrote:
> > and happens to be downgraded (UTF8 flag off, codepoints stored as bytes), Postgres will reject it.
>
> I'm not convinced this is the driver's responsibility.
The UTF8 flag does not affect the perl semantics of the string, so the user should never have to care about it. It only matters to XS code looking at the actual bytes in the PV buffer, so if we're doing UTF8 at all (rather than expecting the user to do all the de/en-coding themselves, which defeats the purpose of the whole pg_enable_utf8 and client_encoding handling), we should deal with it correctly.
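The point about the flag can be checked in pure Perl. The sketch below uses two hypothetical helpers, `internal_bytes_hex` and `wire_bytes_hex`, written for this illustration: the first approximates what XS code sees in the PV buffer, the second shows one flag-independent way to produce UTF-8 bytes. It is not taken from the driver or from the patches discussed in this ticket.

```perl
use strict;
use warnings;

# Hypothetical helper: hex dump of the bytes Perl holds internally.
sub internal_bytes_hex {
    my ($s) = @_;    # copy, so the caller's string is untouched
    # Flag on: the buffer is the UTF-8 encoding of the characters, so
    # encoding the copy reproduces it. Flag off: one byte per character.
    utf8::encode($s) if utf8::is_utf8($s);
    return unpack 'H*', $s;
}

# Hypothetical helper: UTF-8 bytes for the characters, whatever the flag says.
sub wire_bytes_hex {
    my ($s) = @_;
    utf8::encode($s);    # characters -> UTF-8 octets
    return unpack 'H*', $s;
}

my $upgraded = "\x{c9}milie";
utf8::upgrade($upgraded);        # flag on, buffer holds UTF-8

my $downgraded = $upgraded;
utf8::downgrade($downgraded);    # flag off, buffer holds one byte per character

print "same string to Perl: ", ($upgraded eq $downgraded ? 'yes' : 'no'), "\n";
print "internal (upgraded):   ", internal_bytes_hex($upgraded), "\n";
print "internal (downgraded): ", internal_bytes_hex($downgraded), "\n";
print "wire (upgraded):       ", wire_bytes_hex($upgraded), "\n";
print "wire (downgraded):     ", wire_bytes_hex($downgraded), "\n";
```

The two strings compare equal, yet their internal buffers differ, which is why code that sends the raw buffer without looking at the flag can emit invalid UTF-8. At the XS level, perlapi's `SvPVutf8` is a documented way to get UTF-8 bytes regardless of the flag; whether the eventual fix uses it is not stated in this ticket.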
On Wed Feb 05 10:39:56 2014, ilmari wrote: ... Any chance you can polish up those github patches and submit them? I'd be happy to roll them into the next version.
> Any chance you can polish up those github patches and submit them? I'd be happy to roll them into the next version.
Patches have been pushed, for those playing at home.