Bug #47371 for DBD-DB2: FW: DBD::DB2 UTF-8 Incompatibility / Bug

Fri Jun 26 12:05:56 2009 Philipp-Michael.Guehring [...] unicreditgroup.at - Ticket created

Subject:	FW: DBD::DB2 UTF-8 Incompatibility / Bug
Date:	Fri, 26 Jun 2009 18:06:21 +0200
To:	<bug-DBD-DB2 [...] rt.cpan.org>
From:	GÜHRING Philipp <Philipp-Michael.Guehring [...] unicreditgroup.at>

Hi,

I discovered an incompatibility between DBD::DB2 1.71 (with DB2-Connect V9.5 on Ubuntu Linux) and DB2 v9CM on z/OS.

When I SELECT a CHAR field that includes characters that are multi-byte characters in UTF-8, DBD::DB2 takes the field's general length in characters (SQL_DESC_DISPLAY_SIZE) as the number of bytes to allocate, retrieves that many bytes from DB2, and cuts off the rest of the field.

Example: A field has the content "Gühring" and is defined as CHAR(7). The ü is a multi-byte character in UTF-8, therefore the string is 8 bytes long in UTF-8. DBD::DB2 allocates only 7 bytes, discards the 8th byte, and returns "Gührin" to my application, which breaks the application.

There are several issues:

* For querying how much memory is needed, SQL_DESC_OCTET_LENGTH should be used instead of SQL_DESC_DISPLAY_SIZE, I guess.
* Because UTF-8 is a variable-length encoding, the same CHAR field can have a different byte length in every row. The current code pre-allocates the fields with the field length, which would work with single-byte code pages, but it does not work with multi-byte code pages. If you want to continue pre-allocating the needed memory, you have to allocate at least 4 bytes per character. If you want to allocate it dynamically for every row individually (like the BLOB handling), you can't pre-allocate it for all rows.

A workaround that helps a bit is to do fbh->dsize*=4; on line 1199 in dbdimp.c, but that is not the whole solution yet.

Best regards,
Philipp Gühring
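The truncation described above can be reproduced outside DBD::DB2 with a small C sketch. It is only an illustration under stated assumptions: the helper names (utf8_char_count, utf8_worst_case_bytes, fetch_into) are hypothetical and are not part of DBD::DB2 or the DB2 CLI, and the fixed-size buffer merely stands in for the pre-allocated fetch buffer. The 4-bytes-per-character figure is the worst case suggested in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count UTF-8 characters: every byte that is not a continuation
 * byte (bit pattern 10xxxxxx) starts a new character. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* Worst-case buffer size for a CHAR(char_len) column holding UTF-8:
 * 4 bytes per character plus one byte for the terminating NUL.
 * (Assumption taken from the report: at least 4 bytes per character.) */
static size_t utf8_worst_case_bytes(size_t char_len)
{
    return char_len * 4 + 1;
}

/* Hypothetical stand-in for a fetch into a pre-allocated buffer:
 * copies at most buf_size - 1 bytes of value into buf, NUL-terminates,
 * and returns the number of bytes that did not fit. */
static size_t fetch_into(char *buf, size_t buf_size, const char *value)
{
    size_t need = strlen(value);
    size_t room;
    size_t copied;

    assert(buf_size > 0);
    room = buf_size - 1;
    copied = need < room ? need : room;
    memcpy(buf, value, copied);
    buf[copied] = '\0';
    return need - copied;
}
```

Called with the UTF-8 bytes of "Gühring" (written portably as "G\xC3\xBC" "hring"), utf8_char_count reports 7 characters while strlen reports 8 bytes. Fetching into a buffer sized for 7 payload bytes loses one byte and leaves "Gührin", while a buffer of utf8_worst_case_bytes(7) bytes holds the whole value.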

Mon Jun 29 23:54:01 2009 opendev [...] us.ibm.com - Status changed from 'new' to 'open'

Mon Jun 29 23:54:15 2009 opendev [...] us.ibm.com - Taken

Tue Jun 30 02:22:29 2009 opendev [...] us.ibm.com - Correspondence added

RT-Send-CC:

opendev [...] us.ibm.com

Hi Philipp,

Christian has opened another bug, 47429, for the same issue, and we would like to carry forward all communication on that bug report, since both you and Christian are on its CC list. Having both of you on the same bug report page will make communication easier. I am therefore returning this bug.

--
Thanks,
Tarun Pasrija
IBM OpenSource Application Development Team
India Software Labs, Bangalore (India)

Tue Jun 30 02:22:30 2009 opendev [...] us.ibm.com - Status changed from 'open' to 'rejected'