Skip Menu |

This queue is for tickets about the DBD-ODBC CPAN distribution.

Report information
The Basics
Id: 69362
Status: rejected
Priority: 0/
Queue: DBD-ODBC

People
Owner: Nobody in particular
Requestors: starrychloe [...] oliveyou.net
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 1.31
Fixed in: (no value)



Subject: Does not prepend N to Unicode strings when preparing statements for MSSQL.
Here is some code and the output my $db1 = DBI->connect('dbi:ODBC:SFSDB', 'xxx', 'xxx'); ... print "ODBC has unicode=",$db1->{'odbc_has_unicode'},"\n"; ... map {printf ("$_, %d\n", utf8::is_utf8($_))} @values; print "Values: ", Dumper(@values); $sthUpdate->trace(2); $sthUpdate->execute(@values); ---------- ODBC has unicode=1 7, 1 1411, 1 ... Buster Caf├⌐, 1 ... Values: $VAR1 = '7'; $VAR2 = '1411'; ... $VAR5 = "Buster Caf\x{c3}\x{a9}"; ... Binding parameters: update merchants set merchantStatus_id='7', salesRep_id='1411', SICCodeID='xxxxx', MerchantName='xxxxxx', MerchantDBA='Buster Caf├â┬⌐', TaxID='xxxxx', Website=NULL, MinimumBridgeAccountBalance='xxxxx', ........ where id='35' ------------------------ Resulting value in DB as displayed in MSSQL Studio: Buster Café ------------------------ The column is definitely nvarchar. Windows Server 2008 R2 ActiveState Perl 5.10.1 x64 DBD::ODBC : 1.31 DBI : 1.615 MSSQL Server 2008 Notice ODBC has Unicode. Notice is_utf8() returns 1 for each value. It does not display correctly in the console because of ActiveState. ActiveState Perl won't even display a hard coded UTF character properly. Only Cygwin Perl will display Unicode correctly in console. Notice the hex values are correct for accented e (é). Notice the generated SQL does not prepend a N to the strings. This is a general purpose synchronization script, so it is unknown beforehand if the field is text or number. It relies on prepared statements and parameters. This also fails in Strawberry Perl. Unknown if Cygwin Perl succeeds, as the script also requires MySQL driver which could not be compiled under Cygwin.
On Fri Jul 08 09:09:21 2011, starrychloe wrote: Show quoted text
> Here is some code and the output > > my $db1 = DBI->connect('dbi:ODBC:SFSDB', 'xxx', 'xxx'); > ... > print "ODBC has unicode=",$db1->{'odbc_has_unicode'},"\n"; > ... > map {printf ("$_, %d\n", utf8::is_utf8($_))} @values; > print "Values: ", Dumper(@values); > $sthUpdate->trace(2); > $sthUpdate->execute(@values); > > ---------- > > ODBC has unicode=1 > 7, 1 > 1411, 1 > ... > Buster Caf├⌐, 1 > ... > Values: $VAR1 = '7'; > $VAR2 = '1411'; > ... > $VAR5 = "Buster Caf\x{c3}\x{a9}"; > ... > Binding parameters: update merchants set merchantStatus_id='7', > salesRep_id='1411', SICCodeID='xxxxx', MerchantName='xxxxxx', > MerchantDBA='Buster Caf├â┬⌐', TaxID='xxxxx', Website=NULL, > MinimumBridgeAccountBalance='xxxxx', ........ where id='35' > > ------------------------ > Resulting value in DB as displayed in MSSQL Studio: Buster Café > ------------------------ > > The column is definitely nvarchar. > Windows Server 2008 R2 > ActiveState Perl 5.10.1 x64 > DBD::ODBC : 1.31 > DBI : 1.615 > MSSQL Server 2008 > Notice ODBC has Unicode. > Notice is_utf8() returns 1 for each value. > It does not display correctly in the console because of ActiveState. > ActiveState Perl won't even display a hard coded UTF character
properly. Show quoted text
> Only Cygwin Perl will display Unicode correctly in console. > Notice the hex values are correct for accented e (é). > Notice the generated SQL does not prepend a N to the strings. > This is a general purpose synchronization script, so it is unknown > beforehand if the field is text or number. > It relies on prepared statements and parameters. > This also fails in Strawberry Perl. > Unknown if Cygwin Perl succeeds, as the script also requires MySQL > driver which could not be compiled under Cygwin.
I'm confused by this report and especially with the subject as I don't see why DBD::ODBC should prepend N to anything. In any case you are binding parameters and your SQL itself is not unicode. How anything is displayed in a console is nothing to do with DBD::ODBC either. The output you show of $VAR5 = "Buster Caf\x{c3}\x{a9}"; does not look right to me either as it should be a single character not two. Perhaps you can try: use DBI; use strict; use Data::Dumper; my $h = DBI->connect('dbi:ODBC:xxx','xxx','xxx'); eval { $h->do(q/drop table mje/); }; $h->do(q/create table mje (a nvarchar(20))/); $h->do(q/insert into mje values(?)/, undef, "\x{20ac}"); my $s = $h->prepare(q/select * from mje/); $s->execute; my $f = $s->fetchall_arrayref; print Dumper($f), "\n"; which outputs: $VAR1 = [ [ '€' ] ]; for me with a working unicode ODBC driver. I cannot tell from your example whether your data is correct or not. Perhaps you could write a small example like mine above which demonstrates a problem instead of partial snippets from your code. For instance, put your unicode strings in the Perl code itself so I can see them. Martin -- Martin J. Evans Wetherby, UK
Subject: RE: [rt.cpan.org #69362] Does not prepend N to Unicode strings when preparing statements for MSSQL. (verification)
Date: Fri, 8 Jul 2011 09:15:30 -0500 (CDT)
To: bug-dbd-odbc [...] rt.cpan.org
From: Chloe <starrychloe [...] spamarrest.com>
Chloe here, I'm protecting myself from receiving junk mail. Please click the link below to complete the verification process. You have to do this only once. http://www.spamarrest.com/a2?AQR4AwV5BGcvqJpgMTWxYJ9xLzANpaDhL3Ouov5ipzpj Spam Arrest - Take control of your inbox! ------------------------------------------------------------ You are receiving this message in response to your email to Chloe, a Spam Arrest customer. Spam Arrest requests that senders verify themselves before their email is delivered. When you click the above link, you will be taken to a page with a graphic on it. Simply read the word in the graphic, type it into the form, and you're verified. You have to do this only once per Spam Arrest customer. ------------------------------------------------------------ Below are the complete headers of the message that this email was generated in response to. Return-path: <rtcpan@cpan.rt.develooper.com> Envelope-to: starrychloe@oliveyou.net Delivery-date: Fri, 08 Jul 2011 07:15:14 -0700 Received: from cpan.rt.develooper.com ([207.171.7.181]) by phobos.lunarpages.com with esmtp (Exim 4.69) (envelope-from <rtcpan@cpan.rt.develooper.com>) id 1QfBpq-0000X7-E2 for starrychloe@oliveyou.net; Fri, 08 Jul 2011 07:15:14 -0700 Received: by cpan.rt.develooper.com (Postfix, from userid 536) id 35BF37057; Fri, 8 Jul 2011 07:15:12 -0700 (PDT) Precedence: normal Subject: [rt.cpan.org #69362] Does not prepend N to Unicode strings when preparing statements for MSSQL. From: "Martin J Evans via RT" <bug-DBD-ODBC@rt.cpan.org> Reply-To: bug-DBD-ODBC@rt.cpan.org In-Reply-To: References: <RT-Ticket-69362@rt.cpan.org> Message-ID: <rt-3.8.HEAD-9061-1310134511-16.69362-6-0@rt.cpan.org> X-RT-Loop-Prevention: rt.cpan.org RT-Ticket: rt.cpan.org #69362 Managed-by: RT 3.8.HEAD (http://www.bestpractical.com/rt/) RT-Originator: bohica@ntlworld.com To: starrychloe@oliveyou.net MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" X-RT-Original-Encoding: utf-8 Date: Fri, 8 Jul 2011 10:15:12 -0400 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-1.9 X-Spam-Score: -18 X-Spam-Bar: - X-Ham-Report: Spam detection software, running on the system "phobos.lunarpages.com", has identified this incoming email as possible spam. The original message has been attached to this so you can view it (if it isn't spam) or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: <URL: https://rt.cpan.org/Ticket/Display.html?id=69362 > On Fri Jul 08 09:09:21 2011, starrychloe wrote: > Here is some code and the output > > my $db1 = DBI->connect('dbi:ODBC:SFSDB', 'xxx', 'xxx'); > ... > print "ODBC has unicode=",$db1->{'odbc_has_unicode'},"\n"; > ... > map {printf ("$_, %d\n", utf8::is_utf8($_))} @values; > print "Values: ", Dumper(@values); Show quoted text
> $sthUpdate->trace(2); > $sthUpdate->execute(@values); > > > > ODBC has
unicode=1 > 7, 1 > 1411, 1 > ... > Buster Café, 1 > ... > Values: $VAR1 = '7'; > $VAR2 = '1411'; > ... > $VAR5 = "Buster Caf\x{c3}\x{a9}"; > ... Show quoted text
> Binding parameters: update merchants set merchantStatus_id='7', > salesRep_id='1411',
SICCodeID='xxxxx', MerchantName='xxxxxx', > MerchantDBA='Buster Caf├â┬⌐', TaxID='xxxxx', Website=NULL, > MinimumBridgeAccountBalance='xxxxx', ........ where id='35' > > > Resulting value in DB as displayed in MSSQL Studio: Buster Café > > > The column is definitely nvarchar. > Windows Server 2008 R2 Show quoted text
> ActiveState Perl 5.10.1 x64 > DBD::ODBC : 1.31 > DBI : 1.615 > MSSQL Server
2008 > Notice ODBC has Unicode. > Notice is_utf8() returns 1 for each value. Show quoted text
> It does not display correctly in the console because of ActiveState. >
ActiveState Perl won't even display a hard coded UTF character properly. > Only Cygwin Perl will display Unicode correctly in console. > Notice the hex values are correct for accented e (é). > Notice the generated SQL does not prepend a N to the strings. > This is a general purpose synchronization script, so it is unknown > beforehand if the field is text or number. > It relies on prepared statements and parameters. > This also fails in Strawberry Perl. > Unknown if Cygwin Perl succeeds, as the script also requires MySQL Show quoted text
> driver which could not be compiled under Cygwin. [...]
Content analysis details: (-1.9 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 T_RP_MATCHES_RCVD Envelope sender domain matches handover relay domain -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-Spam-Flag: NO Received-SPF: none(rt.cpan.org: rt.cpan.org does not designate permitted sender hosts) ------------------------------------------------------------ Webmasters help stop spam and make 50%. http://www.spamarrest.com/affl?4186299/affiliates/index.jsp ------------------------------------------------------------

Message body is not shown because it is too large.

From: starrychloe [...] oliveyou.net
Unicode é is C3 A9 in bytes http://www.utf8-chartable.de/ MSSQL doesn't support Unicode connections, so must specify N'string' to signify Unicode: http://support.microsoft.com/kb/239530 http://stackoverflow.com/questions/1379416/insert-utf8-data-into-a-ms-sql- server-2008 Will try your example...
On Fri Jul 08 11:09:09 2011, starrychloe wrote: Show quoted text
> Unicode é is C3 A9 in bytes > http://www.utf8-chartable.de/ > > MSSQL doesn't support Unicode connections, so must specify N'string'
to Show quoted text
sql- Show quoted text
> server-2008 > > Will try your example...
Firstly there was a small error in the output I gave above. It should have been: $VAR1 = [ [ "\x{20ac}" ] ]; Secondly, don't believe everything you read on the net as fact and don't take answers for C# as being the right answers for Perl and ODBC. The ODBC specification supports unicode encoded in UCS2 as does MS SQL Server and the MS SQL Server driver. The code I posted works and you don't need any N prefix. I will drag out some more examples for you but it is going to make no difference. If you bind your parameters as in "insert into mytable values(?)" you only need to ensure they are bound as SQL_WCHARs and DBD::ODBC will do that for you. Avoid unicode in SQL text but that does mostly work too. Martin -- Martin J. Evans Wetherby, UK
On Fri Jul 08 12:21:34 2011, MJEVANS wrote: Show quoted text
> On Fri Jul 08 11:09:09 2011, starrychloe wrote:
> > Unicode é is C3 A9 in bytes > > http://www.utf8-chartable.de/ > > > > MSSQL doesn't support Unicode connections, so must specify N'string'
> to
ms- Show quoted text
> sql-
> > server-2008 > > > > Will try your example...
> > Firstly there was a small error in the output I gave above. It should > have been: > > $VAR1 = [ > [ > "\x{20ac}" > ] > ]; > > Secondly, don't believe everything you read on the net as fact and
don't Show quoted text
> take answers for C# as being the right answers for Perl and ODBC. The > ODBC specification supports unicode encoded in UCS2 as does MS SQL > Server and the MS SQL Server driver. The code I posted works and you > don't need any N prefix. > > I will drag out some more examples for you but it is going to make no > difference. If you bind your parameters as in "insert into mytable > values(?)" you only need to ensure they are bound as SQL_WCHARs and > DBD::ODBC will do that for you. Avoid unicode in SQL text but that
does Show quoted text
> mostly work too. > > Martin
unicode in sql example: use DBI; use strict; use warnings; use Data::Dumper; my $h = DBI->connect('dbi:ODBC:xxx','xx','xx'); eval {$h->do(q/drop table martin/);}; print "Has unicode: " . $h->{odbc_has_unicode} . "\n"; $h->do(q/create table martin (a nvarchar(100))/); my $s = $h->prepare(q/insert into martin values(?)/); $s->execute("\x{20ac}"); my $r = $h->selectall_arrayref(q/select * from martin/); print Dumper($r); my $sql = 'insert into martin values(' . $h->quote("\x{20ac}") . ')'; $h->do($sql); $r = $h->selectall_arrayref(q/select * from martin/); and output: Has unicode: 1 $VAR1 = [ [ "\x{20ac}" ] ]; $VAR1 = [ [ "\x{20ac}" ], [ "\x{20ac}" ] ]; Notice no N - it is not required. Martin -- Martin J. Evans Wetherby, UK
From: starrychloe [...] oliveyou.net
Yes, your example did work, and it did NOT work for the MySQL driver, despite 'set names utf8' being called. I believe the error is with the MySQL driver and will enter a bug there instead. I was confused because I displayed the hex values of the string inside the MSSQL Studio itself using select master.dbo.fn_varbintohexstr(cast(a as varbinary)), a from mje; and the é is stored as E9 in MSSQL yet stored as C3 A9 in a text file and in MySQL. That was odd but I suppose it is ok. I believe it is because MSSQL is using UTF16 instead of UTF8. When updating or inserting C3 A9 into MSSQL, I expected it to recognize the é and display é inside MSSQL Studio. Now I realize Perl should not have received the bytes C3 A9 in the first place, and the hex values inside Perl should have been E9 as well, with the Unicode bit set on the string.