
This queue is for tickets about the Wx CPAN distribution.

Report information
The Basics
Id: 85943
Status: open
Priority: 0/
Queue: Wx

People
Owner: Nobody in particular
Requestors: jiri [...] pavlovsky.eu
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: utf8 handling bug
Date: Fri, 07 Jun 2013 11:09:26 +0200
To: bug-Wx [...] rt.cpan.org
From: Jiří Pavlovský <jiri [...] pavlovsky.eu>
Hi,

I had the following problem:

I pass an array of objects with stringification overload. The combobox displays the stringified values and returns the selected object. This works great unless the stringified value contains accented characters; then the displayed value is messed up.

I thought I had traced the problem to a bug in stringification/utf8 and reported it to perl-bug.

But I got a reply suggesting it's a bug in Wx::Perl. See the details below:

I think this is a bug in Wx::Perl.

I just downloaded Wx-0.9922 from CPAN and did a quick scan. cpp/helpers.cpp contains this, which I assume is a utility function used by various parts of Wx::Perl:

    #if wxUSE_UNICODE
    static wxChar* wxPli_copy_string( SV* scalar, wxChar** )
    {
        dTHX;
        STRLEN length;
        wxWCharBuffer tmp = ( SvUTF8( scalar ) ) ?
            wxConvUTF8.cMB2WX( SvPVutf8( scalar, length ) ) :
            wxWCharBuffer( wxString( SvPV( scalar, length ),
                                     wxConvLocal ).wc_str() );

        wxChar* buffer = new wxChar[length + 1];
        memcpy( buffer, tmp.data(), length * sizeof(wxChar) );
        buffer[length] = wxT('\0');
        return buffer;
    }
    #endif

Checking SvUTF8(scalar) before any stringification is incorrect. What it should be doing is something like this:

    dTHX;
    STRLEN length;
    char * const s = SvPV( scalar, length );
    wxWCharBuffer tmp = ( SvUTF8( scalar ) ) ?
        wxConvUTF8.cMB2WX( s ) :
        wxWCharBuffer( wxString( s, wxConvLocal ).wc_str() );

I don’t know what the wxConvLocal does, but if it does anything other than treat the string as Latin1, then that is also incorrect, and this would be better:

    dTHX;
    STRLEN length;
    wxWCharBuffer tmp =
        wxConvUTF8.cMB2WX( SvPVutf8( scalar, length ) );

This aspect of SvUTF8 is nothing new, as it has been documented since 2006 (commit cd028baaa4):

    SvUTF8  Returns a U32 value indicating the UTF-8 status of an SV. If
            things are set-up properly, this indicates whether or not the
            SV contains UTF-8 encoded data. You should use this after a
            call to SvPV() or one of its variants, in case any call to
            string overloading updates the internal flag.

(The current wording is of recent provenance and comes from commit fd1423831.)

I don’t know enough about Wx to write a test case, so could you report this to bug-Wx@rt.cpan.org?

--
Jiří Pavlovský
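
The ordering problem described in the report can be illustrated without Perl or wxWidgets. The following is a minimal sketch only: `MockScalar`, `stringify`, `choose_decoder_flag_first` and `choose_decoder_stringify_first` are hypothetical names invented for this illustration and are not part of the Perl API or of Wx. The mock assumes, as the quoted SvUTF8 documentation says, that string overloading may update the UTF-8 flag when the value is stringified.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a Perl scalar whose class overloads stringification.
// This is NOT the Perl API; it mimics only the behaviour under discussion:
// the UTF-8 flag is not meaningful until the value has been stringified.
struct MockScalar {
    bool utf8_flag = false;          // plays the role of SvUTF8()
    std::string bytes;               // string buffer, empty until stringified
    std::string overload_result;     // what the overloaded "" operator would return
    bool overload_is_utf8 = false;   // whether that result is UTF-8 encoded

    // Plays the role of SvPV(): runs the overload and, as a side effect,
    // updates the flag to describe the bytes just produced.
    const std::string& stringify() {
        bytes = overload_result;
        utf8_flag = overload_is_utf8;
        return bytes;
    }
};

enum class Decoder { Utf8, Local };

// Order used in the reported function: the flag is read first, so it still
// holds its stale pre-stringification value.
Decoder choose_decoder_flag_first(MockScalar& sv) {
    const bool is_utf8 = sv.utf8_flag;
    sv.stringify();
    return is_utf8 ? Decoder::Utf8 : Decoder::Local;
}

// Order suggested in the report: stringify first, then consult the flag.
Decoder choose_decoder_stringify_first(MockScalar& sv) {
    sv.stringify();
    return sv.utf8_flag ? Decoder::Utf8 : Decoder::Local;
}

// An object that stringifies to "Jiří" encoded as UTF-8.
MockScalar make_accented_object() {
    MockScalar sv;
    sv.overload_result = "Ji\xC5\x99\xC3\xAD";
    sv.overload_is_utf8 = true;
    return sv;
}

// Self-check: the two orderings disagree for the same input.
void check_orderings() {
    MockScalar a = make_accented_object();
    MockScalar b = make_accented_object();
    assert(choose_decoder_flag_first(a) == Decoder::Local);      // stale flag, wrong decoder
    assert(choose_decoder_stringify_first(b) == Decoder::Utf8);  // fresh flag, right decoder
}
```

In this mock, reading the flag first selects the local-encoding decoder for UTF-8 bytes, which is the kind of mismatch that would garble accented characters. Whether the real combobox code path takes such a route is not established by the sketch.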
Subject: Re: [rt.cpan.org #85943] utf8 handling bug
Date: Fri, 07 Jun 2013 14:44:26 +0100
To: bug-Wx [...] rt.cpan.org
From: mdootson <mdootson [...] cpan.org>
Hi,

Thanks for the report.

wxPli_copy_string is used in one place only, and that is parsing command line arguments during wxWidgets initialisation. So, unless you are passing in your values on the command line, this particular helper function cannot be your problem.

Recently I did look at some of the wxPerl utf8 handling and, after some help on the wxPerl mailing list, implemented a change for Wx 0.9922. One thing that was made clear during the process is that the Perl docs concerning utf8 handling are confused and in some places simply wrong. That being the case, I think I need clear test cases to demonstrate any utf8-related bug that I can then test / fix.

For information, the code that converts your text for use by wxWidgets is the macro WXSTRING_INPUT, which is in cpp/helpers.h at line 78 or 109 in the Wx 0.9922 source. Note that this changed for version Wx 0.9922. In previous versions the code looked more like your example from wxPli_copy_string.

If I understand your report, are you saying that if an array of values passed to a Wx::ComboBox constructor contains members with valid UTF-8 and multi-byte characters, these characters are displayed incorrectly?

If yes, I can construct my own test case for this. If not, I'll probably need some code from you that demonstrates the problem.

Regards

Mark