Subject: utf8 handling bug
Date: Fri, 07 Jun 2013 11:09:26 +0200
To: bug-Wx [...] rt.cpan.org
From: Jiří Pavlovský <jiri [...] pavlovsky.eu>
Hi,
I ran into the following problem:
I pass an array of objects with a stringification overload to a combobox.
The combobox displays the stringified values and returns the selected
object. This works great unless the stringified value contains accented
characters; then the displayed value is garbled.
I thought I had traced the problem to a bug in stringification/utf8
handling and reported it to perl-bug.
But I got a reply suggesting it's a bug in Wx::Perl. See the details below:
I think this is a bug in Wx::Perl.
I just downloaded Wx-0.9922 from CPAN and did a quick scan.
cpp/helpers.cpp contains this, which I assume is a utility function used
by various parts of Wx::Perl:
    #if wxUSE_UNICODE
    static wxChar* wxPli_copy_string( SV* scalar, wxChar** )
    {
        dTHX;
        STRLEN length;
        wxWCharBuffer tmp = ( SvUTF8( scalar ) ) ?
            wxConvUTF8.cMB2WX( SvPVutf8( scalar, length ) ) :
            wxWCharBuffer( wxString( SvPV( scalar, length ),
                                     wxConvLocal ).wc_str() );
        wxChar* buffer = new wxChar[length + 1];
        memcpy( buffer, tmp.data(), length * sizeof(wxChar) );
        buffer[length] = wxT('\0');
        return buffer;
    }
    #endif
Checking SvUTF8(scalar) before any stringification is incorrect. What
it should be doing is something like this:
    dTHX;
    STRLEN length;
    char * const s = SvPV( scalar, length );
    wxWCharBuffer tmp = ( SvUTF8( scalar ) ) ?
        wxConvUTF8.cMB2WX( s ) :
        wxWCharBuffer( wxString( s, wxConvLocal ).wc_str() );
I don’t know what wxConvLocal does, but if it does anything other
than treat the string as Latin1, then that is also incorrect, and this
would be better:
    dTHX;
    STRLEN length;
    wxWCharBuffer tmp =
        wxConvUTF8.cMB2WX( SvPVutf8( scalar, length ) );
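For readers unfamiliar with what "treat the string as Latin1" means here: when a scalar is not flagged as UTF-8, requesting its UTF-8 form amounts to reading each byte as a Latin-1 code point and encoding that code point as UTF-8. The sketch below illustrates that conversion in standalone C++. It is an illustration only, not Perl's or Wx's implementation, and the function name latin1_to_utf8 is made up for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Perl or Wx): re-encode a byte string,
// interpreted as Latin-1, into UTF-8.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string out;
    out.reserve(latin1.size() * 2);  // each byte becomes at most two bytes
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            // ASCII range is identical in Latin-1 and UTF-8.
            out.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF need a two-byte UTF-8 sequence.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

For example, the Latin-1 byte 0xED ("í") becomes the two UTF-8 bytes 0xC3 0xAD, while plain ASCII passes through unchanged. A converter that decodes the same bytes under a different locale encoding would yield a different character, which is the concern raised above about wxConvLocal.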
This aspect of SvUTF8 is nothing new; it has been documented since 2006
(commit cd028baaa4):
    SvUTF8  Returns a U32 value indicating the UTF-8 status of an SV. If
            things are set-up properly, this indicates whether or not the
            SV contains UTF-8 encoded data. You should use this after a
            call to SvPV() or one of its variants, in case any call to
            string overloading updates the internal flag.
(The current wording is of recent provenance and comes from commit
fd1423831.)
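The ordering problem can be shown with a toy model in standalone C++. This is only a sketch: ToyScalar, toy_stringify and the two choose_* functions are hypothetical stand-ins rather than the Perl or Wx API. The sketch assumes, as the documentation quoted above describes, that running a string overload may update the UTF-8 flag.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy stand-in for a Perl scalar with a stringification overload.
// This is NOT the Perl API; every name here is hypothetical.
struct ToyScalar {
    std::string pending;      // what the overload returns when invoked
    bool pending_is_utf8;     // whether that result is UTF-8 encoded
    std::string bytes;        // string buffer, empty until stringified
    bool utf8_flag = false;   // stale until toy_stringify() has run

    ToyScalar(std::string p, bool is_utf8)
        : pending(std::move(p)), pending_is_utf8(is_utf8) {}
};

// Plays the role of SvPV(): runs the overload and updates the flag.
const std::string& toy_stringify(ToyScalar& sv) {
    sv.bytes = sv.pending;
    sv.utf8_flag = sv.pending_is_utf8;
    return sv.bytes;
}

enum class Decoder { Utf8, Local };

// Order used by the reported code: read the flag, then stringify.
Decoder choose_flag_first(ToyScalar& sv) {
    const bool is_utf8 = sv.utf8_flag;  // read too early, still stale
    toy_stringify(sv);
    return is_utf8 ? Decoder::Utf8 : Decoder::Local;
}

// Suggested order: stringify, then read the flag.
Decoder choose_stringify_first(ToyScalar& sv) {
    toy_stringify(sv);
    return sv.utf8_flag ? Decoder::Utf8 : Decoder::Local;
}
```

In this model, an object whose overload returns UTF-8 text gets Decoder::Local from choose_flag_first and Decoder::Utf8 from choose_stringify_first. Decoding UTF-8 bytes with the local converter would be consistent with the garbled accented characters in the original report.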
I don’t know enough about Wx to write a test case, so could you report
this to bug-Wx@rt.cpan.org?
--
Jiří Pavlovský