Subject: Accented characters broken (U+007F -- U+00FF) on non-latin1 systems
Object properties are returned as Latin1 byte strings rather than Unicode
character strings, even when CP is set to CP_UTF8. This happens only when
the value consists entirely of characters with codes <= 0x00FF.
To reproduce, try $Excel->ActiveCell->{Value} =
$Excel->ActiveCell->{Value} with a cell value containing some Latin1
characters ("Übergang", for example) on a system where CP_ACP is not Latin1.
This strange behaviour is caused by the call to sv_utf8_downgrade(sv,
TRUE); in the function sv_setbstr(...) (file OLE.xs). It turns the perfectly
correct UTF-8 string returned by WideCharToMultiByte into a string of
Latin1 bytes whenever no character code is above 0xFF, and turns the UTF-8
flag off. By the way, there is no corresponding sv_utf8_upgrade in the
property put path.
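
The effect of the downgrade can be seen in plain Perl, without Win32::OLE
or Excel. This is only a sketch: it assumes that utf8::downgrade($value, 1)
behaves like the sv_utf8_downgrade(sv, TRUE) call in OLE.xs, and it uses
utf8::upgrade to stand in for the state of the string right after the
CP_UTF8 conversion. The variable names are made up for the illustration.

```perl
use strict;
use warnings;
use bytes ();    # for bytes::length only; no lexical "use bytes" semantics

# "Übergang", written with an escape so the script itself stays ASCII.
my $value = "\x{DC}bergang";

# Stand-in for the string as produced by the CP_UTF8 conversion:
# internal UTF-8 encoding, UTF-8 flag on.
utf8::upgrade($value);
my $flag_before  = utf8::is_utf8($value) ? 1 : 0;
my $bytes_before = bytes::length($value);

# Perl-level counterpart of sv_utf8_downgrade(sv, TRUE) (assumption, see above).
utf8::downgrade($value, 1);
my $flag_after  = utf8::is_utf8($value) ? 1 : 0;
my $bytes_after = bytes::length($value);

# To Perl the string is still the same sequence of characters ...
my $same_chars = ($value eq "\x{DC}bergang") ? 1 : 0;

# ... but its raw buffer is no longer well-formed UTF-8.
my $raw         = $value;
my $raw_is_utf8 = utf8::decode($raw) ? 1 : 0;

printf "before: flag=%d bytes=%d\n", $flag_before, $bytes_before;
printf "after:  flag=%d bytes=%d\n", $flag_after,  $bytes_after;
printf "same characters: %d, raw buffer valid UTF-8: %d\n",
    $same_chars, $raw_is_utf8;
```

The byte count drops from 9 to 8 because "Ü" shrinks from two UTF-8
bytes to the single Latin1 byte 0xDC. Code that later treats that buffer as
UTF-8, as the put path presumably does when CP is CP_UTF8, sees a malformed
sequence.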
I'd suggest removing that sv_utf8_downgrade(sv, TRUE) call (I wonder who
would need such strange behaviour?) or at least making it configurable (via
some Option, for example).
The workaround is to call utf8::upgrade() on every value returned by a
Win32::OLE property get.
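
A minimal sketch of that workaround follows. The helper name ole_upgrade is
hypothetical and not part of Win32::OLE. It upgrades only plain scalars that
actually contain characters above 0x7F, on the assumption that pure ASCII
strings, numbers and references are not affected. The runnable part
simulates the downgraded value; the Win32::OLE call is shown as a comment
because it needs Windows and a running Excel.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of $value with Perl's internal
# UTF-8 encoding (flag on) if it is a plain string containing any
# character above 0x7F; everything else is returned unchanged.
sub ole_upgrade {
    my ($value) = @_;
    if (defined $value && !ref $value && $value =~ /[^\x00-\x7F]/) {
        utf8::upgrade($value);
    }
    return $value;
}

# Simulate what the property get hands back: Latin1 bytes, flag off.
my $got = "\x{DC}bergang";    # "Übergang"
utf8::downgrade($got, 1);

my $fixed = ole_upgrade($got);

print "flag before: ", (utf8::is_utf8($got)   ? 1 : 0), "\n";
print "flag after:  ", (utf8::is_utf8($fixed) ? 1 : 0), "\n";

# Intended use with Win32::OLE (not executed here):
#   $Excel->ActiveCell->{Value} = ole_upgrade($Excel->ActiveCell->{Value});
```

Property gets that return array references (a multi-cell Range value, for
instance) would need the helper applied to each element; the sketch leaves
such values untouched.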
Perl 5.8.8
$Win32::OLE::VERSION = '01707'
Win 2000SP4 rus
MS Excel 2000