Subject: UTF8 Strings Not Marked as UTF8 If Base64 encoded
Date: Tue, 05 Feb 2008 12:20:02 -0800
To: bug-SOAP-Lite [...] rt.cpan.org
From: Greg Wittel <gwittel [...] proofpoint.com>
Tried on SOAP::Lite 0.70_4.
If a UTF-8 string is base64-encoded (see RT bug #30271;
http://rt.cpan.org/Public/Bug/Display.html?id=30271), the deserialized data
does not have its UTF-8 flag (is_utf8) set. As a result, the client gets
octets back rather than the character string it expects.
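To make the symptom concrete, here is a minimal Perl sketch of the round trip outside SOAP::Lite, using only the core Encode and MIME::Base64 modules. It assumes the serializer turns the string into UTF-8 octets before base64-encoding it; that is an assumption about SOAP::Lite's internals, not something verified here.

```perl
use strict;
use warnings;
use Encode qw(encode decode is_utf8);
use MIME::Base64 qw(encode_base64 decode_base64);

# A character string (UTF-8 flag on) containing a non-ASCII character.
my $original = decode('UTF-8', "caf\xc3\xa9");

# Base64 works on octets, so the string is encoded to UTF-8 bytes first
# (assumed to mirror what happens on the sending side).
my $wire = encode_base64(encode('UTF-8', $original), '');

# Decoding the base64 payload yields octets only; nothing in the payload
# records that these bytes were UTF-8 text.
my $received = decode_base64($wire);

printf "original: length=%d is_utf8=%d\n", length($original), is_utf8($original) ? 1 : 0;
printf "received: length=%d is_utf8=%d\n", length($received), is_utf8($received) ? 1 : 0;
# prints:
# original: length=4 is_utf8=1
# received: length=5 is_utf8=0
```

The received value holds the right bytes, but it is 5 octets with the flag off instead of the 4-character string that was sent; an explicit Encode::decode('UTF-8', ...) on the client restores it.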
Based on bug #30271, there are two ways to fix this:
1) Fix data type detection so that UTF-8 character data is not classified
as binary and sent to base64 encoding.
In SOAP::Serializer, change:
    _typelookup => {
        'base64binary' => [10, sub { $_[0] =~ ... }, ... ]

To the following (adding the appropriate 'use Encode;' statement):

    _typelookup => {
        'base64binary' => [10, sub { ( !Encode::is_utf8($_[0]) ) && $_[0] =~ .... }, ... ]
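The effect of the proposed guard can be checked in isolation. In this sketch the regex is a hypothetical stand-in for the one elided above (any character outside tab, LF, CR and printable ASCII counts as binary), and the predicate names looks_binary_old and looks_binary_new are made up for illustration; neither is SOAP::Lite's actual code.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for the elided _typelookup regex: anything outside
# tab, LF, CR and printable ASCII is treated as "binary".
my $BINARY_RE = qr/[^\x09\x0a\x0d\x20-\x7e]/;

# Current behaviour: any non-ASCII data is routed to base64.
sub looks_binary_old {
    return $_[0] =~ $BINARY_RE ? 1 : 0;
}

# Proposed behaviour: strings flagged as UTF-8 text are never base64'd.
sub looks_binary_new {
    my $is_binary = !Encode::is_utf8($_[0]) && $_[0] =~ $BINARY_RE;
    return $is_binary ? 1 : 0;
}

my $text   = decode('UTF-8', "caf\xc3\xa9");  # character string, flag on
my $octets = "\x00\xff\x10";                   # genuine binary data, flag off

printf "text:   old=%d new=%d\n", looks_binary_old($text),   looks_binary_new($text);
printf "octets: old=%d new=%d\n", looks_binary_old($octets), looks_binary_new($octets);
# prints:
# text:   old=1 new=0
# octets: old=1 new=1
```

With the guard, a decoded (flagged) string is no longer routed to base64, while raw octets still are. Non-ASCII text that was never decoded, and so lacks the flag, would still be treated as binary.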
This assumes that the transport charset is UTF-8; I'm not sure what happens
if it is not.
2) Create a data type 'utf8base64' and encode/decode it properly.
The expected behavior should be equivalent to:
    Serialize:   encode_base64( Encode::encode(...) )
    Deserialize: Encode::decode( decode_base64() ... )
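Option 2 could look roughly like the following sketch. The type name 'utf8base64' comes from the proposal above, but encode_utf8base64 and decode_utf8base64 are hypothetical helper names, and the sketch does not hook into SOAP::Serializer or SOAP::Deserializer; it only shows the encode/decode pairing.

```perl
use strict;
use warnings;
use Encode qw(encode decode is_utf8);
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical serializer for a 'utf8base64' value:
# characters -> UTF-8 octets -> base64 text (no line breaks).
sub encode_utf8base64 {
    my ($string) = @_;
    return encode_base64(encode('UTF-8', $string), '');
}

# Hypothetical deserializer: base64 text -> octets -> characters.
# FB_CROAK makes malformed UTF-8 an error instead of silently substituting
# replacement characters; LEAVE_SRC leaves $octets unmodified.
sub decode_utf8base64 {
    my ($b64) = @_;
    my $octets = decode_base64($b64);
    return decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

my $in   = decode('UTF-8', "na\xc3\xafve \xe2\x98\xba");
my $wire = encode_utf8base64($in);
my $out  = decode_utf8base64($wire);
printf "equal=%d is_utf8=%d\n", ($out eq $in) ? 1 : 0, is_utf8($out) ? 1 : 0;
# prints: equal=1 is_utf8=1
```

Because the charset conversion happens inside the payload, the decoded value comes back as a flagged character string regardless of the transport charset.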
This method would be less sensitive to the transport charset, but I suspect
it would cause interoperability problems.
-Greg