
This queue is for tickets about the SOAP-Lite CPAN distribution.

Report information
The Basics
Id: 30271
Status: open
Priority: 0/
Queue: SOAP-Lite

People
Owner: Nobody in particular
Requestors: cmanley [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in:
  • 0.69
  • 0.70_01
Fixed in: (no value)



Subject: Don't give strings with utf8 flag set to MIME::Base64::encode_base64().
In these 2 methods, there is no check to see whether the given string is utf8 flagged or not:

    SOAP::XMLSchema1999::Serializer::as_base64
    SOAP::XMLSchema2001::Serializer::as_base64Binary

I had a few cases where a utf8 flagged string containing a euro symbol was passed, causing MIME::Base64::encode_base64 to die (rightfully, because it is meant to encode octets only).

The solution is to add this code just before MIME::Base64::encode_base64() is called, in order to turn the utf8 flag off:

    require Encode;
    if (Encode::is_utf8($value)) {
        if (Encode->can('_utf8_off')) { # the quick way, but it may change in future Perl versions.
            Encode::_utf8_off($value);
        }
        else {
            $value = pack('C*',unpack('C*',$value)); # the slow but safe way, but this fallback works always.
        }
    }

The (dirty) workaround for those of you who can't wait for this to be fixed is to place this code in your SOAP server just below the 'use' clauses:

    # First of all inject some patches into broken SOAP::Lite modules.
    # This is a dirty symbol table hack. It works for now, but remove this when SOAP::Lite has been fixed.
    if ($SOAP::Lite::VERSION <= 0.70) {
        if (UNIVERSAL::can('SOAP::XMLSchema2001::Serializer', 'as_base64Binary')) {
            my $origsub = \&SOAP::XMLSchema2001::Serializer::as_base64Binary;
            *SOAP::XMLSchema2001::Serializer::as_base64Binary = sub {
                my $self = shift;
                my($value, $name, $type, $attr) = @_;
                # Base64 encoding only makes sense for octets, so to prevent
                # MIME::Base64::encode_base64() from rightfully croaking when given a utf8
                # flagged string to encode, remove the utf8 flag of the string so that it
                # is treated as a string of bytes (even though it's not).
                require Encode;
                if (Encode::is_utf8($value)) {
                    if (Encode->can('_utf8_off')) { # the quick way, but it may change in future Perl versions.
                        Encode::_utf8_off($value);
                    }
                    else {
                        $value = pack('C*',unpack('C*',$value)); # the slow but safe way, but this fallback works always.
                    }
                }
                return $origsub->($self, $value, $name, $type, $attr);
            };
        }
        if (UNIVERSAL::can('SOAP::XMLSchema1999::Serializer', 'as_base64')) {
            my $origsub = \&SOAP::XMLSchema1999::Serializer::as_base64;
            *SOAP::XMLSchema1999::Serializer::as_base64 = sub {
                my $self = shift;
                my($value, $name, $type, $attr) = @_;
                # Same reasoning as above: strip the utf8 flag so that the string
                # is treated as a string of bytes before it is base64 encoded.
                require Encode;
                if (Encode::is_utf8($value)) {
                    if (Encode->can('_utf8_off')) { # the quick way, but it may change in future Perl versions.
                        Encode::_utf8_off($value);
                    }
                    else {
                        $value = pack('C*',unpack('C*',$value)); # the slow but safe way, but this fallback works always.
                    }
                }
                return $origsub->($self, $value, $name, $type, $attr);
            };
        }
    }
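For anyone who wants to see the failure in isolation, here is a minimal sketch that uses only core modules and no SOAP::Lite at all. It assumes the euro sign is given as the character "\x{20AC}", and it uses utf8::encode() to obtain octets, which is a stand-in for the flag-stripping code above and not what the ticket proposes verbatim. The variable names are made up for illustration.

```perl
use strict;
use warnings;
use MIME::Base64 ();

# "\x{20AC}" is the euro sign as a character string. Perl stores it with
# the utf8 flag set because the code point does not fit into one byte.
my $euro = "\x{20AC}";

# encode_base64() only accepts octets, so the wide character makes it die.
my $ok    = eval { MIME::Base64::encode_base64($euro); 1 };
my $error = $ok ? '' : $@;

# Turning the characters into UTF-8 octets first gives encode_base64()
# a plain byte string to work on (assumption: UTF-8 is the wanted encoding).
my $octets = $euro;
utf8::encode($octets);    # core function, converts in place
my $encoded = MIME::Base64::encode_base64($octets, '');

print "croaked: $error" unless $ok;
print "encoded: $encoded\n";
```

The second argument to encode_base64() is the line ending; passing an empty string just suppresses the trailing newline so the result is easier to compare.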
Applied the following in CVS:

    # Fixes #30271 for 5.8 and above.
    # Won't fix for 5.6 and below - perl can't handle unicode before
    # 5.8, and applying pack() to everything is just a slowdown.
    if (eval "require Encode; 1") {
        if (Encode::is_utf8($value)) {
            if (Encode->can('_utf8_off')) { # the quick way, but it may change in future Perl versions.
                Encode::_utf8_off($value);
            }
            else {
                $value = pack('C*',unpack('C*',$value)); # the slow but safe way,
                                                         # but this fallback works always.
            }
        }
    }

This should fix the issue for perl5.8 and above, while leaving 5.6 and below alone. There's no use in porting unicode support to 5.6, as perl itself does not support it in <5.8.

Thanks,

Martin
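To make the effect of that change concrete, here is a small sketch of what clearing the flag does to the value that reaches MIME::Base64. The helper name base64_like_patch is hypothetical and exists only for this illustration; it is not part of SOAP::Lite, and it only covers the `_utf8_off` branch.

```perl
use strict;
use warnings;
use Encode ();
use MIME::Base64 ();

# Hypothetical helper mirroring the patch: clear the utf8 flag so that the
# internal UTF-8 bytes are handed to encode_base64() as plain octets.
# It works on a copy, so the caller's string keeps its flag.
sub base64_like_patch {
    my ($value) = @_;
    if (Encode::is_utf8($value)) {
        Encode::_utf8_off($value);    # same buffer, flag cleared
    }
    return MIME::Base64::encode_base64($value, '');
}

my $flagged = "\x{20AC}";    # euro sign, utf8 flag set
my $plain   = "abc";         # pure ASCII, no flag

print base64_like_patch($flagged), "\n";
print base64_like_patch($plain),   "\n";
```

The flagged string no longer makes encode_base64() die, and what gets encoded is its UTF-8 byte sequence.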
From: victor [...] vsespb.ru
I think such code introduces a small inconsistency:

Latin-1 (downgraded) strings are auto-converted by Perl to Unicode when output to the external world.

So two strings that are equal can be encoded in different ways by this method (example below).

The correct solution would be to ask the user to specify the encoding, or to specify a "BINARY" encoding, when calling this method.

    use utf8;
    use Encode;
    use Devel::Peek;

    my $s1 = "\xB5";
    my $s2 = decode("UTF-8", "\xC2\xB5");

    print "MATCH!\n" if $s1 eq $s2;

    Dump $s1;
    Dump $s2;
    __END__

    MATCH!
    SV = PV(0x16d1b78) at 0x16fcb28
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK)
      PV = 0x16efd90 "\265"\0
      CUR = 1
      LEN = 8
    SV = PV(0x16d2038) at 0x16fcbe8
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK,UTF8)
      PV = 0x17bdb40 "\302\265"\0 [UTF8 "\x{b5}"]
      CUR = 2
      LEN = 8

(In reply to MKUTTER's message of Mon Oct 29 01:42:06 2007.)
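The inconsistency described above can be reproduced without SOAP::Lite. The sketch below compares a flag-stripping encoder (modelled on the applied patch) with one that takes an explicit encoding, in the spirit of the "ask the user to specify the encoding" suggestion. Both helper names are hypothetical, and choosing UTF-8 as the explicit encoding is an assumption made for the example.

```perl
use strict;
use warnings;
use Encode ();
use MIME::Base64 ();

my $downgraded = "\xB5";      # micro sign, stored as a single byte
my $upgraded   = $downgraded;
utf8::upgrade($upgraded);     # same character, stored as UTF-8 with the flag set

# Flag stripping as in the applied patch (sketch, works on a copy).
sub strip_flag_base64 {
    my ($value) = @_;
    Encode::_utf8_off($value) if Encode::is_utf8($value);
    return MIME::Base64::encode_base64($value, '');
}

# Explicit encoding: the caller says which octets the characters become.
sub explicit_base64 {
    my ($value, $encoding) = @_;
    return MIME::Base64::encode_base64(Encode::encode($encoding, $value), '');
}

printf "equal strings:  %s\n", $downgraded eq $upgraded ? 'yes' : 'no';
printf "flag stripped:  %s vs %s\n",
    strip_flag_base64($downgraded), strip_flag_base64($upgraded);
printf "explicit UTF-8: %s vs %s\n",
    explicit_base64($downgraded, 'UTF-8'), explicit_base64($upgraded, 'UTF-8');
```

With flag stripping, the result depends on how Perl happens to store the string internally. With an explicit encoding, equal strings give equal output.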
Hello Victor,

the issue you responded to has been closed for almost 6 years now.

Do you have a real-world problem with SOAP::Lite's behavior (in the latest version)?

If so, please attach some sample code demonstrating the error when using SOAP::Lite.

I'll close the ticket again - it'll be reopened automatically if you reply to this mail.

Best regards,

Martin

(In reply to vsespb's message of Wed 14 Aug 2013, 19:29:42.)
From: victor [...] vsespb.ru
    use SOAP::Lite 'trace', 'debug';
    use Encode;
    use strict;
    use warnings;
    use Digest::SHA qw/sha1_hex/;
    use utf8;

    my $soap = SOAP::Lite->new( proxy => 'https://176.58.110.134/soap-wsdl-test/helloworld.pl');
    $soap->default_ns('urn:HelloWorld');
    my $s = decode("UTF-8", "\xC2\xB5");
    my $digest = sha1_hex($s); # TRY COMMENT THIS OUT
    my $som = $soap->call('sayHello', 'Kutter', $s);
    die $som->faultstring if ($som->fault);
    print $som->result, "\n";
    __END__

It sends different data to the remote server, either

    <c-gensym5 xsi:type="xsd:base64Binary">tQ==</c-gensym5>

or

    <c-gensym5 xsi:type="xsd:base64Binary">wrU=</c-gensym5>

depending on whether the sha1_hex() call is commented out or not.

So the sha1_hex() call has a side effect on what is actually sent to the remote server, while it should not.

Note 1: Digest::SHA documents this behaviour:
> Be aware that the digest routines silently convert UTF-8 input into its
> equivalent byte sequence in the native encoding (cf. utf8::downgrade). This
> side effect influences only the way Perl stores the data internally, but
> otherwise leaves the actual value of the data intact.
Similar behaviour is observed in the JSON module:
> It will also try to downgrade any strings to octet-form if possible: perl
> stores strings internally either in an encoding called UTF-X or in
> octet-form. The latter cannot store everything but uses less space in
> general (and some buggy Perl or C code might even rely on that internal
> representation being used).
Note the words "some buggy Perl or C".

Note 2: According to the Perl Unicode specifications, any 3rd-party code can downgrade strings without advertising it in its documentation.

Note 3: Those are examples of a character string being silently downgraded, but in the same way a binary string can be silently upgraded to a string with the UTF-8 bit set (and remain a binary string after that).

Note 4: If you print $s to an output file, _with_ or _without_ specifying an encoding, the output will always be the same, independent of the sha1_hex() call.

Note 5: This problem does not affect me directly, so if you prefer to fix code only when some real user has real problems with the module, that is not the case yet. This is just a proof of concept of possible problems.

(In reply to MKUTTER's message of Thu Aug 15 13:54:54 2013.)
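Notes 1 to 4 can be checked with core Perl alone. The sketch below uses utf8::downgrade() as a stand-in for the side effect that the Digest::SHA documentation describes, and writes both forms of the string to an in-memory file, once through a raw layer and once through an explicit UTF-8 layer. The helper bytes_written is hypothetical and only exists for this illustration.

```perl
use strict;
use warnings;

my $upgraded = "\xB5";
utf8::upgrade($upgraded);        # flag on, as after decode("UTF-8", ...)

my $downgraded = $upgraded;
utf8::downgrade($downgraded);    # flag off, same character value

# Hypothetical helper: write a string to an in-memory file through the
# given PerlIO layer and return the raw bytes that were produced.
sub bytes_written {
    my ($string, $layer) = @_;
    my $buffer = '';
    open my $fh, ">$layer", \$buffer
        or die "cannot open in-memory file: $!";
    print {$fh} $string;
    close $fh or die "cannot close in-memory file: $!";
    return $buffer;
}

for my $layer (':raw', ':encoding(UTF-8)') {
    my $same = bytes_written($upgraded, $layer) eq bytes_written($downgraded, $layer);
    printf "%-18s same bytes: %s\n", $layer, $same ? 'yes' : 'no';
}
```

Ordinary output does not depend on the internal representation, which is the contrast being drawn with the base64 serializer.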
Subject: [rt.cpan.org #30271]
Date: Sun, 08 Mar 2015 14:20:33 +0100
To: bug-SOAP-Lite [...] rt.cpan.org
From: Christian Huldt <christian [...] solvare.se>
Maybe just add

    $content = SOAP::Data->type(string => $content);

if $content is utf8? That way it won't be fed to MIME::Base64.