
This queue is for tickets about the RPC-XML CPAN distribution.

Report information
The Basics
Id: 34472
Status: open
Priority: 0/
Queue: RPC-XML

People
Owner: rjray [...] blackperl.com
Requestors: mods [...] hank.org
Cc: BOBKARE [...] cpan.org
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.59
Fixed in: (no value)



Subject: RPC::XML::Client fails to encode
Although $RPC::XML::ENCODING specifies the encoding of the XML payload, the content is never actually encoded when the Client sends it. Instead of

    $content = $req->as_string;

perhaps:

    $content = encode( $RPC::XML::ENCODING, $req->as_string, $CHECK );

(for some value of $CHECK)
On Thu Mar 27 10:22:27 2008, HANK wrote:
> Although $RPC::XML::ENCODING specifies the encoding of the XML payload,
> the content is never actually encoded when the Client sends it.
>
> Instead of
>
> $content = $req->as_string;
>
> Perhaps:
>
> $content = encode( $RPC::XML::ENCODING, $req->as_string, $CHECK );
> (for some value of $CHECK)
Could you provide an example wherein a client sends a message that does not properly decode on a server? My understanding of other encodings is fairly limited... I was under the impression that I could generate the XML tag-soup generically, and that as long as the content the user provided followed encodings, it would be fine. But others have written about this problem before, so I am quite obviously wrong :-).
From: mods [...] hank.org
On Sun Mar 30 09:10:22 2008, RJRAY wrote:
>
> > $content = encode( $RPC::XML::ENCODING, $req->as_string, $CHECK );
> > (for some value of $CHECK)
>
> Could you provide an example wherein a client sends a message that does
> not properly decode on a server? My understanding of other encodings is
> fairly limited... I was under the impression that I could generate the
> XML tag-soup generically, and that as long as the content the user
> provided followed encodings, it would be fine. But others have written
> about this problem before, so I am quite obviously wrong :-).
It's not that XML::Parser will fail. It's only an issue with encoding and Perl. Inside Perl we work with characters; outside we work with octets. When reading octets into Perl we decode (XML::Parser does this). When sending characters out of Perl we must encode them back into octets.

The problem is that if the request content contains utf8 *characters*, the Perl scalar will have the utf8 flag set. When sending those characters out of Perl (e.g. with LWP), we need to encode them back to octets. If we do not, we get a "Wide character in print" type of error. The bigger problem is that the Content-Length header will be wrong:

    $ perl -le 'print length("\x{1234}")'
    1

(yes, one character)

    $ perl -MEncode -le 'print length(encode_utf8("\x{1234}"))'
    3

(yes, three octets)

I would argue that LWP should look at the HTTP::Request object and encode() based on the charset defined (or ISO-8859-1 if none is defined). But it doesn't.

How to test?

    moseley@bumby2:~/RPC-XML-0.59$ diff t/50_client.t.orig t/50_client.t
    47a48
    >
    65c66
    < ok($cli->simple_request('system.identity') eq $srv->product_tokens);
    ---
    > ok($cli->simple_request('system.identity', "\x{1234}") eq $srv->product_tokens);
    moseley@bumby2:~/RPC-XML-0.59$ perl -W -Iblib/lib t/50_client.t
    Content-Length header value was wrong, fixed at /usr/share/perl5/LWP/Protocol/http.pm line 191.
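A minimal Perl sketch of the character/octet distinction described above, using only the core Encode module. The length before encoding counts characters; the length after encode() counts the octets that would actually be written to the socket, which is the number a Content-Length header needs. The variable names are illustrative and not taken from RPC::XML or LWP.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character outside Latin-1 (U+1234), as in the one-liners above.
my $chars = "\x{1234}";

# Encode the characters into octets before they leave Perl.
my $octets = encode('UTF-8', $chars);

my $char_len  = length $chars;    # counts characters
my $octet_len = length $octets;   # counts octets

# A Content-Length header must be based on $octet_len, not $char_len.
print "characters: $char_len, octets: $octet_len\n";
```

The encoded string no longer carries the utf8 flag, so printing it to a byte-oriented handle does not trigger a wide-character warning.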
OK, this has suddenly become a front-and-center issue, thanks to a change in HTTP::Message in the 5.810 version of LWP. It now croaks when passed anything that isn't bytes as the content for the body. I'm talking to Gisle Aas about it (and about why he chose "croak" over "carp" without warning people first!), but it seems my best bet is to fix things on my end and let anyone else who's written apps around LWP worry about their own concerns. Expect to see this in the 0.61 release, which will be sometime in the next week or so, as I am being deluged by cpan-testers automatically-mailed reports. And they're pretty much all the exact same report.
How about implementing what Bill Moseley suggested? The current implementation calls utf8::downgrade, which implicitly converts data to the Latin-1 encoding (utf8::downgrade has the same semantics as $octets = encode('latin1', $chars);). It is a good idea to use Encode::encode to actually encode data, but if that is impossible for some reason, replacing 'utf8::downgrade' with 'utf8::encode' would help a lot as a quick fix.

In the current implementation with utf8::downgrade, if a string contains Unicode characters that do not map to Latin-1 (character codes \x{100} and above), it just dies. (You can try sending a string with non-Latin-1 characters, for example the word "hello" in Russian, "\x{41f}\x{440}\x{438}\x{432}\x{435}\x{442}", and RPC::XML will die when calling utf8::downgrade.) With utf8::encode, every non-US-ASCII character will be encoded in UTF-8. utf8::encode never fails, because every Unicode character can be expressed in UTF-8.

Thank you in advance!
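The difference between the two built-ins can be checked directly. This sketch uses core Perl only (RPC::XML is not involved) and feeds the Russian example above to both: utf8::downgrade dies because every code point is above \x{FF}, while utf8::encode turns each Cyrillic letter into two UTF-8 octets.

```perl
use strict;
use warnings;

# "Privet" (hello) in Russian; every code point is above \x{FF}.
my $hello = "\x{41f}\x{440}\x{438}\x{432}\x{435}\x{442}";

# utf8::downgrade can only represent code points up to \x{FF} (Latin-1),
# and without its optional FAIL_OK argument it dies on this string.
my $copy = $hello;
my $downgrade_ok = eval { utf8::downgrade($copy); 1 } ? 1 : 0;

# utf8::encode converts the characters to UTF-8 octets in place and
# cannot fail, since every Unicode character has a UTF-8 form.
my $encoded = $hello;
utf8::encode($encoded);

printf "downgrade %s; encoded length %d octets\n",
    ($downgrade_ok ? 'succeeded' : 'died'), length $encoded;
```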
Actually, it is easy to work around. I don't like modifying things in another package's namespace, but I have to do it anyway to change $ENCODING. Since I'm already modifying $ENCODING, I can replace utf8_downgrade as well. Adding this code after the use/require of RPC::XML enables UTF-8 in RPC::XML:

    {
        no warnings 'redefine';
        $RPC::XML::ENCODING = 'utf-8';
        *RPC::XML::utf8_downgrade = \&utf8::encode;
    }
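To see why the workaround behaves as a drop-in replacement, here is a self-contained sketch in which a hypothetical package, Demo::Serializer, stands in for RPC::XML (the real module is not loaded). Like utf8::downgrade, utf8::encode modifies its argument in place, so the calling code does not need to change; and because a named sub call is resolved through the package's symbol table when it runs, replacing the glob also affects code compiled before the swap.

```perl
use strict;
use warnings;

# Hypothetical stand-in for RPC::XML: serialize() runs every string
# through a package-level helper that converts it in place.
package Demo::Serializer;

sub utf8_downgrade { utf8::downgrade($_[0]); return; }

sub serialize {
    my ($str) = @_;
    Demo::Serializer::utf8_downgrade($str);
    return $str;
}

package main;

# The workaround's technique: swap the helper for utf8::encode at run time.
{
    no warnings 'redefine';
    *Demo::Serializer::utf8_downgrade = \&utf8::encode;
}

# U+2603 (snowman) would make the original helper die; after the swap
# it becomes three UTF-8 octets instead.
my $out = Demo::Serializer::serialize("\x{2603}");
printf "%d octets\n", length $out;
```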
+1 for this bug report. This is genuinely broken, because the module should take care of encoding strings to the declared character set/encoding. The workaround provided by YCAR is nice; I highly encourage people to use it, and the maintainer to patch his distribution. I added a note about this in AnnoCPAN. Randy, if you want some help maintaining your module (which I use a lot in my applications), drop me a line.
Hi,

I think I have a proper fix for encoding; see the attached patch. Also attached is a fairly basic test case that checks that UTF-8 encoding works for strings. I have also tested that some internal client and server apps here at work now correctly handle data containing snowmen when the encoding is set to UTF-8.

There are a couple more calls to utf8_downgrade in RPC::XML::Server and RPC::XML::Client that should no longer be needed, but I didn't test enough to be sure, and they shouldn't do any harm anyway.

--
Knut Arne Bjørndal, Easy Connect AS
bobkare@cpan.org, knut.arne.bjorndal@easyconnect.no
Subject: encoding.patch
diff --git a/lib/RPC/XML.pm b/lib/RPC/XML.pm
index b2eda88..874041e 100644
--- a/lib/RPC/XML.pm
+++ b/lib/RPC/XML.pm
@@ -30,6 +30,7 @@ use vars qw(@EXPORT_OK %EXPORT_TAGS $VERSION $ERROR
 use subs qw(time2iso8601 smart_encode utf8_downgrade);
 use base 'Exporter';
+use Encode;
 use Scalar::Util qw(blessed reftype);
 ## no critic (ProhibitSubroutinePrototypes)
@@ -385,7 +386,7 @@ sub as_string
         substr $class, 0, 8, 'dateTime';
     }
-    return "<$class>$$self</$class>";
+    return Encode::encode($RPC::XML::ENCODING, "<$class>$$self</$class>", Encode::FB_CROAK);
 }
 # Serialization for simple types is just a matter of sending as_string over
@@ -394,7 +395,6 @@ sub serialize
     my ($self, $fh) = @_;
     my $str = $self->as_string;
-    RPC::XML::utf8_downgrade($str);
     print {$fh} $str;
     return;
@@ -406,7 +406,7 @@ sub length ## no critic (ProhibitBuiltinHomonyms)
 {
     my $self = shift;
-    RPC::XML::utf8_downgrade(my $str = $self->as_string);
+    my $str = $self->as_string;
     return length $str;
 }
@@ -502,6 +502,8 @@ sub as_string
     ($value = defined ${$self} ? ${$self} : q{} ) =~ s/$RPC::XML::XMLRE/$RPC::XML::XMLMAP{$1}/ge;
+    $value = Encode::encode($RPC::XML::ENCODING, $value, Encode::FB_CROAK);
+
     return "<$class>$value</$class>";
 }
@@ -799,6 +801,7 @@ sub as_string
     for (keys %{$self})
     {
         ($key = $_) =~ s/$RPC::XML::XMLRE/$RPC::XML::XMLMAP{$1}/ge;
+        $key = Encode::encode($RPC::XML::ENCODING, $key, Encode::FB_CROAK);
         $clean{$key} = $self->{$_}->as_string;
     }
@@ -823,7 +826,7 @@ sub serialize
     for (keys %{$self})
     {
         ($key = $_) =~ s/$RPC::XML::XMLRE/$RPC::XML::XMLMAP{$1}/ge;
-        RPC::XML::utf8_downgrade($key);
+        $key = Encode::encode($RPC::XML::ENCODING, $key, Encode::FB_CROAK);
         print {$fh} "<member><name>$key</name><value>";
         $self->{$_}->serialize($fh);
         print {$fh} '</value></member>';
@@ -843,7 +846,7 @@ sub length ## no critic (ProhibitBuiltinHomonyms)
     {
         $len += 45;    # For all the constant XML presence
         $len += $self->{$key}->length;
-        RPC::XML::utf8_downgrade($key);
+        $key = Encode::encode($RPC::XML::ENCODING, $key, Encode::FB_CROAK);
         $len += length $key;
     }
@@ -1357,7 +1360,7 @@ sub serialize
 {
     my ($self, $fh) = @_;
     my $name = $self->{name};
-    RPC::XML::utf8_downgrade($name);
+    $name = Encode::encode($RPC::XML::ENCODING, $name, Encode::FB_CROAK);
     print {$fh} qq(<?xml version="1.0" encoding="$RPC::XML::ENCODING"?>);
diff --git a/t/utf8.t b/t/utf8.t
index e69de29..27bcd34 100755
--- a/t/utf8.t
+++ b/t/utf8.t
@@ -0,0 +1,23 @@
+#!/usr/bin/perl
+use strict;
+use warnings;
+
+# Test UTF-8 encoding in RPC::XML
+
+use Test::More tests => 2;
+use Encode;
+
+use RPC::XML;
+$RPC::XML::ENCODING = 'UTF-8';
+
+# \x{2603}\x{2602} is a snowman with umbrella
+my $data_ref = RPC::XML::struct->new(
+    simplekey => "\x{2603}\x{2602}",
+    "\x{2603}\x{2602}" => 'simplevalue',
+    "\x{2603}\x{2602}\x{2603}\x{2602}" => "\x{2603}\x{2602}\x{2603}\x{2602}",
+);
+
+my $encoded = $data_ref->as_string;
+
+is($data_ref->length, 259, 'Length is correct');
+ok(Encode::decode('UTF-8', $encoded, Encode::FB_CROAK), 'decode from UTF-8 works');
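The patch relies on Encode::encode with Encode::FB_CROAK as the CHECK argument. A minimal sketch of what that check value does, using the same U+2603/U+2602 characters as the patch's test: encoding to UTF-8 succeeds, while encoding to ISO-8859-1 dies instead of substituting a replacement character. The sketch also ORs in Encode::LEAVE_SRC, which the patch does not use; per the Encode documentation, when CHECK is set without LEAVE_SRC, encode() overwrites the source variable with whatever it did not process, which would spoil the length comparison here.

```perl
use strict;
use warnings;
use Encode ();

# The same two characters the patch's test uses, wrapped like a scalar value.
my $xml = "<string>\x{2603}\x{2602}</string>";

# FB_CROAK: die on unmappable characters.
# LEAVE_SRC: keep $xml intact (otherwise encode() rewrites it in place).
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

my $utf8_octets = Encode::encode('UTF-8', $xml, $check);

# Latin-1 has no mapping for U+2603, so this call croaks.
my $latin1_ok = eval { Encode::encode('iso-8859-1', $xml, $check); 1 } ? 1 : 0;

printf "%d characters, %d UTF-8 octets, Latin-1 %s\n",
    length $xml, length $utf8_octets, ($latin1_ok ? 'encoded' : 'croaked');
```

Each of the two non-ASCII characters takes three octets in UTF-8, so the encoded form is four octets longer than the character count; a length() computed on the encoded result is what matches the bytes on the wire.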
See also https://github.com/rjray/rpc-xml/pull/5 for an alternative patch to this problem.

Regards,
Racke