As far as I understand this topic, I think that deserializing and
serializing should pay some attention to the content charset.
I'm taking Catalyst::Action::Deserialize::YAML as an example here, but
the same should apply to the other text-based (de-)serializers too.
Here's the code from 'Catalyst::Action::Deserialize::YAML':
    eval {
        my $body = $c->request->body;
        $rdata = LoadFile( "$body" );
    };
$body contains some kind of file handle that delivers an octet stream.
So if you turn this stream into a Perl data structure, your data
contains only octet strings (0 <= ord() <= 255). You have to decode it
all by yourself in the application code.
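
To make the symptom concrete, here is a small self-contained example
(not taken from the attached app) of the difference between the
undecoded octet string the deserializer currently produces and the
decoded character string the application actually wants:

    use strict;
    use warnings;
    use Encode qw(decode);

    # the two UTF-8 octets for U+0639 (ARABIC LETTER AIN)
    my $octets = "\xd8\xb9";

    print length($octets), "\n";    # 2 -- what the application gets today

    my $chars = decode( 'UTF-8', $octets );
    print length($chars), "\n";     # 1 -- what the application really wants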
The solution? The library should pay attention to the content encoding.
Here's the idea in pseudo code:
1) Get the content encoding
2) Check if this encoding is supported by Encode (bail out otherwise?)
3) With the right encoding we can decode the octet string into a Perl
string
4) Deserialize the data
And everything is fine.
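
To illustrate the four steps, here is a rough sketch in Perl. This is
not the actual patch; it assumes the charset can be parsed out of the
raw Content-Type header (falling back to UTF-8), that reading straight
from the body file handle is acceptable, and it replaces the current
LoadFile("$body") call with Load() on the decoded string:

    use Encode qw(find_encoding);
    use YAML::Syck qw(Load);

    my $body = $c->request->body;    # file handle delivering raw octets

    # 1) get the content charset from the raw Content-Type header
    my $raw_ct = $c->request->header('Content-Type') || '';
    my ($charset) = $raw_ct =~ /charset=["']?([^"';\s]+)/i;
    $charset ||= 'UTF-8';            # fall back if no charset was sent

    # 2) check whether Encode supports this encoding, bail out otherwise
    my $encoding = find_encoding($charset)
        or die "unsupported charset '$charset'";

    # 3) decode the octet stream into a Perl character string
    my $octets = do { local $/; <$body> };
    my $string = $encoding->decode($octets);

    # 4) deserialize the decoded string
    my $rdata = eval { Load($string) };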
I've attached a complete dummy MyApp with a .t for this issue. I'm
sending an Arabic Unicode character, and the test class on the other
side does a length() count on it and returns the result in the body.
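
For reference, a test along those lines could look roughly like the
sketch below. This is only a hypothetical illustration, not the
attached .t; the '/foo' path and the 'text' key are made up:

    use strict;
    use warnings;
    use Test::More tests => 1;
    use Encode qw(encode);
    use HTTP::Request;
    use Catalyst::Test 'MyApp';

    # one Arabic character (U+0639, ARABIC LETTER AIN) inside a YAML body
    my $yaml = "---\ntext: \x{639}\n";

    my $req = HTTP::Request->new( POST => '/foo' );
    $req->content_type('text/x-yaml; charset=utf-8');
    $req->content( encode( 'UTF-8', $yaml ) );

    my $res = request($req);

    # the controller is expected to answer with length() of the decoded value
    is( $res->content, 1, 'one character, not two octets' );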
I've also written a quick and dirty patch for the YAML part. If you're
interested I can extend this patch to the other text-based formats. But
first I want to hear your opinion about this issue.
Thanks for your work.