Subject: serializing multibyte characters
Date: Sat, 10 Jan 2009 16:31:46 +0300
To: bug-PHP-Serialization [...] rt.cpan.org
From: "Alexey Makeev" <therealal [...] gmail.com>
Hello.
It seems PHP uses the length of the string in bytes (not characters) during
serialization:
$ php -r '$s = chop(fgets(STDIN)); print serialize($s);'
.А.
becomes s:4:".А."; (where А is the Russian letter A, 2 bytes in UTF-8),
but Perl's length function returns the length of a string in characters, not
bytes, so
$ perl -e 'use utf8; use PHP::Serialization (); print
PHP::Serialization::serialize(".А."), "\n";'
becomes s:3:".А.";
(the А in the string above is the Russian letter, 2 bytes), so PHP can't
deserialize it:
$ php -r 'print unserialize("s:3:\".А.\";");'
PHP Notice: unserialize(): Error at offset 8 of 11 bytes in Command
line code on line 1
I think the solution is to add "use bytes" to the module:
package PHP::Serialization;
use strict;
use warnings;
use bytes;
BEGIN {
...
After this change:
$ perl -e 'use utf8; use PHP::Serialization (); print
PHP::Serialization::serialize(".А."), "\n";'
s:4:".А.";
and
$ php -r 'print unserialize("s:4:\".А.\";");'
.А.
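For comparison, here is a minimal sketch of another way to get the byte count
without the bytes pragma: encode the string to UTF-8 explicitly and take the
length of the result. The helper name php_serialize_string is hypothetical and
is not part of PHP::Serialization; it only handles a single string value and
assumes the PHP side expects UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper (not part of PHP::Serialization): build a PHP
# serialized string whose length prefix counts UTF-8 bytes, not characters.
sub php_serialize_string {
    my ($text) = @_;
    my $octets = encode_utf8($text);    # character string -> UTF-8 bytes
    return sprintf 's:%d:"%s";', length($octets), $octets;
}

# "." + CYRILLIC CAPITAL LETTER A (U+0410) + "."
my $s = ".\x{410}.";

printf "characters: %d\n", length($s);
printf "bytes:      %d\n", length(encode_utf8($s));
print php_serialize_string($s), "\n";
```

The returned value is a byte string, so it can be written to a file or socket
as-is, and the length prefix matches what PHP counts in the example above.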
Best regards,
Alexey Makeyev
therealal@gmail.com