Bug #42279 for PHP-Serialization: serializing multibyte characters

Sat Jan 10 08:32:13 2009 therealal [...] gmail.com - Ticket created

Subject:	serializing multibyte characters
Date:	Sat, 10 Jan 2009 16:31:46 +0300
To:	bug-PHP-Serialization [...] rt.cpan.org
From:	"Alexey Makeev" <therealal [...] gmail.com>

Hello.

It seems PHP uses the length of a string in bytes (not characters) during serialization. Typing .А. on standard input:

    $ php -r '$s = chop(fgets(STDIN)); print serialize($s);'
    .А.
    s:4:".А.";

Here А is the Russian letter A, which takes 2 bytes in UTF-8. Perl's length function, however, returns the length of a string in characters, not bytes, so:

    $ perl -e 'use utf8; use PHP::Serialization (); print PHP::Serialization::serialize(".А."), "\n";'
    s:3:".А.";

The А above is again the Russian letter (2 bytes). PHP can't deserialize this:

    $ php -r 'print unserialize("s:3:\".А.\";");'
    PHP Notice:  unserialize(): Error at offset 8 of 11 bytes in Command line code on line 1

I think the solution is to use bytes in the module:

    package PHP::Serialization;
    use strict;
    use warnings;
    use bytes;
    BEGIN { ...

After this change:

    $ perl -e 'use utf8; use PHP::Serialization (); print PHP::Serialization::serialize(".А."), "\n";'
    s:4:".А.";

and

    $ php -r 'print unserialize("s:4:\".А.\";");'
    .А.

Best regards,
Alexey Makeyev
therealal@gmail.com
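The byte/character mismatch can be reproduced in Perl alone. The following is a minimal sketch, assuming the core Encode module; php_serialize_string is a hypothetical helper, not part of PHP::Serialization. It encodes the string to UTF-8 explicitly and takes the length of the resulting octets, which is what PHP's s:N: prefix counts. Explicit encoding is shown as an alternative to the use bytes pragma, whose documentation discourages it for anything other than debugging.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of PHP::Serialization): build a PHP
# serialized string whose s:N: prefix counts UTF-8 bytes, as PHP does.
sub php_serialize_string {
    my ($str) = @_;
    my $octets = encode('UTF-8', $str);    # characters -> UTF-8 bytes
    return sprintf('s:%d:"%s";', length($octets), $octets);
}

my $s = ".\x{0410}.";    # ".", CYRILLIC CAPITAL LETTER A (U+0410), "."

print length($s), "\n";                     # 3 (characters)
print length(encode('UTF-8', $s)), "\n";    # 4 (bytes)
print php_serialize_string($s), "\n";       # s:4:".А."; as raw UTF-8 bytes
```

The last line carries the same s:4: prefix that PHP itself produced for this input, so the length prefix matches what unserialize expects.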

Sun Jan 11 06:18:52 2009 bobtfish [...] bobtfish.net - Correspondence added

Subject:	Re: [rt.cpan.org #42279] serializing multibyte characters
Date:	Sun, 11 Jan 2009 11:18:33 +0000
To:	bug-PHP-Serialization [...] rt.cpan.org
From:	Tomas Doran <bobtfish [...] bobtfish.net>

On 10 Jan 2009, at 13:32, Alexey Makeev via RT wrote:

> I think solution is to use bytes in module:
>
> package PHP::Serialization;
> use strict;
> use warnings;
> use bytes;
>

I've added this to 0.30, which has just gone to CPAN.

Cheers,
t0m

Sun Jan 11 06:18:52 2009 The RT System itself - Status changed from 'new' to 'open'

Sun Jan 11 06:19:31 2009 bobtfish [...] bobtfish.net - Taken

Sun Jan 11 06:22:40 2009 bobtfish [...] bobtfish.net - Status changed from 'open' to 'resolved'