Subject: from_to affecting COW strings
The from_to() function is documented as modifying strings in place. This leads to surprising behaviour when copy-on-write (COW) is involved: converting a copy of a string also changes the original.
$ perl -MTest::More -e'use utf8; use Encode; my $x = "täst"; is($x, "täst"); my $y = $x; Encode::from_to($y, "UTF-8", "iso-8859-1"); is($x, "täst"); done_testing'
ok 1
not ok 2
And if I'm reading the output below correctly, from_to() has also left the original scalar in an inconsistent state:
$ perl -MDevel::Peek -e'use utf8; use Encode; my $x = "täst"; Dump($x); my $y = $x; Encode::from_to($y, "UTF-8", "iso-8859-1"); Dump($x);'
SV = PV(0x222ed70) at 0x224e580
REFCNT = 1
FLAGS = (POK,IsCOW,pPOK,UTF8)
PV = 0x22523b0 "t\303\244st"\0 [UTF8 "t\x{e4}st"]
CUR = 5
LEN = 10
COW_REFCNT = 1
SV = PV(0x222ed70) at 0x224e580
REFCNT = 1
FLAGS = (POK,IsCOW,pPOK,UTF8)
PV = 0x22523b0 "t\344st\0"\0Malformed UTF-8 character (unexpected non-continuation byte 0x73, immediately after start byte 0xe4) in Dump at -e line 1.
[UTF8 "t\x{0}\x{0}"]
CUR = 5
LEN = 10
COW_REFCNT = 1
I think making a copy of the string first if COW_REFCNT > 1 would be less surprising. However, if the current behaviour is intentional, would it be possible to include a note in the documentation to highlight this?
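In the meantime, a possible workaround is to avoid the in-place call altogether. The following is only a minimal sketch, assuming the input is a byte string (no UTF8 flag) and relying on the documented behaviour that decode() and encode() leave their source alone when no CHECK argument is given. converted_copy is a hypothetical helper name, not part of Encode, and I haven't verified it against every affected Encode version:

```perl
use strict;
use warnings;
use utf8;
use Encode qw(decode encode);

# Hypothetical helper (not part of Encode): convert octets from one
# encoding to another and return the result as a new scalar, instead
# of rewriting the argument the way from_to() does.
sub converted_copy {
    my ($octets, $from, $to) = @_;
    # No CHECK argument is passed, so neither call should overwrite
    # its source string.
    return encode($to, decode($from, $octets));
}

my $x = encode("UTF-8", "täst");    # UTF-8 octets: 74 c3 a4 73 74
my $y = $x;                         # may share a buffer with $x under COW
my $latin1 = converted_copy($y, "UTF-8", "iso-8859-1");

# Print both strings as hex so the bytes are unambiguous.
print join(" ", map { sprintf "%02x", ord($_) } split //, $x), "\n";
print join(" ", map { sprintf "%02x", ord($_) } split //, $latin1), "\n";
```

The first line printed should still be the original UTF-8 octets, and the second the ISO-8859-1 result. This sidesteps the problem rather than fixing it, so it is no substitute for the copy or the documentation note.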
cheers,
Tom