Subject: COW breakage with _utf8_on()
Date: Thu, 29 Aug 2013 11:00:31 +0100
To: bug-Encode [...] rt.cpan.org
From: Zefram <zefram [...] fysh.org>
Functions that side-effect a scalar, such as Encode::_utf8_on(), need
to de-COW the operand. See [perl #79824] for the origins of this bug
report; the bug has been fixed for core functions such as utf8::decode().
Recipe to reproduce problem:
$ perl -MEncode -lwe '%a=("L\x{c3}\x{a9}on"=>"acme"); ($k)=(keys %a); Encode::_utf8_on($k); %h = ($k => "acme"); print $h{"L\x{e9}on"}'
Use of uninitialized value in print at -e line 1.
For the purposes of this bug report, the string being _utf8_on-ed is
always well-formed UTF-8, so the big documented caveat about _utf8_on
doesn't apply. What happens here is that $k, having come from keys(),
shares its PV buffer with the HEK in %a. The _utf8_on doesn't touch
the PV, so when $k is later used as a hash key the hash value already
computed for that PV is reused. But _utf8_on has changed the hash value
of the scalar, by changing which character sequence it represents. So %h
ends up with its hash key stored under the wrong hash value, hence in the
wrong bucket, hence looking up by an independent copy of the key fails.
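For comparison, here is a minimal sketch of the same steps using utf8::decode(), which the report above says has already been fixed in core. It assumes a perl that includes the [perl #79824] fix. The sub name lookup_after_decode is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: repeats the recipe, but decodes the key with the
# core function utf8::decode() instead of Encode::_utf8_on().
sub lookup_after_decode {
    my %a = ("L\x{c3}\x{a9}on" => "acme");
    my ($k) = keys %a;          # $k starts out sharing its buffer with the HEK in %a

    # utf8::decode() works in place and returns true if the bytes were
    # valid UTF-8. On a perl with the #79824 fix it first gives $k its
    # own buffer, so the hash value cached for the old bytes is not reused.
    utf8::decode($k) or die "key is not well-formed UTF-8";

    my %h = ($k => "acme");
    return $h{"L\x{e9}on"};     # look up by an independent copy of the key
}

my $value = lookup_after_decode();
print defined $value ? "found: $value\n" : "missing\n";
```

If the lookup still comes back undefined, the perl in use most likely predates the core fix.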
-zefram