Subject: Encode::encode has a side effect of turning on the UTF8 flag in its argument
Encode::encode has the side effect of turning on the UTF8 flag
in its argument when given a plain (non-UTF8) string.
This is not new in 2.51: the same happens with 2.42, and probably
with most if not all older versions.
The following program illustrates the issue:
use strict;
use warnings;
use Encode;
my $enc_ascii = Encode::find_encoding('ascii');
my $s = 'abc';    # plain byte string, UTF8 flag off
printf("pre: %s - %s\n", $s, Encode::is_utf8($s)?'UTF8':'not utf8');
my $octets = $enc_ascii->encode($s);    # $s itself is passed (aliased via @_)
printf("post: %s - %s\n", $s, Encode::is_utf8($s)?'UTF8':'not utf8');
The result:
pre: abc - not utf8
post: abc - UTF8
This side effect can surprise an application that calls Encode::encode
without regard to a string's UTF8 flag, e.g. for sanitizing or logging,
where a plain byte string suddenly turns into a UTF8-flagged string.
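Until this is resolved, an application can shield its own strings by
encoding a copy. A minimal sketch, assuming the copy is made by unpacking
@_ into a lexical (which copies the value instead of aliasing the caller's
scalar); the helper name encode_preserving_flag is hypothetical and not
part of Encode:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of Encode.
# Unpacking @_ into lexicals copies the values, so any in-place
# upgrade done by the encoder touches $string, not the caller's scalar.
sub encode_preserving_flag {
    my ($encoding, $string) = @_;
    my $enc = Encode::find_encoding($encoding)
        or die "Unknown encoding: $encoding";
    return $enc->encode($string);
}

my $s = 'abc';
my $octets = encode_preserving_flag('ascii', $s);
printf("octets: %s - original: %s\n",
       $octets, Encode::is_utf8($s) ? 'UTF8' : 'not utf8');
```

The cost is one extra string copy per call, which is usually acceptable
for sanitizing or logging paths.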
Choices:
- Encode::encode should be fixed so that it does not modify its argument,
- or the side effect should be clearly and prominently documented,
- or it should be documented that a non-UTF8 string must not be passed
  to Encode::encode (which is impractical, especially in portable
  applications that cannot rely on Encode::is_utf8: old versions had
  a bug where it returned false for a tainted but UTF8-flagged string,
  see [perl #32687])
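If the first choice is taken, a regression test along these lines could
guard the fix. This is a sketch using Test::More with the same 'ascii'
encoding as the program above; on the versions reported here (2.42, 2.51)
the post-encode flag check is expected to fail, while a fixed Encode
should pass every check:

```perl
use strict;
use warnings;
use Encode ();
use Test::More;

my $enc = Encode::find_encoding('ascii');
my $s   = 'abc';
ok(!Encode::is_utf8($s), 'input starts as a plain byte string');

my $octets = $enc->encode($s);
is($octets, 'abc', 'ASCII input encodes to the same octets');

# The behaviour this report is about: the argument must not be upgraded.
ok(!Encode::is_utf8($s), 'UTF8 flag of the argument is unchanged');
is($s, 'abc', 'value of the argument is unchanged');

done_testing();
```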