
This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 85489
Status: resolved
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: Mark.Martinec [...] ijs.si
Cc: pali [...] cpan.org
AdminCc:

Bug Information
Severity: Normal
Broken in: 2.51
Fixed in: 2.87



Subject: Encode::encode has a side effect of turning on the UTF8 flag in its argument
Encode::encode has a side effect: it turns on the UTF8 flag in its argument when given a plain, non-UTF8 string. This is not new in 2.51. The same happens with 2.42, and probably with most, if not all, older versions.

The following program illustrates the issue:

    use strict;
    use warnings;
    use Encode;
    my $enc_ascii = Encode::find_encoding('ascii');
    my $s = 'abc';
    printf("pre: %s - %s\n", $s, Encode::is_utf8($s)?'UTF8':'not utf8');
    # Encode through the encoding object's encode() method
    my $octets = $enc_ascii->encode($s);
    printf("post: %s - %s\n", $s, Encode::is_utf8($s)?'UTF8':'not utf8');

The result:

    pre: abc - not utf8
    post: abc - UTF8

This side effect can surprise an application that calls Encode::encode regardless of a string's UTF8 flag, e.g. for sanitization or logging, where a plain byte string suddenly turns into a UTF8 string.

Choices:

- fix Encode::encode so that it does not modify its arguments,
- or document the effect clearly and boldly,
- or document that a non-UTF8 string should not be given to Encode::encode (which is impractical, especially in portable applications that cannot rely on Encode::is_utf8, because in old versions it had a bug that returned false for a tainted but UTF8-flagged string [perl #32687]).
From: victor [...] vsespb.ru
Does not happen if:

    - my $octets = $enc_ascii->encode($s);
    + my $octets = encode($enc_ascii, $s);

Actually, in the call $enc_ascii->encode($s), the encode() sub is not Encode::encode; it is a method in a different package (the encoding object's class). The documentation

> The returned object is what does the actual encoding or decoding.
..
> is in fact
..
> with more error checking.

is probably a bit vague about what "error checking" means here. I think Encode::encode actually implements code that ensures its input parameter is not modified.

On Tue May 21 18:00:43 2013, Mark.Martinec@ijs.si wrote:
> The Encode::encode has a side effect of turning on the
> UTF8 flag in its argument when given a plain non-UTF8 string.
> ..
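A minimal sketch of caller-side workarounds for an Encode older than the fix. Workaround 1 is the function form from the diff above, which the reply reports leaves the argument alone. Workaround 2, calling the method on a throwaway copy, is an assumption and was not proposed in this ticket. The variable names `$octets_a`, `$octets_b` and `$copy` are illustrative.

```perl
use strict;
use warnings;
use Encode ();

my $enc_ascii = Encode::find_encoding('ascii');
my $s = 'abc';

# Workaround 1 (from the reply): pass the encoding object to the
# Encode::encode function, which encodes a copy of its string argument,
# so $s keeps its original UTF8 flag.
my $octets_a = Encode::encode($enc_ascii, $s);

# Workaround 2 (assumption, not from the ticket): keep using the method,
# but hand it a throwaway copy; any flag change lands on $copy, not $s.
my $copy = $s;
my $octets_b = $enc_ascii->encode($copy);

printf("after: %s - %s\n", $s, Encode::is_utf8($s) ? 'UTF8' : 'not utf8');
```

Either way the cost is one extra string copy per call; with 2.87 and later, neither workaround should be needed.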
On Tue May 21 10:00:43 2013, Mark.Martinec@ijs.si wrote:
> The Encode::encode has a side effect of turning on the
> UTF8 flag in its argument when given a plain non-UTF8 string.
> ..
> The result:
> pre: abc - not utf8
> post: abc - UTF8
Fixed in 2.87:

    pre: abc - not utf8
    post: abc - not utf8
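The fixed behaviour could be pinned down with a small regression test. This is a sketch assuming Test::More (bundled with Perl); the test names and the version guard are illustrative and are not taken from Encode's own test suite.

```perl
use strict;
use warnings;
use Test::More;
use Encode ();

# The fix is reported for Encode 2.87; skip rather than fail on older installs.
# Encode->VERSION(2.87) dies if the loaded Encode is older than that.
plan skip_all => 'needs Encode 2.87 or later'
    unless eval { Encode->VERSION(2.87); 1 };

my $enc_ascii = Encode::find_encoding('ascii');
my $s = 'abc';
ok(!Encode::is_utf8($s), 'input starts without the UTF8 flag');

# Same method call as in the original report.
my $octets = $enc_ascii->encode($s);
is($octets, 'abc', 'ASCII input encodes to the same octets');
ok(!Encode::is_utf8($s), 'method call left the UTF8 flag off');

done_testing();
```

Run it with prove or plain perl; on an Encode older than 2.87 it reports a skip instead of a failure.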