
This queue is for tickets about the encoding-warnings CPAN distribution.

Report information
The Basics
Id: 33989
Status: open
Priority: 0/
Queue: encoding-warnings

People
Owner: Nobody in particular
Requestors: allter [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 0.11
Fixed in: (no value)



Subject: encoding::warnings implicitly converts 8-bit literals
When encoding::warnings is "use"d, string literals (quoted bytes) containing 8-bit data are implicitly converted to Unicode with the latin-1 decoder, and the promised warning is never emitted. Digging into the code, I found that this happens because import() sets the global variable ${^ENCODING} to an object of the module's own class. The Perl interpreter then calls ${^ENCODING}->cat_decode( .. ), which converts every 8-bit literal in the source to a Unicode string, and this can lead to various side effects. The solution is to make "sub cat_decode" more complex (like "sub decode"). The attached file is a simple test script that demonstrates the side effect. If the maintainer is interested, I can prepare a patch for this issue since I have some ideas on it.

INFO:
$encoding::warnings::VERSION = '0.11';

bash-3.2$ perl -v
This is perl, v5.10.0 built for cygwin-thread-multi-64int
(with 3 registered patches, see perl -V for more detail)

bash-3.2$ uname -a
CYGWIN_NT-5.0 perldev1 1.5.25(0.156/4/2) 2007-12-14 19:21 i686 Cygwin
Subject: scope.t
#!/usr/bin/perl
use Test::Simple tests => 3;

my $byte_string_8bit = 'ÀÁÂÃÄÅ'; # Bytes corresponding to the first 6 letters of the Cyrillic alphabet in cp1251

my $again_byte_string_8bit = do {
    use encoding::warnings; # Comment this line out to see how things should work
    'ÀÁÂÃÄÅ'; # This must still be a byte string, since we requested only warnings
};

my $another_byte_string_8bit = 'ÀÁÂÃÄÅ'; # Must be a byte string because it is outside the scope of "use encoding::warnings" (it could even be in an unrelated place in another file)

ok utf8::is_utf8( $byte_string_8bit ) == 0, 'Byte strings are byte strings';
ok utf8::is_utf8( $again_byte_string_8bit ) == 0, 'Again byte strings are STILL byte strings';
ok utf8::is_utf8( $another_byte_string_8bit ) == 0, 'Another byte strings are STILL byte strings';
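A minimal sketch of what the implicit conversion amounts to, assuming a perl with the core Encode module. It does not load encoding::warnings (whose behaviour depends on the interpreter version); instead it performs the latin-1 decoding by hand, which is what the report attributes to cat_decode. The variable names are hypothetical. The decoded string has the same length and compares equal to the original bytes, so the only visible difference is the UTF8 flag that scope.t checks with utf8::is_utf8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Six raw bytes: the first six Cyrillic letters in cp1251
my $bytes = "\xC0\xC1\xC2\xC3\xC4\xC5";

# Decode them as latin-1, mimicking the implicit conversion the report
# describes for literals under "use encoding::warnings"
my $chars = decode('iso-8859-1', $bytes);

# The byte string has the UTF8 flag off; the decoded one has it on
printf "bytes: utf8 flag %d, length %d\n", (utf8::is_utf8($bytes) ? 1 : 0), length $bytes;
printf "chars: utf8 flag %d, length %d\n", (utf8::is_utf8($chars) ? 1 : 0), length $chars;

# Perl compares by characters, so both strings are still equal
print "equal as strings: ", ($bytes eq $chars ? "yes" : "no"), "\n";
```

Because the two strings compare equal, code that only looks at string contents will not notice the conversion; the flag matters once such a string is concatenated with other byte data or written to a handle with an encoding layer.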
From: allter [...] gmail.com
In addition, this issue is closely related to a similar issue I found in encoding::source: http://rt.cpan.org/Public/Bug/Display.html?id=33990
Subject: Re: [rt.cpan.org #33989] encoding::warnings implicitly converts 8-bit literals
Date: Tue, 11 Mar 2008 22:18:31 +0800
To: bug-encoding-warnings [...] rt.cpan.org
From: Audrey Tang <audreyt [...] audreyt.org>
Andrey M. Smirnov via RT wrote:
> If the maintainer is interested, I can prepare a patch for this issue
> since I have some ideas on it.
Sure! Please do.

Audrey
From: allter [...] gmail.com
On Tue Mar 11 09:37:02 2008, allter wrote:
> The solution is to make "sub cat_decode" more complex (like "sub decode").
[...]
> If the maintainer is interested, I can prepare a patch for this issue
> since I have some ideas on it.
Unfortunately, perl sets the UTF8 flag on strings received from cat_decode whenever ${^ENCODING} is set (i.e. at compile time). In toke.c, line 11858:

    if (has_utf8 || PL_encoding)
        SvUTF8_on(sv);

So a complete solution is probably impossible. I will think about how best to implement a partial workaround. Meanwhile, any ideas are welcome.