Subject: | encoding::warnings implicitly converts 8-bit literals |
When encoding::warnings is "use"d, literals (quoted byte strings)
containing 8-bit data are implicitly converted to Unicode using the
latin-1 converter, without the promised warning.
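To illustrate what that latin-1 conversion changes, the sketch below simulates it with utf8::upgrade. This is an assumption made for illustration: perl performs the real conversion internally, not through this call. The characters still compare equal, but the UTF8 flag is now set and the internal representation grows. That flag is what utf8::is_utf8 detects in the attached test.

```perl
#!/usr/bin/perl
# Sketch: simulate the implicit latin-1 conversion of an 8-bit literal.
# utf8::upgrade stands in for the conversion perl does internally.
use strict;
use warnings;

my $bytes    = "\xC0\xC1\xC2";   # 8-bit literal as written in a cp1251 source
my $upgraded = $bytes;
utf8::upgrade($upgraded);        # latin-1 decode: each byte becomes one character

printf "same characters:  %s\n", ($bytes eq $upgraded ? 'yes' : 'no');
printf "UTF8 flag before: %d\n", utf8::is_utf8($bytes)    ? 1 : 0;
printf "UTF8 flag after:  %d\n", utf8::is_utf8($upgraded) ? 1 : 0;
printf "internal bytes:   %d vs %d\n",
    do { use bytes; length $bytes }, do { use bytes; length $upgraded };
```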
After digging into the module, I found that this happens because import()
defines the ${^ENCODING} global variable with an object of its own class.
The Perl interpreter then calls ${^ENCODING}->cat_decode( .. ), which
converts every 8-bit literal in the source into a Unicode string; this
can lead to miscellaneous side effects.
The solution is to make "sub cat_decode" more elaborate, along the lines of "sub decode".
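A minimal sketch of that idea follows. It assumes the cat_decode argument order described in the Encode::Encoding documentation ($destination, $octets, $offset, $terminator) and simulates the latin-1 decode with utf8::upgrade. The class My::WarnLatin1 and the warning text are hypothetical, and this is not encoding::warnings' own code. It only shows a cat_decode that warns before upgrading 8-bit data, called directly rather than through ${^ENCODING}. A real patch would presumably also need the lexical-scope checks that decode performs; they are omitted here.

```perl
#!/usr/bin/perl
# Hypothetical sketch of a cat_decode that warns like decode is meant to.
# My::WarnLatin1 is an illustrative name, not encoding::warnings' class.
use strict;
use warnings;

package My::WarnLatin1;

sub new { return bless {}, shift }

# Argument order modeled on Encode::Encoding's cat_decode:
# ($self, $dst, $src, $pos, $terminator). $dst and $pos belong to the
# caller and are updated in place through the @_ aliases $_[1] and $_[3].
sub cat_decode {
    my ($self, undef, $src, $pos, $trm) = @_;
    my $end   = index($src, $trm, $pos);
    my $found = $end >= 0;
    my $stop  = $found ? $end + length($trm) : length($src);
    my $chunk = substr($src, $pos, $stop - $pos);   # terminator included

    # The missing step: warn before 8-bit bytes are silently upgraded.
    warn "8-bit bytes implicitly decoded as latin-1\n"
        if $chunk =~ /[\x80-\xFF]/;

    utf8::upgrade($chunk);    # latin-1 decode: byte N becomes character N
    $_[1] .= $chunk;          # append to the caller's destination
    $_[3] = $stop;            # advance the caller's offset
    return $found ? 1 : '';
}

package main;

my $warned = 0;
$SIG{__WARN__} = sub { $warned++ };

my ($dst, $pos) = ('', 0);
my $src   = "\xC0\xC1'rest";    # two cp1251 bytes, then a closing quote
my $found = My::WarnLatin1->new->cat_decode($dst, $src, $pos, "'");

printf "found=%s pos=%d warnings=%d utf8=%d\n",
    $found ? 1 : 0, $pos, $warned, utf8::is_utf8($dst) ? 1 : 0;
```

Called this way, it should report that the terminator was found at offset 3, with exactly one warning and an upgraded destination string.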
The attached file is a simple test script that demonstrates the side
effect.
If the maintainer is interested, I can prepare a patch for this issue,
since I have some ideas about it.
INFO:
$encoding::warnings::VERSION = '0.11';
bash-3.2$ perl -v
This is perl, v5.10.0 built for cygwin-thread-multi-64int
(with 3 registered patches, see perl -V for more detail)
bash-3.2$ uname -a
CYGWIN_NT-5.0 perldev1 1.5.25(0.156/4/2) 2007-12-14 19:21 i686 Cygwin
Subject: | scope.t |
#!/usr/bin/perl
use strict;
use warnings;
use Test::Simple tests => 3;

my $byte_string_8bit = 'ÀÁÂÃÄÅ';    # Bytes of the first 6 letters of the Cyrillic alphabet in cp1251
my $again_byte_string_8bit = do {
    use encoding::warnings;    # Comment out this line to see the expected behaviour
    'ÀÁÂÃÄÅ';    # Must still be a byte string, since only warnings were requested
};
# Must be a byte string: it is outside the scope of "use encoding::warnings"
# (it could even be in an unrelated place in another file)
my $another_byte_string_8bit = 'ÀÁÂÃÄÅ';

ok !utf8::is_utf8($byte_string_8bit),         'Byte strings are byte strings';
ok !utf8::is_utf8($again_byte_string_8bit),   'Again, byte strings are STILL byte strings';
ok !utf8::is_utf8($another_byte_string_8bit), 'Other byte strings are STILL byte strings';