Subject: | Side effects out of scope |
As of encoding::source v0.02, the module still has the side effect of
propagating ${^ENCODING} from compile time to run time (and out of its scope).
This can be serious if byte strings containing bytes with the 8th bit set
are used as literals (i.e. quoted strings) after "use encoding::source".
The Perl interpreter feeds them through "${^ENCODING}->cat_decode" when compiling.
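To illustrate why an out-of-scope literal is affected even when it is only passed through Latin-1, here is a minimal sketch. It assumes that the $LATIN1 object used in the module's else branch is the Encode object for iso-8859-1 (an assumption, not checked against the module source). Decoding high-bit bytes as Latin-1 keeps the code points but turns the UTF-8 flag on, which is exactly what utf8::is_utf8() reports.

```perl
use strict;
use warnings;
use Encode ();

# High-bit bytes written as escapes, so the example does not depend on
# the encoding of this file.
my $bytes = "\xC0\xC1\xC2\xC3\xC4\xC5";

# Assumption: this is what the out-of-scope Latin-1 fallback amounts to.
my $decoded = Encode::decode('iso-8859-1', $bytes);

printf "bytes:   is_utf8=%d\n", utf8::is_utf8($bytes)   ? 1 : 0;   # 0
printf "decoded: is_utf8=%d\n", utf8::is_utf8($decoded) ? 1 : 0;   # 1
printf "equal as strings: %s\n", $bytes eq $decoded ? 'yes' : 'no'; # yes
```

The two strings compare equal, so the leak is easy to miss unless the code later treats the value as raw bytes.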
A possible solution is to change:
    else {
        $LATIN1->$method(@_);
    }
to something else. Options include returning the data un-"decode"d from
cat_decode in the out-of-scope case, OR storing a self-"DESTROY()"ing
object as the $^H{__PACKAGE__} value (the module already requires a
very recent perl, where %^H works correctly).
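Both options can be sketched roughly as follows. This is only an illustration under stated assumptions, not a patch: decode_if_in_scope, My::RestoreGuard and $ACTIVE_ENCODING are hypothetical names, the guard is held in a run-time lexical rather than in %^H, and ${^ENCODING} itself is not touched. In the real module the guard would presumably have to be installed at import time so that it is destroyed when compilation of the enclosing scope ends.

```perl
use strict;
use warnings;
use Encode ();

# Option 1 (hypothetical helper): decode only when an encoding is in
# scope; otherwise hand the octets back untouched.
sub decode_if_in_scope {
    my ($encoding_name, $octets) = @_;
    return $octets unless defined $encoding_name;
    return Encode::decode($encoding_name, $octets);
}

# Option 2 (hypothetical class): an object that puts a saved value
# back when it is destroyed.
{
    package My::RestoreGuard;

    sub new {
        my ($class, $target_ref) = @_;
        return bless { target => $target_ref, saved => $$target_ref }, $class;
    }

    sub DESTROY {
        my ($self) = @_;
        ${ $self->{target} } = $self->{saved};
    }
}

our $ACTIVE_ENCODING;    # stand-in for the global setting that leaks

my $bytes = "\xC0\xC1\xC2";
my ($inside, $outside);
{
    my $guard = My::RestoreGuard->new(\$ACTIVE_ENCODING);
    $ACTIVE_ENCODING = 'cp1251';
    $inside = decode_if_in_scope($ACTIVE_ENCODING, $bytes);
}   # $guard is destroyed here and $ACTIVE_ENCODING is undef again
$outside = decode_if_in_scope($ACTIVE_ENCODING, $bytes);

print "in scope:     is_utf8=", (utf8::is_utf8($inside)  ? 1 : 0), "\n";  # 1
print "out of scope: is_utf8=", (utf8::is_utf8($outside) ? 1 : 0), "\n";  # 0
```

At run time this guard does nothing that "local" would not do; the point is only that cleanup tied to object destruction can end the effect at a scope boundary.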
This bug report corresponds to my other message concerning
encoding::warnings, which has a similar issue:
http://rt.cpan.org/Ticket/Display.html?id=33989
http://www.cpanforum.com/posts/7304
The attached file is a simple test script which demonstrates this
side effect.
If the maintainer is interested, I can prepare a patch for this issue,
since I have some ideas on how to fix it.
INFO:
$VERSION = 0.02;
bash-3.2$ perl -v
This is perl, v5.10.0 built for cygwin-thread-multi-64int
(with 3 registered patches, see perl -V for more detail)
bash-3.2$ uname -a
CYGWIN_NT-5.0 perldev1 1.5.25(0.156/4/2) 2007-12-14 19:21 i686 Cygwin
Subject: | scope_leak.t |
#!/usr/bin/perl
use Test::Simple tests => 3;
my $byte_string_8bit = 'ÀÁÂÃÄÅ'; # A byte string
my $unicode_string = do {
use encoding::source 'cp1251';
'ÀÁÂÃÄÅ'; # Must be converted to corresponding Unicode string at compilation stage
};
## Toggle comments to see how the source must be treated without side-effects
#my $unicode_string = do { require Encode and Encode::decode( 'cp1251', 'ÀÁÂÃÄÅ' ) };
my $another_byte_string_8bit = 'ÀÁÂÃÄÅ'; # Must be a byte string because it's out of scope of encoding::source
ok !utf8::is_utf8( $byte_string_8bit ), 'Byte strings are byte strings';
ok utf8::is_utf8( $unicode_string ), 'Unicode strings are Unicode strings';
ok !utf8::is_utf8( $another_byte_string_8bit ), 'Later byte strings are STILL byte strings';