Sorry for the delay, just getting back from a weekend trip. Attachment 3 replaces attachments 1 and 2, and fixes the additional issues noted by Sprout and discovered while investigating his suggestions. It creates a temporary, mortal, PV copy of any input SV to either encode or decode which is read-only, which has get or set magic, or which is not POK. Tests were added for the changed behavior, and no existing tests were impacted.
From 253098cb2d33d8b10fb377f776715e807c5cbb67 Mon Sep 17 00:00:00 2001
From: Dan Collins <dcollinsn@gmail.com>
Date: Wed, 8 Jun 2016 10:15:29 -0400
Subject: [PATCH] [cpan #115168] Correct magical and non-POK inputs
In [perl #128143], a user reports inconsistent behavior between the
following two ways to call decode:
$x =~ s/(.)/$latin1->decode($1)/ge
$x =~ s/(.)/Encode::decode('latin1', $1)/ge
The issue seems to be that Encode.xs Method_decode() performs some
operations on src without checking to see if it is readonly or
magical. Since $1 is magical, this results in incorrect data. From
stepping through with GDB, it looks like the corruption doesn't
happen until this line, early in encode_method():
U8 *s = (U8 *) SvPV(src, slen);
For inputs that are readonly, magical, or not POK, we first create
a mortal SV, then use sv_copypv to stringify and transfer.
Encode needs to do the same thing. Currently, encode will not
accept being passed, for example, a typeglob, but decode will.
This patch brings the functions in line, and will stringify any
input that sv_pvcopy is able to.
Tests are added for both functions for both magic and typeglob inputs.
---
Encode.xs | 14 +++++++++++++-
t/Encode.t | 10 +++++++++-
t/decode.t | 11 +++++++++--
3 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/Encode.xs b/Encode.xs
index cd7f7d1..b945eb4 100644
--- a/Encode.xs
+++ b/Encode.xs
@@ -647,8 +647,14 @@ CODE:
int check;
SV *fallback_cb = &PL_sv_undef;
encode_t *enc = INT2PTR(encode_t *, SvIV(SvRV(obj)));
+ if (SvREADONLY(src) || SvSMAGICAL(src) || SvGMAGICAL(src) || !SvPOK(src)) {
+ SV *tmp;
+ tmp = sv_newmortal();
+ sv_copypv(tmp, src);
+ src = tmp;
+ }
if (SvUTF8(src)) {
- sv_utf8_downgrade(src, FALSE);
+ sv_utf8_downgrade(src, FALSE);
}
if (SvROK(check_sv)){
fallback_cb = check_sv;
@@ -672,6 +678,12 @@ CODE:
int check;
SV *fallback_cb = &PL_sv_undef;
encode_t *enc = INT2PTR(encode_t *, SvIV(SvRV(obj)));
+ if (SvREADONLY(src) || SvSMAGICAL(src) || SvGMAGICAL(src) || !SvPOK(src)) {
+ SV *tmp;
+ tmp = sv_newmortal();
+ sv_copypv(tmp, src);
+ src = tmp;
+ }
sv_utf8_upgrade(src);
if (SvROK(check_sv)){
fallback_cb = check_sv;
diff --git a/t/Encode.t b/t/Encode.t
index d490255..55592f0 100644
--- a/t/Encode.t
+++ b/t/Encode.t
@@ -25,7 +25,7 @@ my @character_set = ('0'..'9', 'A'..'Z', 'a'..'z');
my @source = qw(ascii iso8859-1 cp1250);
my @destiny = qw(cp1047 cp37 posix-bc);
my @ebcdic_sets = qw(cp1047 cp37 posix-bc);
-plan test => 38+$n*@encodings + 2*@source*@destiny*@character_set + 2*@ebcdic_sets*256 + 6 + 5;
+plan test => 38+$n*@encodings + 2*@source*@destiny*@character_set + 2*@ebcdic_sets*256 + 6 + 5 + 2;
my $str = join('',map(chr($_),0x20..0x7E));
my $cpy = $str;
ok(length($str),from_to($cpy,'iso8859-1','Unicode'),"Length Wrong");
@@ -164,3 +164,11 @@ $key = (keys %{{ "whatever" => '' }})[0];
$kopy = $key;
decode("UTF-16LE", $kopy, Encode::FB_CROAK);
ok $key, "whatever", 'decode with shared hash key scalars';
+
+my $latin1 = find_encoding('latin1');
+my $orig = "\316";
+$orig =~ /(.)/;
+ok $latin1->encode($1), $orig, '[cpan #115168] passing magic regex globals to encode';
+
+*a = $orig;
+ok $latin1->encode(*a), '*main::'.$orig, '[cpan #115168] passing typeglobs to encode';
diff --git a/t/decode.t b/t/decode.t
index 77cdaba..8f7b64e 100644
--- a/t/decode.t
+++ b/t/decode.t
@@ -2,8 +2,8 @@
# $Id: decode.t,v 1.1 2013/08/29 16:47:39 dankogai Exp $
#
use strict;
-use Encode qw(decode_utf8 FB_CROAK);
-use Test::More tests => 3;
+use Encode qw(decode_utf8 FB_CROAK find_encoding decode);
+use Test::More tests => 5;
sub croak_ok(&) {
my $code = shift;
@@ -23,3 +23,10 @@ croak_ok { Encode::decode('utf-8', $orig2, FB_CROAK) };
chop(my $new = $bytes . $pad);
croak_ok { Encode::decode_utf8($new, FB_CROAK) };
+my $latin1 = find_encoding('latin1');
+$orig = "\N{U+0080}";
+$orig =~ /(.)/;
+is($latin1->decode($1), $orig, '[cpan #115168] passing magic regex globals to decode');
+
+*a = $orig;
+is($latin1->decode(*a), '*main::'.$orig, '[cpan #115168] passing typeglobs to decode');
--
2.8.1