CC: MARKOV Solutions <solutions [...] overmeer.net>
Subject: Digest::SHA / unicode: the use of SvPVbyte instead of SvPV mangles the data of correct, UTF-8 enabled scalars
Date: Tue, 18 Feb 2014 17:47:33 +0100
To: bug-Digest-SHA [...] rt.cpan.org
From: Achim Adam <achim.adam [...] univie.ac.at>
hi,
we are verifying xmldsig signatures using Digest::SHA (5.86), and have noticed that
the unicode awareness added in 5.74 (i.e. the use of SvPVbyte instead of
SvPV in SHA.xs's add() and sha1() functions) mangles the input data of
correct UTF-8-enabled scalars, and so leads to the generation of incorrect
digests.
consider for example a file named UTF8, with a content of 2 bytes: 0xC3 0xA9.
(this is the correct UTF-8 encoding for the unicode character u+00E9).
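for anyone who wants to reproduce this, a minimal sketch of how such a file can be created is shown here. the ':raw' layer is used so that exactly these two bytes end up on disk; the file name UTF8 is the one used in the rest of this report.

```perl
use strict;
use warnings;

# write the two raw bytes 0xC3 0xA9 (the UTF-8 encoding of U+00E9)
# into a file named UTF8 in the current directory
open my $out, '>:raw', 'UTF8' or die "cannot write UTF8: $!";
print {$out} "\xC3\xA9";
close $out or die "cannot close UTF8: $!";

# confirm that the file holds exactly two bytes
my $size = -s 'UTF8';
printf "%d bytes\n", $size;    # prints "2 bytes"
```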
now consider the following test script, where the generated digests are compared
with the one generated by the `sha256sum' command.
-------------------------------------------------------------------------------------
use strict;
use warnings;
use Digest::SHA;
use Devel::Peek;
my $separator = ('-' x 85)."\n";
my ($fh, $utf8, $sha256_sys, $sha256_perl);
print STDERR $separator;
($sha256_sys) = split /\s+/, `sha256sum UTF8`;
printf STDERR "%-20s % 50s\n", "sha256sum command:", $sha256_sys;
print STDERR $separator;
open $fh, '<', 'UTF8' or die "cannot open UTF8: $!";
$utf8 = <$fh>;
close $fh;
print STDERR "perl raw read, before SHA:\n";
Dump $utf8;
$sha256_perl = Digest::SHA->new(256)->add($utf8)->hexdigest;
printf STDERR "%-20s % 50s\n", "perl raw read:", $sha256_perl;
print STDERR "perl raw read, after SHA:\n";
Dump $utf8;
print STDERR $separator;
open $fh, '<:encoding(UTF-8)', 'UTF8' or die "cannot open UTF8: $!";
$utf8 = <$fh>;
close $fh;
print STDERR "perl :utf8 read before SHA:\n";
Dump $utf8;
$sha256_perl = Digest::SHA->new(256)->add($utf8)->hexdigest;
printf STDERR "%-20s % 50s\n", "perl :utf8 read:", $sha256_perl;
print STDERR "perl :utf8 read after SHA:\n";
Dump $utf8;
print STDERR $separator;
-------------------------------------------------------------------------------------
the output is:
-------------------------------------------------------------------------------------
sha256sum command: 4a99557e4033c3539de2eb65472017cad5f9557f7a0625a09f1c3f6e2ba69c4c
-------------------------------------------------------------------------------------
perl raw read, before SHA:
SV = PV(0x9c130e8) at 0x9c24ec0
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x9c92788 "\303\251"\0
CUR = 2
LEN = 80
perl raw read: 4a99557e4033c3539de2eb65472017cad5f9557f7a0625a09f1c3f6e2ba69c4c
perl raw read, after SHA:
SV = PV(0x9c130e8) at 0x9c24ec0
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x9c92788 "\303\251"\0
CUR = 2
LEN = 80
-------------------------------------------------------------------------------------
perl :utf8 read before SHA:
SV = PV(0x9c130e8) at 0x9c24ec0
REFCNT = 1
FLAGS = (PADMY,POK,pPOK,UTF8)
PV = 0x9c92788 "\303\251"\0 [UTF8 "\x{e9}"]
CUR = 2
LEN = 80
perl :utf8 read: de2e331d891ae267a7009cb45b4e8830f170e0c937288ea2731a1941c7a53b0d
perl :utf8 read after SHA:
SV = PV(0x9c130e8) at 0x9c24ec0
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x9c92788 "\351"\0
CUR = 1
LEN = 80
-------------------------------------------------------------------------------------
note that the following scalar, read from the file through the ':encoding(UTF-8)' filehandle:
SV = PV(0x9c130e8) at 0x9c24ec0
REFCNT = 1
FLAGS = (PADMY,POK,pPOK,UTF8)
PV = 0x9c92788 "\303\251"\0 [UTF8 "\x{e9}"]
CUR = 2
LEN = 80
... is the correct, perl-internal representation of the input.
(this is also what, for example, XML::LibXML correctly yields when a UTF-8 encoded
document is parsed.)
the generated digest, however, is wrong, and you can see why: sv_utf8_downgrade(),
probably called by SvPVbyte, has mangled the input:
SV = PV(0x9c130e8) at 0x9c24ec0
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x9c92788 "\351"\0
CUR = 1
LEN = 80
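the same in-place change can be shown without Digest::SHA at all. the following is only a sketch: it uses the pure-perl utf8::downgrade() as a stand-in for the C-level sv_utf8_downgrade(), on the assumption that both do the same thing to the scalar, and it builds the scalar with utf8::upgrade() instead of reading it from the file.

```perl
use strict;
use warnings;

# build the scalar that the ':encoding(UTF-8)' read yields: one character,
# U+00E9, held internally as two bytes with the UTF8 flag on
my $s = "\x{e9}";
utf8::upgrade($s);

# size of the internal buffer in bytes, before and after the downgrade
my $before = do { use bytes; length $s };
utf8::downgrade($s);    # modifies $s in place
my $after = do { use bytes; length $s };

printf "flag now %s, internal bytes %d -> %d\n",
    (utf8::is_utf8($s) ? 'on' : 'off'), $before, $after;
# prints "flag now off, internal bytes 2 -> 1"
```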
i think replacing SvPV by SvPVbyte was probably a mistake; a digest module should
likely not have any unicode awareness, and in particular, should not modify its input.
it should be using SvPV.
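until this is settled, a possible stopgap on the caller's side is to encode the character string to UTF-8 octets explicitly and hand only that copy to Digest::SHA. this is a sketch of a workaround, not a change to SHA.xs; the variable names are made up for the example, and the scalar is built with utf8::upgrade() instead of being read from the file.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use Encode qw(encode);

# the character string that a ':encoding(UTF-8)' read yields for the file
my $chars = "\x{e9}";
utf8::upgrade($chars);

# encode() returns a new byte string (0xC3 0xA9); $chars is left untouched
my $octets = encode('UTF-8', $chars);

# the digest is now computed over the same bytes that are in the file
my $digest = sha256_hex($octets);
print "$digest\n";
```

with this, the digest is taken over the two bytes 0xC3 0xA9, so it should agree with the value that `sha256sum' reports for the file, and the original scalar keeps its UTF8 flag.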
regards,
Achim
PS: equally on:
Linux acdev 2.6.32-5-686 #1 SMP Wed Jan 11 12:29:30 UTC 2012 i686 GNU/Linux
Summary of my perl5 (revision 5 version 14 subversion 2) configuration:
Platform:
osname=linux, osvers=2.6.32-5-686, archname=i686-linux
uname='linux acdev 2.6.32-5-686 #1 smp wed jan 11 12:29:30 utc 2012 i686 gnulinux '
config_args='-des'
hint=recommended, useposix=true, d_sigaction=define
useithreads=undef, usemultiplicity=undef
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=undef, use64bitall=undef, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2',
cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
ccversion='', gccversion='4.4.5', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
libpth=/usr/local/lib /lib/../lib /usr/lib/../lib /lib /usr/lib /usr/lib64
libs=-lnsl -ldl -lm -lcrypt -lutil -lc
perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
libc=/lib/libc-2.11.2.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.11.2'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'
Characteristics of this binary (from libperl):
Compile-time options: PERL_DONT_CREATE_GVSV PERL_MALLOC_WRAP
PERL_PRESERVE_IVUV USE_LARGE_FILES USE_PERLIO
USE_PERL_ATOF
Built under linux
Compiled at Jan 27 2012 16:45:11
@INC:
/usr/local/lib/perl5/site_perl/5.14.2/i686-linux
/usr/local/lib/perl5/site_perl/5.14.2
/usr/local/lib/perl5/5.14.2/i686-linux
/usr/local/lib/perl5/5.14.2
/usr/local/lib/perl5/site_perl
and:
Linux devbaer 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 GNU/Linux
Summary of my perl5 (revision 5 version 18 subversion 2) configuration:
Platform:
osname=linux, osvers=3.2.0-4-amd64, archname=x86_64-linux-gnu-thread-multi
uname='linux perlbaer7 3.2.0-4-amd64 #1 smp debian 3.2.46-1 x86_64 gnulinux '
config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Dldflags=-Wl,-rpath=/opt/perl-5.18.2/lib/5.18.2/CORE -Wl,-z,relro -Dlddlflags=-shared -Wl,-rpath=/opt/perl-5.18.2/lib/5.18.2/CORE -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/opt/perl-5.18.2 -Dprivlib=/opt/perl-5.18.2/share/5.18.2 -Darchlib=/opt/perl-5.18.2/lib/5.18.2 -Dvendorprefix=/opt/perl-5.18.2 -Dvendorlib=/opt/perl-5.18.2/share/perl5 -Dvendorarch=/opt/perl-5.18.2/lib/perl5 -Dsiteprefix=/opt/perl-5.18.2 -Dsitelib=/opt/perl-5.18.2/share/5.18.2 -Dsitearch=/opt/perl-5.18.2/lib/5.18.2 -Dman1dir=/opt/perl-5.18.2/man/man1 -Dman3dir=/opt/perl-5.18.2/man/man3 -Dsiteman1dir=/opt/perl-5.18.2/man/man1 -Dsiteman3dir=/opt/perl-5.18.2/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.18.2 -des'
hint=recommended, useposix=true, d_sigaction=define
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=define, use64bitall=define, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2 -g',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fno-strict-aliasing -pipe -I/usr/local/include'
ccversion='', gccversion='4.7.2', gccosandvers=''
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='cc', ldflags ='-Wl,-rpath=/opt/perl-5.18.2/lib/5.18.2/CORE -Wl,-z,relro -fstack-protector -L/usr/local/lib'
libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
libc=, so=so, useshrplib=true, libperl=libperl.so.5.18.2
gnulibc_version='2.13'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/opt/perl-5.18.2/lib/5.18.2/CORE'
cccdlflags='-fPIC', lddlflags='-shared -Wl,-rpath=/opt/perl-5.18.2/lib/5.18.2/CORE -Wl,-z,relro -L/usr/local/lib -fstack-protector'
Characteristics of this binary (from libperl):
Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS
PERL_DONT_CREATE_GVSV
PERL_HASH_FUNC_ONE_AT_A_TIME_HARD
PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP
PERL_PRESERVE_IVUV PERL_SAWAMPERSAND USE_64_BIT_ALL
USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES
USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE
USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF
USE_REENTRANT_API
Built under linux
Compiled at Jan 9 2014 12:54:46
@INC:
/opt/perl-5.18.2/lib/5.18.2
/opt/perl-5.18.2/share/5.18.2
/opt/perl-5.18.2/lib/perl5
/opt/perl-5.18.2/share/perl5
/opt/perl-5.18.2/lib/5.18.2
/opt/perl-5.18.2/share/5.18.2