On Tue Jan 29 09:38:33 2013, jabbas wrote:
> When both, use utf8 and use Digest::MD5 are in effect md5_hex() is
> dying:
>
> jabbas@zedd ~ $ pmvers Digest::MD5
> 2.52
> jabbas@zedd ~ $ perl -MDigest::MD5=md5_hex -E 'say md5_hex(q/utf8
> string with
> diacritic chars ółść/)'
> 70894eb930a1bfea7bfb2dff75d2b276
> jabbas@zedd ~ $ perl -Mutf8 -MDigest::MD5=md5_hex -E 'say
> md5_hex(q/utf8 string
> with diacritic chars ółść/)'
> Wide character in subroutine entry at -e line 1.
> jabbas@zedd ~ $
>
> Breaks on: Centos 6, Gentoo, Mac OS X 10.7 (with perl 5.10, 5.12 and
> 5.14)
That's because with the utf8 pragma in effect, the string literal is decoded into characters, so you're no longer sending bytes to the function -- which you must do. From the documentation:
Since the MD5 algorithm is only defined for strings of bytes, it can not be used on
strings that contains chars with ordinal number above 255. The MD5 functions and
methods will croak if you try to feed them such input data:

    use Digest::MD5 qw(md5_hex);
    my $str = "abc\x{300}";
    print md5_hex($str), "\n"; # croaks
    # Wide character in subroutine entry

What you can do is calculate the MD5 checksum of the UTF-8 representation of such
strings. This is achieved by filtering the string through encode_utf8() function:

    use Digest::MD5 qw(md5_hex);
    use Encode qw(encode_utf8);
    my $str = "abc\x{300}";
    print md5_hex(encode_utf8($str)), "\n";
    # 8c2d46911f3f5a326455f0ed7a8ed3b3
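
(Not part of the quoted documentation.) If you hash text in several places, it may be convenient to wrap the encoding step once. The following is only a minimal sketch: md5_hex_utf8 is a hypothetical helper name, not something Digest::MD5 provides, and it assumes its argument is a character string that should be hashed as UTF-8. The eval shows how the croak can be trapped if you want to handle it yourself.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Hypothetical helper (not part of Digest::MD5): hash the UTF-8
# encoding of a character string.
sub md5_hex_utf8 {
    my ($text) = @_;
    return md5_hex(encode_utf8($text));
}

my $str = "abc\x{300}";

# Passing characters above 255 directly croaks; eval traps the error
# and leaves the message in $@.
my $direct = eval { md5_hex($str) };
print "direct call failed: $@" unless defined $direct;

# Encoding first gives md5_hex() the bytes it expects.
print md5_hex_utf8($str), "\n";
```

For pure ASCII input the helper returns the same digest as a plain md5_hex() call, since the UTF-8 encoding of ASCII text is the text itself.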
If you encode the input to bytes first, you get the same digest as in your first example, even with the utf8 pragma in effect:
$ perl -Mutf8 -MEncode=encode_utf8 -MDigest::MD5=md5_hex -E 'say md5_hex(encode_utf8(q/utf8 string with diacritic chars ółść/))'
70894eb930a1bfea7bfb2dff75d2b276
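
To see why the two one-liners in the report behave differently, it can help to compare the same text as bytes and as characters. This is an illustrative sketch only: the diacritics are written as byte escapes so the result does not depend on how the file is saved, decode_utf8() stands in for what the utf8 pragma does to a literal, and max_ord is a hypothetical helper.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(decode_utf8 encode_utf8);

# UTF-8 bytes of the four diacritic characters from the report.
my $bytes = "\xC3\xB3\xC5\x82\xC5\x9B\xC4\x87";

# Roughly what "use utf8" does to a literal: bytes become characters.
my $chars = decode_utf8($bytes);

# Hypothetical helper: highest ordinal found in a string.
sub max_ord {
    my ($s) = @_;
    my $max = 0;
    for my $o (map { ord($_) } split //, $s) {
        $max = $o if $o > $max;
    }
    return $max;
}

# Bytes never exceed 255; the decoded characters do, which is what
# makes md5_hex() croak.
printf "bytes: length %d, highest ordinal %d\n", length($bytes), max_ord($bytes);
printf "chars: length %d, highest ordinal %d\n", length($chars), max_ord($chars);

# Encoding the characters restores the original bytes, so the digests agree.
my $from_bytes = md5_hex($bytes);
my $from_chars = md5_hex(encode_utf8($chars));
print $from_bytes eq $from_chars ? "digests match\n" : "digests differ\n";
```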
This is not a bug.