Bug #80306 for Digest: character-sets

Subject:

character-sets

Hi,

Perl is very good at handling character sets... if you follow one very simple rule: at any place where characters enter or leave the program, you have to be very explicit. IMO, digesting strings is a kind of output filter, yet Digest ignores the character-set problem.

For instance, if I read text from a file which contains äø as valid latin1 and digest it, I will get a different digest than for the same characters encoded as utf8. The problem is that my latin1 string might automatically be converted into utf8! The programmer does not always know whether Perl has converted the input data.

The solution would be to add an explicit character-set option to new(), as in Digest->new('SHA-1', encoding => 'utf8'), to specify which charset the text must be in to be signed. When specified, Digest should call Encode::encode. When not specified, the data should be interpreted as raw bytes, and Digest should croak when the utf-8 flag is on.

The work-around is $digest->add(encode 'utf8', $text) for every call to add(). That puts the "utf8" information in the wrong place, because add() is not about output.

I hope you will consider this improvement.
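To make the report concrete, here is a minimal sketch of the problem and the work-around. It digests the same two characters (äø) once as latin1 bytes and once as utf8 bytes, then applies the encode-before-add() work-around. The add_text() helper is hypothetical: it is not part of Digest and only imitates the behaviour requested above (encode when an encoding is given, otherwise croak if the utf-8 flag is on). The proposed encoding option to new() does not exist, as far as I know.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Digest;
use Digest::SHA qw(sha1_hex);
use Encode qw(encode);

# The two characters "äø" (U+00E4, U+00F8), written with escapes so the
# example does not depend on the encoding of this source file.
my $text = "\x{E4}\x{F8}";

# Encode explicitly, so we know exactly which bytes get digested.
my $latin1_bytes = encode('ISO-8859-1', $text);   # 2 bytes: E4 F8
my $utf8_bytes   = encode('UTF-8',      $text);   # 4 bytes: C3 A4 C3 B8

my $latin1_digest = sha1_hex($latin1_bytes);
my $utf8_digest   = sha1_hex($utf8_bytes);

print "latin1: $latin1_digest\n";
print "utf8:   $utf8_digest\n";
print "same characters, different digests\n"
    if $latin1_digest ne $utf8_digest;

# The work-around from the report: encode on every call to add().
my $digest = Digest->new('SHA-1');
$digest->add(encode('UTF-8', $text));
my $workaround_digest = $digest->hexdigest;

# Hypothetical helper (an assumption, not a Digest API) that mimics the
# requested behaviour on top of the existing add() method.
sub add_text {
    my ($ctx, $string, $encoding) = @_;
    return $ctx->add(encode($encoding, $string)) if defined $encoding;
    croak "character string passed to digest without an encoding"
        if utf8::is_utf8($string);
    return $ctx->add($string);    # no utf-8 flag: treat as raw bytes
}

my $helper_digest = add_text(Digest->new('SHA-1'), $text, 'UTF-8')->hexdigest;
```

With such a helper the encoding is stated once per call site and a flagged string without an encoding fails loudly instead of being digested silently. Putting the option on new(), as proposed, would move that decision to a single place.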