Subject: character-sets
Hi,
Perl is very good at handling character-sets... if you follow one very
simple rule: at any place where characters enter or leave the program,
you have to be very explicit. IMO, digesting strings is a kind of
output filter.
Digest ignores the character-set problem. For instance, if I read text
from a file which contains äø as valid latin1 and I digest that, I will
get a different digest than for the same characters as utf8. The problem is
that my latin1 string might automatically be converted into utf8! The
programmer does not always know whether Perl converts the input data.
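To make the point concrete, here is a minimal sketch (assuming Digest::SHA
and Encode, both shipped with current Perls) that digests the same two
characters once as latin1 bytes and once as utf8 bytes. The variable names
are only for illustration.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);
use Encode qw(encode);

# The characters "äø", written as escapes so that the encoding of this
# source file does not influence the result.
my $text = "\x{E4}\x{F8}";

# Same characters, two different byte sequences.
my $latin1_bytes = encode('ISO-8859-1', $text);   # 2 bytes: E4 F8
my $utf8_bytes   = encode('UTF-8',      $text);   # 4 bytes: C3 A4 C3 B8

my $latin1_digest = sha1_hex($latin1_bytes);
my $utf8_digest   = sha1_hex($utf8_bytes);

print "latin1: $latin1_digest\n";
print "utf8:   $utf8_digest\n";
```

The two lines printed differ, because a digest is computed over bytes, not
over characters.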
The solution would be to add an explicit character-set option to new():
    Digest->new('SHA-1', encoding => 'utf8')
to specify which charset the text must be encoded in before it is
digested. When specified, add() should call Encode::encode. When not
specified, the input should be interpreted as raw bytes, so it should
croak when the utf-8 flag is on.
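The requested behaviour could look roughly like the wrapper below. This is
only a sketch: the package name EncodingDigest is made up, and Digest
itself has no 'encoding' option today. The wrapper just delegates to a
normal Digest object.

```perl
package EncodingDigest;   # hypothetical name, not an existing module

use strict;
use warnings;
use Carp qw(croak);
use Digest ();
use Encode ();

sub new {
    my ($class, $algorithm, %opts) = @_;
    return bless {
        digest   => Digest->new($algorithm),
        encoding => $opts{encoding},   # undef: input must already be bytes
    }, $class;
}

sub add {
    my ($self, @chunks) = @_;
    for my $chunk (@chunks) {
        if (defined $self->{encoding}) {
            # Explicit charset: turn characters into bytes before digesting.
            $self->{digest}->add(Encode::encode($self->{encoding}, $chunk));
        }
        elsif (utf8::is_utf8($chunk)) {
            # No charset given, yet Perl holds this as a character string.
            croak "add() got a utf8-flagged string but no encoding was given";
        }
        else {
            $self->{digest}->add($chunk);   # plain bytes, pass through
        }
    }
    return $self;
}

sub hexdigest { $_[0]{digest}->hexdigest }

package main;

my $d = EncodingDigest->new('SHA-1', encoding => 'UTF-8');
$d->add("\x{E4}\x{F8}");
my $hex = $d->hexdigest;
print "$hex\n";
```

With this in place the charset is stated once, at construction, instead of
at every add().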
The work-around is
    $digest->add(encode 'utf8', $text)
for every call to add(). The "utf8" information is then in the wrong
place, because add() is not about output.
I hope you will consider this improvement.