This queue is for tickets about the Digest-SHA CPAN distribution.

Report information
The Basics
Id: 82378
Status: rejected
Worked: 15 min
Priority: 0/
Queue: Digest-SHA

People
Owner: Nobody in particular
Requestors: victor [...] vsespb.ru
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 5.74
Fixed in: (no value)



Subject: UTF-8 behaviour not documented
Dies with the error "Wide character in subroutine entry" in newer versions; works fine in older ones. This is not documented (apart from a ChangeLog entry that is unclear to end users). I think it should be documented.
On Thu Jan 03 08:32:19 2013, vsespb wrote:
Your report is incomplete since it contains no test case. Given the error message, it would appear as though you attempted to feed something like a Unicode character to Digest::SHA. Digest algorithms are defined to operate on sequences of bytes only, per specification. Since version 5.8, Perl accepts wide characters in strings. It makes no sense to pass such strings to a digest algorithm until they've been byte-encoded through something like UTF-8. This is common knowledge: it's not up to Digest::SHA or any other digest module to explain or document.
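The point about byte-encoding can be illustrated with a minimal sketch that assumes only the core Encode and Digest::SHA modules: the character string is first converted to UTF-8 bytes with encode_utf8, and only those bytes are passed to sha256_hex. The variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                          # string literals in this file are character strings
use Encode qw(encode_utf8);
use Digest::SHA qw(sha256_hex);

# A character string containing wide (non-Latin-1) characters.
my $text = 'тест';

# Digest algorithms operate on bytes, so encode the characters to UTF-8 first.
my $bytes  = encode_utf8($text);
my $digest = sha256_hex($bytes);

print "$digest\n";
```

Note that for characters in the 0x80 to 0xFF range, hashing the UTF-8 bytes gives a different digest from hashing the unencoded string, so whichever encoding is chosen has to be applied consistently everywhere the digest is computed or compared.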
I agree that this is correct behaviour, as SHA is defined only for byte strings. However, I think it should be documented, especially since the behaviour differs between versions (i.e. I would say the older versions contain a bug: no error message in this case).

Test case:

#!/usr/bin/perl
use utf8;
use Encode;
use Digest::SHA qw/sha256_hex/;
print sha256_hex(encode_utf8('тест'));
print "\n";
print sha256_hex('тест');
print "\n";

The old version prints the same SHA value twice. The new version prints the first (identical) SHA and then dies with the "Wide character in subroutine entry" error.

So without documentation, such a bug in a user's program will go unnoticed and will then make the program crash when it is run with a new version of Digest::SHA (i.e. an incompatibility between different versions of Digest::SHA). Sometimes a crash is worse than the bug itself: the bug has no visible effect, since at least in my case the SHA is the same anyway.

btw. Digest::SHA::PurePerl does not die with that error.

On Fri Jan 04 03:35:38 2013, MSHELOR wrote:
On Thu Jan 03 19:04:24 2013, vsespb wrote:
Thanks for supplying a test case. But note that the statement print sha256_hex('тест'); has no meaning since SHA and other digest algorithms operate on sequences of bytes, not on Unicode and other wide character data. In general when invalid data is fed to a program, the output is undefined: garbage in, garbage out. The fact that the output from invalid data changes from version to version is of no consequence. But I do understand your frustration, and therefore may decide to very briefly warn against wide characters in future versions of the documentation. If so I'll give you due acknowledgement. However, such practice is usually not recommended since it clutters the documentation with extraneous material, distracts the reader, and makes it more difficult to find information directly pertinent to the module.
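For code that has to cope with the newer behaviour, a minimal sketch follows. It assumes a Digest::SHA release that croaks on wide characters, as reported in this ticket for newer versions. It checks whether a string contains characters above 0xFF and traps the error with eval instead of letting the whole program die. The check and the variable names are illustrative and are not part of the Digest::SHA API.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Digest::SHA qw(sha256_hex);

my $text = 'тест';

# A string can be digested directly only if every character fits in one byte.
my $has_wide = ($text =~ /[^\x00-\xFF]/) ? 1 : 0;
print "contains wide characters: $has_wide\n";

# Assumption: this Digest::SHA version croaks on wide characters.
# Trap the croak so an unexpected input does not terminate the program.
my $digest = eval { sha256_hex($text) };
my $error  = $@;
if (defined $digest) {
    print "digest: $digest\n";
}
else {
    print "refused: $error";
}
```

The real fix is still to encode the string to bytes before digesting; the eval only keeps an unexpected wide-character input from crashing the program.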