Subject: Passwords containing multibyte characters explode
Hi David,
We've found that when passwords contain characters that, in Perl's internal representation, require more than 8 bits to represent, the SHA1 calculation explodes. This is going to get complicated for a second, but the solution is simple enough. Here we go.
1. The PwnedPasswords endpoint expects the prefix of the SHA1 hash of the UTF-8 encoded password. Source: https://haveibeenpwned.com/API/v2#PwnedPasswords.
2. Perl (5)'s internal format stores strings as sequences of _characters_, each with an ordinal in the range 0 .. 2**32-1 (or more), so a single character can need more than 8 bits. Source: https://metacpan.org/pod/Encode#TERMINOLOGY. When working with data in Perl, it's recommended to convert incoming data to Perl's internal representation, work with it, and then convert outgoing data to whatever encoding is desired. (Source: https://metacpan.org/pod/perlunitut#I/O-flow-(the-actual-5-minute-tutorial)) We follow that guidance, so the data we work with in Perl is generally in Perl's internal representation.
3. To ensure we're sending UTF-8 encoded passwords to the PwnedPasswords endpoint (note: that's different from "utf8" or "UTF8", see https://metacpan.org/pod/Encode#UTF-8-vs.-utf8-vs.-UTF8), we needed to encode the password from Perl's internal representation to UTF-8 using `$octets = Encode::encode('UTF-8', $password)`, and provide `$octets` (instead of `$password`) to the `password` method of your module. For strings composed only of ASCII characters, `$password` and `$octets` are indistinguishable, but for strings containing characters that require more than 8 bits to represent, `$password` and `$octets` differ.
4. As a test case, consider using WebService::HIBP to retrieve the prevalence of a password like "ǯ".
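The failure and the workaround from points 3 and 4 can be reproduced outside the module with a minimal sketch like the one below. It assumes the hash is computed with `sha1_hex` from Digest::SHA, which may or may not be what WebService::HIBP uses internally. The variable names are ours.

```perl
use strict;
use warnings;
use Encode ();
use Digest::SHA qw(sha1_hex);

# "ǯ" is U+01EF: one character in Perl's internal representation,
# with an ordinal above 0xFF.
my $password = "\x{01EF}";

# Hashing the character string directly: Digest::SHA cannot turn a
# character above 0xFF into a byte, so the call dies. We trap that in eval.
my $direct = eval { sha1_hex($password) };
print "direct hashing failed: $@" unless defined $direct;

# Encoding to UTF-8 octets first gives SHA1 a plain byte string.
my $octets = Encode::encode('UTF-8', $password);
my $sha1   = uc sha1_hex($octets);

printf "characters: %d, octets: %d\n", length $password, length $octets;
printf "hash prefix for the endpoint: %s\n", substr($sha1, 0, 5);
```

The character count and octet count differ for this password, which is the same difference between `$password` and `$octets` described in point 3.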
So, two proposals for your module (one or the other, but not both):
1. Document that the `password` method requires the UTF-8 encoded version of the password, or
2. Have the `password` method accept `$password` in Perl's internal format, compute its UTF-8 encoding using something like `$octets = Encode::encode('UTF-8', $password)`, and then pass `$octets` (instead of `$password`) to the hashing function.
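To make the second proposal concrete, here is a rough sketch of the encode-then-hash step. `sha1_prefix_and_suffix` is a hypothetical helper written for this report, not an existing function in WebService::HIBP, and the use of Digest::SHA is an assumption on our part.

```perl
use strict;
use warnings;
use Encode ();
use Digest::SHA qw(sha1_hex);

# Hypothetical helper (not part of WebService::HIBP): takes a password
# as a Perl character string and returns the 5-character prefix and the
# remaining 35 characters of the uppercase SHA1 hex digest of its
# UTF-8 encoding.
sub sha1_prefix_and_suffix {
    my ($password) = @_;
    my $octets = Encode::encode('UTF-8', $password);  # characters -> octets
    my $sha1   = uc sha1_hex($octets);                # hash the octets only
    return (substr($sha1, 0, 5), substr($sha1, 5));
}

my ($prefix, $suffix) = sha1_prefix_and_suffix("\x{01EF}");
print "prefix: $prefix\n";
```

If the module encoded internally like this, callers that already pass UTF-8 octets would end up double-encoding, which is why we suggest picking only one of the two proposals.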