Skip Menu |

This queue is for tickets about the WebService-HIBP CPAN distribution.

Report information
The Basics
Id: 128991
Status: resolved
Priority: 0/
Queue: WebService-HIBP

People
Owner: Nobody in particular
Requestors: PCRONIN [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 0.12



Subject: Passwords containing multibyte characters explode
Hi David, We've found that when passwords contain characters that, in Perl's internal representation, require more than 8 bits to represent, the SHA1 calculation explodes. This is going to get complicated for a second, but the solution is simple enough. Here we go. 1. The PwnedPasswords endpoint requests the prefix of the SHA1 has of the UTF-8 encoded password. Source: https://haveibeenpwned.com/API/v2#PwnedPasswords. 2. Perl (5)'s internal format stores _characters_, each of which is 32 bits. Source: https://metacpan.org/pod/Encode#TERMINOLOGY. When working with data in Perl, it's recommended to convert incoming data to Perl's internal representation, work with it, and then convert outgoing data to whatever encoding is desired. (Source: https://metacpan.org/pod/perlunitut#I/O-flow-(the-actual-5-minute-tutorial)) We follow that guidance, so the data we work with in Perl is generally in Perl's internal representation. 3. In order to ensure we're sending UTF-8 encoded passwords to PwnedPasswords endpoint (note -- that's different than "utf8" or "UTF8", see https://metacpan.org/pod/Encode#UTF-8-vs.-utf8-vs.-UTF8), we needed to encode the password from Perl's internal representation to UTF-8 using $octets = Encode::encode('UTF-8', $password), and provide $octets (instead of $password) to the `password` method of your module. For strings composed of ASCII characters, $password and $octets is indistinguishable, but for strings containing characters requiring more than 8 bits to represent, $password and $octets differ. 4. As a test case, consider using WebService::HIBP to retrieve the prevalence of a password like "ǯ". So, two proposals for your module (one or the other, but not both): 1. Document that the `password` method requires the UTF-8 encoded version of the password, or 2. The password method receives the `$password` in Perl's internal format, computes the UTF-8 encoding of it using something like `$octets = Encode::encode('UTF-8', $password)`, and then passes $octets to the hashing function (instead of $password).
On Tue Apr 02 01:03:29 2019, PCRONIN wrote: Show quoted text
> Hi David, > > We've found that when passwords contain characters that, in Perl's > internal representation, require more than 8 bits to represent, the > SHA1 calculation explodes. This is going to get complicated for a > second, but the solution is simple enough. Here we go. > > 1. The PwnedPasswords endpoint requests the prefix of the SHA1 has of > the UTF-8 encoded password. Source: > https://haveibeenpwned.com/API/v2#PwnedPasswords. > 2. Perl (5)'s internal format stores _characters_, each of which is 32 > bits. Source: https://metacpan.org/pod/Encode#TERMINOLOGY. When > working with data in Perl, it's recommended to convert incoming data > to Perl's internal representation, work with it, and then convert > outgoing data to whatever encoding is desired. (Source: > https://metacpan.org/pod/perlunitut#I/O-flow-(the-actual-5-minute- > tutorial)) We follow that guidance, so the data we work with in Perl > is generally in Perl's internal representation. > 3. In order to ensure we're sending UTF-8 encoded passwords to > PwnedPasswords endpoint (note -- that's different than "utf8" or > "UTF8", see https://metacpan.org/pod/Encode#UTF-8-vs.-utf8-vs.-UTF8), > we needed to encode the password from Perl's internal representation > to UTF-8 using $octets = Encode::encode('UTF-8', $password), and > provide $octets (instead of $password) to the `password` method of > your module. For strings composed of ASCII characters, $password and > $octets is indistinguishable, but for strings containing characters > requiring more than 8 bits to represent, $password and $octets differ. > 4. As a test case, consider using WebService::HIBP to retrieve the > prevalence of a password like "ǯ". > > So, two proposals for your module (one or the other, but not both): > 1. Document that the `password` method requires the UTF-8 encoded > version of the password, or > 2. The password method receives the `$password` in Perl's internal > format, computes the UTF-8 encoding of it using something like > `$octets = Encode::encode('UTF-8', $password)`, and then passes > $octets to the hashing function (instead of $password).
That's embarrassing. Any preferences on normalising the password before sending it?
Assuming you don't need/want normalisation (can't see any reference to it in the API docs), does the following patch meet your needs?
Subject: hibp_utf8.patch
diff -Naur old/lib/WebService/HIBP.pm new/lib/WebService/HIBP.pm --- old/lib/WebService/HIBP.pm 2019-03-22 18:56:58.000000000 +1100 +++ new/lib/WebService/HIBP.pm 2019-04-02 19:26:27.765007139 +1100 @@ -6,10 +6,11 @@ use URI::Escape(); use LWP::UserAgent(); use Digest::SHA(); +use Encode(); use WebService::HIBP::Breach(); use WebService::HIBP::Paste(); -our $VERSION = '0.11'; +our $VERSION = '0.12'; sub _LENGTH_OF_PASSWORD_PREFIX { return 5; } @@ -155,7 +156,7 @@ sub password { my ( $self, $password ) = @_; - my $sha1 = uc Digest::SHA::sha1_hex($password); + my $sha1 = uc Digest::SHA::sha1_hex(Encode::encode('UTF-8', $password, 1)); my $url = $self->{password_url} . substr $sha1, 0, _LENGTH_OF_PASSWORD_PREFIX(); my $response = $self->_get($url); diff -Naur old/Makefile.PL new/Makefile.PL --- old/Makefile.PL 2019-03-22 18:49:35.000000000 +1100 +++ new/Makefile.PL 2019-04-02 19:22:18.094137336 +1100 @@ -21,6 +21,7 @@ 'strict' => 0, 'warnings' => 0, 'Digest::SHA' => 0, + 'Encode' => 0, 'URI::Escape' => 3.30, 'URI' => 1.53, 'JSON' => 0, diff -Naur old/t/01-hibp.t new/t/01-hibp.t --- old/t/01-hibp.t 2019-03-21 05:54:20.000000000 +1100 +++ new/t/01-hibp.t 2019-04-02 19:21:32.419612596 +1100 @@ -366,6 +366,12 @@ $count = $hibp->password($good_password); ok( $count == 0, "Good password '$good_password' returns a count of $count" ); + my $utf8_password = (chr 8364) . ' is a UTF-8 test ' . (chr 29399) . $good_password; + $count = $hibp->password($utf8_password); + my $encoded_password = Encode::encode('UTF-8', $utf8_password, 1); + ok( $count == 0, + "UTF-8 password '$encoded_password' returns a count of $count" ); + ok ($hibp->last_request()->uri() eq 'https://api.pwnedpasswords.com/range/40BCD', "Correctly encoded the uri for a password check for '$encoded_password' into " . $hibp->last_request()->uri()); $ENV{HTTPS_PROXY} = 'http://incorrect.example.com'; $hibp = WebService::HIBP->new(); eval {
On Tue Apr 02 19:32:19 2019, DDICK wrote: Show quoted text
> Assuming you don't need/want normalisation (can't see any reference to > it in the API docs), does the following patch meet your needs?
Disregard my last. I checked with the HIBP implementation. Unicode::Normalize::NFD appears to be the correct encoding. Does this patch look okay?
Subject: hibp_utf8.patch
diff -Naur old/lib/WebService/HIBP.pm new/lib/WebService/HIBP.pm --- old/lib/WebService/HIBP.pm 2019-04-02 19:29:56.807410095 +1100 +++ new/lib/WebService/HIBP.pm 2019-04-02 20:56:40.969894661 +1100 @@ -6,7 +6,9 @@ use URI::Escape(); use LWP::UserAgent(); use Digest::SHA(); +use Unicode::Normalize(); use Encode(); +use Unicode::Normalize(); use WebService::HIBP::Breach(); use WebService::HIBP::Paste(); @@ -156,7 +158,8 @@ sub password { my ( $self, $password ) = @_; - my $sha1 = uc Digest::SHA::sha1_hex(Encode::encode('UTF-8', $password, 1)); + my $normalised = Unicode::Normalize::NFD($password); + my $sha1 = uc Digest::SHA::sha1_hex(Encode::encode('UTF-8', $normalised, 1)); my $url = $self->{password_url} . substr $sha1, 0, _LENGTH_OF_PASSWORD_PREFIX(); my $response = $self->_get($url); diff -Naur old/Makefile.PL new/Makefile.PL --- old/Makefile.PL 2019-04-02 19:29:56.808410107 +1100 +++ new/Makefile.PL 2019-04-02 20:57:02.010127468 +1100 @@ -22,6 +22,7 @@ 'warnings' => 0, 'Digest::SHA' => 0, 'Encode' => 0, + 'Unicode::Normalize' => 0, 'URI::Escape' => 3.30, 'URI' => 1.53, 'JSON' => 0, diff -Naur old/t/01-hibp.t new/t/01-hibp.t --- old/t/01-hibp.t 2019-04-02 19:29:56.808410107 +1100 +++ new/t/01-hibp.t 2019-04-02 20:54:53.392704353 +1100 @@ -372,6 +372,18 @@ ok( $count == 0, "UTF-8 password '$encoded_password' returns a count of $count" ); ok ($hibp->last_request()->uri() eq 'https://api.pwnedpasswords.com/range/40BCD', "Correctly encoded the uri for a password check for '$encoded_password' into " . $hibp->last_request()->uri()); + my $denormalised_password = chr(0x0041) . chr(0x0300); + $count = $hibp->password($denormalised_password); + $encoded_password = Encode::encode('UTF-8', $denormalised_password, 1); + ok( $count == 0, + "UTF-8 password '$encoded_password' returns a count of $count" ); + ok ($hibp->last_request()->uri() eq 'https://api.pwnedpasswords.com/range/0273B', "Correctly encoded the uri for a password check for '$encoded_password' into " . $hibp->last_request()->uri()); + my $normalised_password = chr(0x00C0); + $count = $hibp->password($normalised_password); + $encoded_password = Encode::encode('UTF-8', $normalised_password, 1); + ok( $count == 0, + "UTF-8 password '$encoded_password' returns a count of $count" ); + ok ($hibp->last_request()->uri() eq 'https://api.pwnedpasswords.com/range/0273B', "Correctly encoded the uri for a password check for '$encoded_password' into " . $hibp->last_request()->uri()); $ENV{HTTPS_PROXY} = 'http://incorrect.example.com'; $hibp = WebService::HIBP->new(); eval {
On Tue Apr 02 21:00:43 2019, DDICK wrote: Show quoted text
> On Tue Apr 02 19:32:19 2019, DDICK wrote:
> > Assuming you don't need/want normalisation (can't see any reference > > to > > it in the API docs), does the following patch meet your needs?
> > Disregard my last. I checked with the HIBP implementation. > Unicode::Normalize::NFD appears to be the correct encoding. > > Does this patch look okay?
*sigh*
Subject: hibp_utf8.patch
diff -Naur old/lib/WebService/HIBP.pm new/lib/WebService/HIBP.pm --- old/lib/WebService/HIBP.pm 2019-03-22 18:56:58.000000000 +1100 +++ new/lib/WebService/HIBP.pm 2019-04-02 20:56:40.969894661 +1100 @@ -6,10 +6,13 @@ use URI::Escape(); use LWP::UserAgent(); use Digest::SHA(); +use Unicode::Normalize(); +use Encode(); +use Unicode::Normalize(); use WebService::HIBP::Breach(); use WebService::HIBP::Paste(); -our $VERSION = '0.11'; +our $VERSION = '0.12'; sub _LENGTH_OF_PASSWORD_PREFIX { return 5; } @@ -155,7 +158,8 @@ sub password { my ( $self, $password ) = @_; - my $sha1 = uc Digest::SHA::sha1_hex($password); + my $normalised = Unicode::Normalize::NFD($password); + my $sha1 = uc Digest::SHA::sha1_hex(Encode::encode('UTF-8', $normalised, 1)); my $url = $self->{password_url} . substr $sha1, 0, _LENGTH_OF_PASSWORD_PREFIX(); my $response = $self->_get($url); diff -Naur old/Makefile.PL new/Makefile.PL --- old/Makefile.PL 2019-03-22 18:49:35.000000000 +1100 +++ new/Makefile.PL 2019-04-02 20:57:02.010127468 +1100 @@ -21,6 +21,8 @@ 'strict' => 0, 'warnings' => 0, 'Digest::SHA' => 0, + 'Encode' => 0, + 'Unicode::Normalize' => 0, 'URI::Escape' => 3.30, 'URI' => 1.53, 'JSON' => 0, diff -Naur old/t/01-hibp.t new/t/01-hibp.t --- old/t/01-hibp.t 2019-03-21 05:54:20.000000000 +1100 +++ new/t/01-hibp.t 2019-04-02 20:54:53.392704353 +1100 @@ -366,6 +366,24 @@ $count = $hibp->password($good_password); ok( $count == 0, "Good password '$good_password' returns a count of $count" ); + my $utf8_password = (chr 8364) . ' is a UTF-8 test ' . (chr 29399) . $good_password; + $count = $hibp->password($utf8_password); + my $encoded_password = Encode::encode('UTF-8', $utf8_password, 1); + ok( $count == 0, + "UTF-8 password '$encoded_password' returns a count of $count" ); + ok ($hibp->last_request()->uri() eq 'https://api.pwnedpasswords.com/range/40BCD', "Correctly encoded the uri for a password check for '$encoded_password' into " . $hibp->last_request()->uri()); + my $denormalised_password = chr(0x0041) . chr(0x0300); + $count = $hibp->password($denormalised_password); + $encoded_password = Encode::encode('UTF-8', $denormalised_password, 1); + ok( $count == 0, + "UTF-8 password '$encoded_password' returns a count of $count" ); + ok ($hibp->last_request()->uri() eq 'https://api.pwnedpasswords.com/range/0273B', "Correctly encoded the uri for a password check for '$encoded_password' into " . $hibp->last_request()->uri()); + my $normalised_password = chr(0x00C0); + $count = $hibp->password($normalised_password); + $encoded_password = Encode::encode('UTF-8', $normalised_password, 1); + ok( $count == 0, + "UTF-8 password '$encoded_password' returns a count of $count" ); + ok ($hibp->last_request()->uri() eq 'https://api.pwnedpasswords.com/range/0273B', "Correctly encoded the uri for a password check for '$encoded_password' into " . $hibp->last_request()->uri()); $ENV{HTTPS_PROXY} = 'http://incorrect.example.com'; $hibp = WebService::HIBP->new(); eval {
On Tue Apr 02 06:02:34 2019, DDICK wrote: Show quoted text
> On Tue Apr 02 21:00:43 2019, DDICK wrote:
> > On Tue Apr 02 19:32:19 2019, DDICK wrote:
> > > Assuming you don't need/want normalisation (can't see any reference > > > to > > > it in the API docs), does the following patch meet your needs?
> > > > Disregard my last. I checked with the HIBP implementation. > > Unicode::Normalize::NFD appears to be the correct encoding. > > > > Does this patch look okay?
> > *sigh*
I wasn't able to find HIBP's implementation to confirm that NFD is their selected normalized form. That aside, the patch looks good if the second `use Unicode::Normalize();` is removed. Thank you!
I used a character combination that produced a failure to sha to the digest used by the hibp frontend for three of the four possibilities. The only one that worked is now included in the test suite. Thanks for the bug report