Bug #78091 for DBIx-Class-EncodedColumn: Unicode string causes error

Thu Jun 28 07:18:14 2012 gbjk [...] thermeon.com - Ticket created

CC:	mark [...] repixl.com
Subject:	Unicode string causes error

Hi there,

If we use DBIx::Class with enable_utf8 then the strings we use in our RS will be perl internal strings. Bcrypt expects an octet sequence, though, and blows up on trying to encode the perl sequence with "input must contain only octets".

I figure the solution is just to utf8::encode what's flagged as utf8.

Patch below.

Regards
Gareth

--- perl/lib/site_perl/5.14.1/DBIx/Class/EncodedColumn/Crypt/Eksblowfish/Bcrypt.pm 2011-04-11 19:51:04.000000000 +0000
+++ lib/DBIx/Class/EncodedColumn/Crypt/Eksblowfish/Bcrypt.pm 2012-06-28 11:09:51.000000000 +0000
@@ -24,6 +24,11 @@
   my $encoder = sub {
     my ($plain_text, $settings_str) = @_;
+    if (utf8::is_utf8($plain_text)){
+      # Bcrypt expects octets. This dbi is probably going to encode later
+      # so we'll have to do this now
+      utf8::encode($plain_text);
+    }
     unless ( $settings_str ) {
       my $salt = join('', map { chr(int(rand(256))) } 1 .. 16);
       $salt = Crypt::Eksblowfish::Bcrypt::en_base64( $salt );

Fri Jun 29 08:45:08 2012 wreis [...] cpan.org - Correspondence added

On Thu Jun 28 07:18:14 2012, gbjk@thermeoneurope.com wrote:

> Hi there,
> [snip]
> I figure the solution is just to utf8::encode what's flagged as utf8.
>
> Patch below.
> [snip]

Hi,

Thanks for the patch. Could you please provide an automated test case?

Cheers,

Fri Jun 29 08:45:10 2012 The RT System itself - Status changed from 'new' to 'open'

Fri Jun 29 09:19:05 2012 gbjk [...] thermeon.com - Correspondence added

> Hi,
>
> Thanks for the patch. Could you please provide an automated test case?
>
> Cheers,

I'd just tack it onto the end of t/bcrypt.t:

    # Test utf8 characters make it through Bcrypt okay.
    use utf8; # Source code *is* utf8
    $row->bcrypt_1("官话");
    $row->update;

Though you might want to pretty that up because in a failing case, it'll explode on you. Maybe you want to catch explosions from Bcrypt better anyway, though...?

HTH. Sorry I can't do more.
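One way the test suggested above could catch the explosion instead of dying is sketched below. It is not the test that shipped with the fix: it calls Crypt::Eksblowfish::Bcrypt directly rather than going through a $row object, and the fixed salt, the cost of 08 and the variable names are assumptions made for illustration only.

```perl
use strict;
use warnings;
use Test::More;

# Skip cleanly when the bcrypt module is not installed.
eval { require Crypt::Eksblowfish::Bcrypt; 1 }
    or plan skip_all => 'Crypt::Eksblowfish::Bcrypt not installed';

# Fixed 16-octet salt so the run is repeatable (illustration only;
# real code should use a random salt, as the patched encoder does).
my $settings = '$2a$08$'
    . Crypt::Eksblowfish::Bcrypt::en_base64("\x01" x 16);

# "官话" written with escapes, so this file does not need "use utf8".
my $characters = "\x{5b98}\x{8bdd}";

# Passing wide characters straight to bcrypt is expected to die;
# eval traps the exception so the test script itself survives.
my $from_characters =
    eval { Crypt::Eksblowfish::Bcrypt::bcrypt($characters, $settings) };
my $error = $@;
my $died_on_characters = defined $from_characters ? 0 : 1;
ok($died_on_characters, 'bcrypt rejects a wide-character string');
like($error, qr/octets/, 'the error mentions octets');

# Encoding a copy to UTF-8 octets first lets the same password through.
my $octets = $characters;
utf8::encode($octets);
my $from_octets =
    eval { Crypt::Eksblowfish::Bcrypt::bcrypt($octets, $settings) };
my $ok_on_octets = defined $from_octets ? 1 : 0;
ok($ok_on_octets, 'bcrypt accepts UTF-8 octets');

done_testing;
```

Wrapping each bcrypt call in eval means a regression shows up as a failed assertion rather than an aborted test script.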

Mon Apr 29 10:46:04 2013 wreis [...] cpan.org - Correspondence added

Fixed at 0.00012. https://metacpan.org/release/WREIS/DBIx-Class-EncodedColumn-0.00012

Mon Apr 29 10:46:05 2013 wreis [...] cpan.org - Status changed from 'open' to 'resolved'

Thu Jun 06 17:44:37 2019 ether [...] cpan.org - Correspondence added

On 2012-06-28 04:18:14, GBJK wrote:

> I figure the solution is just to utf8::encode what's flagged as utf8.

Not quite. "utf8::is_utf8" doesn't do what you think it does. It does *not* tell you whether the characters are ascii or non-ascii, but merely reports on the *internal only* utf8 flag, which records how perl happens to store the string (that is, a character between 0x80 and 0xFF might be stored as a single byte, in which case the utf8 flag will be off, or it could be stored as multiple bytes, in which case is_utf8 will return true, and both are the same string as far as perl code is concerned).

It CANNOT be used to determine whether a string should be run through utf8::encode or not -- to do that will result in mojibaked characters. All you can do is clearly document whether the strings you receive will be run through utf8::encode and ::decode, or not -- that is, whether you expect *characters*, or *bytes*.

Generally encoding/decoding is only done on the edges of an application, at the very boundary between physical representation and logical. It is reasonable to do the encoding right before passing to crypt() or encode_base64() (etc), because that's the interface that requires bytes.
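The point about the internal flag can be seen in a small sketch that uses only core Perl. The helper names octets_guarded and octets_always are hypothetical and exist only for this illustration: the first mirrors the is_utf8 guard from the original patch, the second encodes unconditionally on the assumption that its caller always passes characters.

```perl
use strict;
use warnings;

# The same one-character string, "é" (U+00E9), stored two ways internally.
my $downgraded = "\x{e9}";
utf8::downgrade($downgraded);   # one byte per character, utf8 flag off
my $upgraded = "\x{e9}";
utf8::upgrade($upgraded);       # multi-byte internal storage, utf8 flag on

# Perl treats them as equal, yet is_utf8 reports different answers.
my $same_string = $downgraded eq $upgraded ? 1 : 0;
my $flag_down   = utf8::is_utf8($downgraded) ? 1 : 0;
my $flag_up     = utf8::is_utf8($upgraded)   ? 1 : 0;

# Hypothetical helper mirroring the guard in the original patch:
# encode only when the internal flag happens to be set.
sub octets_guarded {
    my ($text) = @_;            # works on a copy of the argument
    utf8::encode($text) if utf8::is_utf8($text);
    return $text;
}

# Hypothetical helper for the documented-contract approach: the caller
# promises characters, so always encode right at the byte boundary.
sub octets_always {
    my ($text) = @_;            # works on a copy of the argument
    utf8::encode($text);
    return $text;
}

printf "equal strings: %d, flags: %d vs %d\n",
    $same_string, $flag_down, $flag_up;
printf "guarded lengths: %d vs %d\n",
    length octets_guarded($downgraded), length octets_guarded($upgraded);
printf "always lengths:  %d vs %d\n",
    length octets_always($downgraded), length octets_always($upgraded);
```

With the guard, two equal strings produce different octets (one byte versus two), so the same password could hash differently depending on how perl happened to store it. Encoding unconditionally gives the same two octets for both.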

Tue Jul 09 13:20:45 2019 ether [...] cpan.org - Correspondence added

https://metacpan.org/release/ETHER/DBIx-Class-PassphraseColumn-0.04-TRIAL contains a proper solution to this problem.