The Digest::SHA example does not apply here: it uses user-supplied data one way only, to compute a digest, and never returns the original data. In our case, the user sends data to the job queue and later gets it back. Workload and result are fine with Unicode: they are serialized by Storable and stored as bytes in Redis, so Storable takes care of Unicode. Metadata, however, is not serialized, for performance and convenience reasons, and is stored in Redis as-is. If there is Unicode in metadata, we have the following options:
1. Assume everything is Unicode, turn on utf-8 encoding in the Redis.pm settings, and take a substantial performance hit. Since the biggest parts of job data (workload and result) are already serialized, encoding and decoding them again is not a good idea.
2. Assume that all metadata is Unicode, and always encode and decode it. This may lead to subtle errors if the user provides metadata that is binary, not Unicode.
3. Detect Unicode metadata and store a "utf-8" flag along with the metadata in Redis, so that only utf-8 metadata is decoded when the user requests it. This makes metadata management more complicated.
4. Assume that metadata is for the application's internal use and that the application must ensure it does not contain Unicode. If Unicode is really needed, it should either be stored in workload or result, or the application must take care of encoding and decoding Unicode metadata itself before sending it to the job queue. The job queue will throw an exception if Unicode metadata is encountered.
We chose (4) because it is consistent, does not degrade performance, and does not cause subtle errors from damaged data.
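A minimal sketch of what option (4) asks of the application: encode Unicode metadata to UTF-8 bytes before handing it to the job queue, and decode it after reading it back. The helper names pack_meta and unpack_meta are hypothetical and are not part of Redis::JobQueue; only the core Encode module is used, and no job queue call is shown.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Hypothetical helper: turn a character string into UTF-8 bytes
# (UTF8 flag off), suitable for a field that is stored as-is.
sub pack_meta {
    my ($text) = @_;
    return encode_utf8($text);
}

# Hypothetical helper: turn UTF-8 bytes read back from storage
# into a character string again.
sub unpack_meta {
    my ($bytes) = @_;
    return decode_utf8($bytes);
}

my $title  = "\x{442}\x{435}\x{441}\x{442}";   # four Cyrillic characters
my $stored = pack_meta($title);                # what the application would pass as metadata
my $back   = unpack_meta($stored);             # what the application does after fetching it
```

With this approach the value passed as metadata is a plain byte string, so a check based on utf8::is_utf8() would not reject it, and the application stays in control of which fields are text and which are binary.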
On Tue Aug 13 18:49:15 2013, vsespb wrote:
> > In provided example any string produced with a string operand from an
> > UTF8 string will have UTF8 flag set on it, even if the resulting
> > string doesn't contain any UTF-8 specific characters.
>
> Yes, agree. That was the point of this example.
>
> > - forcefully downgrade string to ASCII (see perldoc utf8)
>
> Point was that Redis::JobQueue code should try to downgrade string
> (because programmer cannot really control if his ASCII string contain
> utf-8 bit or no - this is shown in example).
>
> That is why, btw, utf8::is_utf8() is advertised as indication of some
> wrong workflow:
>
> perlunifaq:
>
> > Please, unless you're hacking the internals, or debugging weirdness,
> > don't think about the UTF8 flag at all.
> > That means that you very probably shouldn't use is_utf8 , _utf8_on or
> > _utf8_off at all.
>
> Also, in core module Digest::SHA::PurePerl you can see similar code
> which uses utf8::downgrade() (note that such use better be advertised
> in doc if input parameters altered and downgraded)
>
>
> On Wed Aug 14 02:37:59 2013, SGLADKOV wrote:
> > It's not an issue with UTF detection in Redis::JobQueue, but rather
> > implemented "by design". Plus the way how perl operates with UTF8
> > strings internally. In provided example any string produced with a
> > string operand from an UTF8 string will have UTF8 flag set on it,
> > even
> > if the resulting string doesn't contain any UTF-8 specific
> > characters.
> >
> > By design Redis::JobQueue uses freeze before storing job data on
> > Redis
> > (workload,result containers). This ensures that among other things,
> > > UTF8-encoded strings are safe when passed this way. Custom-named
> > > fields, though, are not processed in any way and are passed to Redis as-is. They
> > are designed as an easy and fast way for software developer to store
> > some internal / supplemental data among job details.
> >
> > As a workaround for such behavior you can do one of the following:
> > - forcefully downgrade string to ASCII (see perldoc utf8) before
> > attempting to pass it to Redis::JobQueue as a custom named field
> > - use freeze (Storable) before passing it to Redis
> > > - store such string as part of workload / result data structures
> >
> >
> > > On Tue Aug 13 05:06:41 2013, vsespb wrote:
> > > I see the following code here
> > >
> > > > https://metacpan.org/source/SGLADKOV/Redis-JobQueue-1.03/lib/Redis/JobQueue.pm#L1000
> > >
> > > > elsif ( $method eq 'HSET' and !$self->_redis->{encoding} and utf8::is_utf8( $_[2] ) )
> > > > {
> > > >     # For non-serialized fields: UTF8 can not be transferred to the Redis server in mode of 'encoding => undef'
> > > >     confess $self->_error( E_MISMATCH_ARG )." (utf8 in $_[1])";
> > > > }
> > >
> > > > The thing is, plain 7-bit ASCII data can have the utf-8 flag on.
> > >
> > > example:
> > >
> > > use strict;
> > > use warnings;
> > > use utf8;
> > >
> > > my $utfstr = "\x{442}\x{435}\x{441}\x{442}";
> > > my $s = "x $utfstr";
> > >
> > > my ($ascii_u, undef) = split (' ', $s);
> > >
> > > die "its not ascii" unless $ascii_u eq 'x';
> > > die "utf8 on" if utf8::is_utf8($ascii_u);
> > >
> > > __END__
> > >
> > > dies with "utf8 on" message
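
For reference, the downgrade workaround mentioned in the quoted messages can be sketched roughly as follows. This is an illustration under stated assumptions, not what Redis::JobQueue does: try_downgrade is a hypothetical helper, and it relies only on the core utf8::downgrade() function with its FAIL_OK argument set, so that a string containing characters above 0xFF yields undef instead of an exception.

```perl
use strict;
use warnings;

# Same setup as in the quoted example.
my $utfstr = "\x{442}\x{435}\x{441}\x{442}";
my $s      = "x $utfstr";
my ($ascii_u) = split ' ', $s;

# Hypothetical helper: work on a copy so the caller's string is not altered,
# and return undef when the value cannot be represented as bytes.
sub try_downgrade {
    my ($value) = @_;                        # copy of the argument
    my $ok = utf8::downgrade($value, 1);     # second argument is FAIL_OK
    return $ok ? $value : undef;
}

my $plain = try_downgrade($ascii_u);   # 'x' with the UTF8 flag off
my $wide  = try_downgrade($utfstr);    # undef: real wide characters present
```

A caller could pass $plain on as metadata and treat an undef result as the case where the data must be encoded explicitly or moved into workload / result.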