Bug #93450 for JSON-PP: Extremely slow on decoded strings under DEBUGGING perl builds

Sat Mar 01 07:57:48 2014 haarg [...] haarg.org - Ticket created

Subject:

Extremely slow on decoded strings under DEBUGGING perl builds

When parsing strings that have already been decoded, JSON::PP->new->decode (without enabling the utf8 option) is extremely slow under perls built with -DDEBUGGING. The problem appears to be the substr calls: substr is very slow on decoded strings in debug builds. It's much faster to encode the string as UTF-8 and then enable the ->utf8 flag on the JSON::PP object. Maybe it makes sense for JSON::PP to handle decoded strings that way? This slowness makes many toolchain operations take significantly longer. The toolchain code can be adjusted to avoid this issue, but it would be preferable if JSON::PP handled this case better.
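A minimal sketch of the workaround described above, assuming the input is a Perl character string: encode it to UTF-8 bytes with Encode and let JSON::PP's ->utf8 option decode it while parsing. The helper name decode_json_chars is hypothetical, not part of JSON::PP.

```perl
use strict;
use warnings;
use Encode ();
use JSON::PP ();

# Hypothetical helper: parse JSON held in a decoded (character) string by
# first encoding it to UTF-8 bytes and then letting the ->utf8 option
# decode it, so JSON::PP scans a byte string instead of a character string.
sub decode_json_chars {
    my ($text) = @_;
    my $bytes = Encode::encode('UTF-8', $text);
    return JSON::PP->new->utf8->decode($bytes);
}

# A character string, e.g. file contents already read via :encoding(UTF-8)
my $text = "{\"name\":\"smile \x{263a}\",\"n\":[1,2,3]}";

my $direct    = JSON::PP->new->decode($text);   # path reported slow on -DDEBUGGING perls
my $via_bytes = decode_json_chars($text);       # suggested workaround

print "same name: ", ($direct->{name} eq $via_bytes->{name} ? "yes" : "no"), "\n";
```

Both calls yield the same data structure. Per the resolution at the end of this ticket, JSON::PP 2.27300 and later process input as bytes internally, so this workaround should only matter on older versions.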

Sat Mar 01 08:08:19 2014 KENTNL [...] cpan.org - Correspondence added

Just to give an idea of how slow substr is on UTF-8 (decoded) strings, see the attached test:

Non-debug perl: a noticeable but not show-stopping difference:

              Rate   utf8 normal
utf8     8655698/s     --   -22%
normal  11107797/s    28%     --

Debug perl: a difference of more than three orders of magnitude (roughly 4,500x):

            Rate    utf8 normal
utf8      1499/s      --   -100%
normal 6770830/s 451557%      --

Subject:

utf8substr.pl

#!/usr/bin/env perl
# FILENAME: utf8substr.pl
# CREATED: 02/22/14 07:18:45 by Kent Fredric (kentnl) <kentfredric@gmail.com>
# ABSTRACT: Show how slow substr is on utf8 strings

use strict;
use warnings;

use Benchmark qw(:all :hireswallclock);

my $normal = "Hello" x 32768;

use Encode;
my $utf8 = Encode::decode('UTF-8', $normal);

my $c;
cmpthese(-1, {
    normal => sub { $c = substr $normal, 10240, 1 },
    utf8   => sub { $c = substr $utf8,   10240, 1 },
});
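To see why the two benchmark cases differ, note that $normal and $utf8 in the script hold the same characters; only Perl's internal UTF8 flag differs. A quick check using the core utf8::is_utf8 function, which reports that internal flag rather than whether the text is valid UTF-8:

```perl
use strict;
use warnings;
use Encode ();

my $normal = "Hello" x 32768;
my $utf8   = Encode::decode('UTF-8', $normal);

# Same characters, same length ...
my $same_text = ($normal eq $utf8) ? 1 : 0;
my $same_len  = (length($normal) == length($utf8)) ? 1 : 0;

# ... but only the decoded copy carries the UTF8 flag, so substr on it has
# to map a character offset to a byte offset instead of indexing directly.
my $normal_flag = utf8::is_utf8($normal) ? 1 : 0;
my $utf8_flag   = utf8::is_utf8($utf8)   ? 1 : 0;

printf "eq=%d len=%d normal_flag=%d utf8_flag=%d\n",
    $same_text, $same_len, $normal_flag, $utf8_flag;
# prints: eq=1 len=1 normal_flag=0 utf8_flag=1
```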

Sat Mar 01 08:08:19 2014 The RT System itself - Status changed from 'new' to 'open'

Sat Mar 01 09:57:27 2014 KENTNL [...] cpan.org - Correspondence added

In my synthetic test, though, you can get back close to the speed of a perl built without -DDEBUGGING by adding this line:

${^UTF8CACHE} = 1;

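This works because under -DDEBUGGING, ${^UTF8CACHE} defaults to -1. That value does not switch the UTF-8 offset cache off; it is a debugging mode that checks every cached result against a linear scan of the string, which is what makes substr so slow.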

Decoded strings are still processed more slowly than encoded strings, but the difference is tolerable.

That said, I don't know whether it is acceptable for a CPAN module to tweak that value.

The only module on CPAN I can find that sets it is B::C, and it does so to set it to a negative number.
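If a module did decide to adjust the setting (whether that is acceptable is the open question above), one sketch is to localize the variable around the hot code so the caller's value is restored on exit. perlvar documents ${^UTF8CACHE} as 1 for on, 0 for off and -1 for checking cached results against linear scans; the wrapper name with_utf8_cache is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical wrapper: run a code block with the UTF-8 offset cache fully
# enabled; "local" restores the caller's value when the block exits.
sub with_utf8_cache {
    my ($code) = @_;
    local ${^UTF8CACHE} = 1;
    return $code->();
}

${^UTF8CACHE} = 0;                                   # simulate a caller's setting
my $inside = with_utf8_cache(sub { ${^UTF8CACHE} }); # value seen inside the block
my $after  = ${^UTF8CACHE};                          # value after the block
${^UTF8CACHE} = 1;                                   # back to the non-debug default
```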

Sat Mar 01 10:11:15 2014 KENTNL [...] cpan.org - Correspondence added

Ha, if you want a real-world test without needing a -DDEBUGGING perl, try this:

cpanm --look Dist::Zilla::PluginBundle::Author::KENTNL

time PERL5OPT='-M1;${^UTF8CACHE}=-1' perl Makefile.PL
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for Dist::Zilla::PluginBundle::Author::KENTNL
Writing MYMETA.yml and MYMETA.json

real   0m10.905s
user   0m10.655s
sys   0m0.220s

rm Makefile MYMETA.*

time PERL5OPT='-M1;${^UTF8CACHE}=1' perl Makefile.PL
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for Dist::Zilla::PluginBundle::Author::KENTNL
Writing MYMETA.yml and MYMETA.json

real   0m1.458s
user   0m1.243s
sys   0m0.090s

Roughly 10 seconds per installed dist, nice. That's a whole minute just running 'perl Makefile.PL' for 6 dists.

Ouch.

Tue Jun 14 18:18:52 2016 haarg [...] haarg.org - Correspondence added

This should be fixed now, as processing is always done on bytes rather than characters.
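A minimal sketch of the "always process bytes" idea, not JSON::PP's actual code: encode the input once with utf8::encode, scan it with substr (which indexes a non-UTF8 string directly), and decode only the extracted string value. The function scan_string_at is hypothetical and ignores backslash escapes for brevity.

```perl
use strict;
use warnings;

# Hypothetical scanner: given a byte string and the offset of an opening
# double quote, return the decoded contents up to the next double quote
# plus the offset just past it. Escape sequences are NOT handled.
sub scan_string_at {
    my ($bytes, $pos) = @_;
    die "expected '\"' at $pos\n" unless substr($bytes, $pos, 1) eq '"';
    my $end = $pos + 1;
    # Every byte of a multi-byte UTF-8 character is >= 0x80, so a 0x22
    # byte can only be a real quote character.
    $end++ while $end < length($bytes) && substr($bytes, $end, 1) ne '"';
    die "unterminated string\n" if $end >= length($bytes);
    my $raw = substr($bytes, $pos + 1, $end - $pos - 1);
    utf8::decode($raw) or die "invalid UTF-8 in string\n";
    return ($raw, $end + 1);
}

my $chars = "[\"smile \x{263a}\",1]";   # decoded character string
my $bytes = $chars;
utf8::encode($bytes);                    # one-time conversion to bytes

my ($value, $next) = scan_string_at($bytes, 1);
```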

Tue Jun 14 18:18:53 2016 haarg [...] haarg.org - Status changed from 'open' to 'resolved'

Tue Jun 14 18:18:53 2016 haarg [...] haarg.org - Fixed in 2.27300 added