
This queue is for tickets about the JSON-PP CPAN distribution.

Report information
The Basics
Id: 93450
Status: resolved
Priority: 0/
Queue: JSON-PP

People
Owner: Nobody in particular
Requestors: haarg [...] haarg.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 2.27300



Subject: Extremely slow on decoded strings under DEBUGGING perl builds
Parsing strings that have already been decoded with JSON::PP->new->decode (without enabling the utf8 flag) is extremely slow under perls built with the -DDEBUGGING flag. The problem appears to be the substr calls, which are very slow on decoded strings in debug builds. It's much faster to encode the string as UTF-8 and then enable the ->utf8 flag on the JSON::PP object. Maybe it makes sense to handle decoded strings that way?

This makes many toolchain operations take significantly longer. The toolchain code can be adjusted to avoid the issue, but it would be preferable if JSON::PP handled this case better.
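The workaround described above can be sketched roughly as follows. This is only an illustration, not JSON::PP's own fix: decode_json_chars is a hypothetical helper name, and the sample JSON text is made up for the example.

```perl
use strict;
use warnings;
use Encode ();
use JSON::PP ();

# Hypothetical helper (not part of JSON::PP): decode JSON text held in an
# already-decoded (character) string by going through UTF-8 octets.
sub decode_json_chars {
    my ($json_text) = @_;

    # Encode the characters to UTF-8 octets first ...
    my $octets = Encode::encode('UTF-8', $json_text);

    # ... then let JSON::PP decode the octets itself via ->utf8, so its
    # substr calls operate on a byte string.
    return JSON::PP->new->utf8->decode($octets);
}

# Made-up sample input containing one non-ASCII character.
my $data = decode_json_chars(qq({"name":"caf\x{e9}","n":[1,2,3]}));
```

The returned structure should be the same as what JSON::PP->new->decode gives for the character string; only the internal representation being scanned differs.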

Just to give an idea of how slow substr is on UTF8 strings, see the attached test:

Nondebug perl: a noticeable but not show-stopping difference:

             Rate   utf8 normal
utf8    8655698/s     --   -22%
normal 11107797/s    28%     --

Debug perl: more than three orders of magnitude difference (roughly 4,500x):


            Rate    utf8  normal
utf8      1499/s      --   -100%
normal 6770830/s 451557%      --

 

 

Subject: utf8substr.pl
#!/usr/bin/env perl
# FILENAME: utf8substr.pl
# CREATED: 02/22/14 07:18:45 by Kent Fredric (kentnl) <kentfredric@gmail.com>
# ABSTRACT: Show how slow substr is on utf8 strings

use strict;
use warnings;

use Benchmark qw(:all :hireswallclock);
use Encode;

# Plain byte string, and the same content as a decoded (character) string
my $normal = "Hello" x 32768;
my $utf8   = Encode::decode('UTF-8', $normal);

my $c;

# Compare a single-character substr at the same offset in each string
cmpthese(-1, {
    normal => sub { $c = substr $normal, 10240, 1 },
    utf8   => sub { $c = substr $utf8,   10240, 1 },
});

Though it seems that, in my synthetic test, you can get it back close to non -DDEBUGGING speed by injecting this code:

${^UTF8CACHE} = 1;

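Because under -DDEBUGGING, the UTF8 cache is set to -1, which per perlvar is its debug mode: every cached result is checked against a linear scan, so the cache gives no speed benefit.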

This still means decoded strings will process slower than encoded strings, but the difference is tolerable.

Though I don't know how acceptable tweaking that value is in a CPAN module.

I see only one thing on CPAN setting it, B::C, and it's doing so to make it a negative number.
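If a module did want to tweak the value, one less intrusive option might be to localize it around the decode call rather than setting it globally. The following is only a sketch under that assumption: decode_with_cache is a hypothetical wrapper, not something JSON::PP provides, and this is not the fix that was eventually applied.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical wrapper: turn the UTF-8 offset cache on only for the
# duration of one decode, leaving the caller's setting untouched.
sub decode_with_cache {
    my ($json_text) = @_;

    # local restores the previous value when this sub returns or dies.
    local ${^UTF8CACHE} = 1;

    return JSON::PP->new->decode($json_text);
}

my $before = ${^UTF8CACHE};
my $data   = decode_with_cache('{"ok":true}');
my $after  = ${^UTF8CACHE};   # expected to equal $before again
```

Scoping the change this way keeps the -1 debug setting in effect for all other code on a -DDEBUGGING perl.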

 

Ha, if you want a real-world test without needing a -DDEBUGGING perl, try this:

cpanm --look Dist::Zilla::PluginBundle::Author::KENTNL

time PERL5OPT='-M1;${^UTF8CACHE}=-1' perl Makefile.PL
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for Dist::Zilla::PluginBundle::Author::KENTNL
Writing MYMETA.yml and MYMETA.json

real    0m10.905s
user    0m10.655s
sys    0m0.220s

rm Makefile MYMETA.*

time PERL5OPT='-M1;${^UTF8CACHE}=1' perl Makefile.PL
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for Dist::Zilla::PluginBundle::Author::KENTNL
Writing MYMETA.yml and MYMETA.json

real    0m1.458s
user    0m1.243s
sys    0m0.090s

Ballpark of 10 seconds for each dist installed, nice. That's a whole minute just doing 'perl Makefile.PL' with 6 dists.

Ouch.

This should be fixed now, as processing is always done on bytes rather than characters.