Subject: | /^$RE{URI}$/ takes exponential time depending on the position of an invalid percent-encoding |
Date: | Thu, 28 May 2020 21:01:29 +0200 |
To: | bug-Regexp-Common [...] rt.cpan.org |
From: | Ghost Head <bigghosthead [...] gmail.com> |
Hello,
Matching /^$RE{URI}$/ against a URL with an invalid percent-encoding takes
exponential time. The further down the string the incriminating percent
characters is, the longer it takes. See the demo program below.
Anchoring at the beginning and the end of the string is important.
Using /^$RE{URI}{HTTP}$/ works fine, though.
This is on perl 5.28, Regexp::Common 2017060201, macOS 10.15.4. But the
same behaviour occurred on Linux with a different perl version.
Thank you,
Marcel
Demo:
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;
use Regexp::Common qw(URI);
use Time::HiRes qw(gettimeofday tv_interval);
# constant time
say "RE{URI}{HTTP}\n";
run(qr/$RE{URI}{HTTP}/);
# exponential time (sometimes)
say "\nRE{URI}\n";
run(qr/$RE{URI}/);
sub run {
my $re = shift;
for my $pos (24 .. 49) {
my $uri = 'http://www.example.com/zzzzzzzzzzzzzzzzzzzzzzzzzz';
substr($uri, $pos, 0) = '%';
my $t0 = [gettimeofday];
$uri =~ /^$re$/;
my $elapsed = tv_interval($t0);
say "$uri $elapsed";
}
}