Subject: Perf issue in relaxed.pm
Date: Tue, 15 Oct 2019 17:43:42 +0000
To: "bug-Mail-DKIM [...] rt.cpan.org" <bug-Mail-DKIM [...] rt.cpan.org>
From: Todd Richmond <todd_richmond [...] hotmail.com>
There is a severe regex performance hit when stripping whitespace from email bodies if the message contains a very large number of spaces. We saw 10+ minutes of CPU burn for a sample real-world message. The fix is simply to swap the two steps so that the code that compresses whitespace runs before the code that strips trailing whitespace. Also, it is more efficient to use my ($self, $var) = @_ instead of shift() to peel off $self in any sub that is called often (i.e. once per line) and doesn’t need to pass the remaining args on as an array to another sub.
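For illustration only (this is not part of the patch), here is a minimal sketch comparing the two orderings on a small sample. The sub names strip_then_compress and compress_then_strip are hypothetical stand-ins for the old and new bodies of canonicalize_body. The presumed reason the swap helps: once every run of spaces and tabs has been reduced to a single SP, the trailing-whitespace pattern can no longer match a long run and then backtrack through it when the lookahead fails.

```perl
use strict;
use warnings;

# Hypothetical stand-alone copy of the old ordering: strip trailing WSP first,
# then compress runs of WSP to a single SP.
sub strip_then_compress {
    my ($multiline) = @_;
    $multiline =~ s/\015\012\z//s;
    $multiline =~ s/[ \t]+(?=\015\012|\z)//g;
    $multiline =~ s/[ \t]+/ /g;
    return $multiline . "\015\012";
}

# Hypothetical stand-alone copy of the new ordering: compress first, so the
# trailing-WSP pattern only ever sees runs of length one.
sub compress_then_strip {
    my ($multiline) = @_;
    $multiline =~ s/\015\012\z//s;
    $multiline =~ s/[ \t]+/ /g;
    $multiline =~ s/[ \t]+(?=\015\012|\z)//g;
    return $multiline . "\015\012";
}

# Small sample with interior and trailing runs of spaces and tabs.
my $sample = "a \t  b  \t\015\012  c\t\t\015\012";

my $old = strip_then_compress($sample);
my $new = compress_then_strip($sample);
my $verdict = $old eq $new ? "same" : "different";
print "$verdict\n";
```

On this sample both orderings give the same canonical body, so the swap should not change the resulting hash.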
*** lib/Mail/DKIM/Canonicalization/relaxed.pm.orig 2019-10-15 09:17:04.377450322 -0700
--- lib/Mail/DKIM/Canonicalization/relaxed.pm 2019-10-15 09:17:42.806800372 -0700
*************** sub canonicalize_header {
*** 62,82 ****
}
sub canonicalize_body {
! my $self = shift;
! my ($multiline) = @_;
$multiline =~ s/\015\012\z//s;
#
! # step 1: ignore all white space at the end of lines
#
! $multiline =~ s/[ \t]+(?=\015\012|\z)//g;
#
! # step 2: reduce all sequences of WSP within a line to a single
! # SP character
#
! $multiline =~ s/[ \t]+/ /g;
$multiline .= "\015\012";
--- 62,81 ----
}
sub canonicalize_body {
! my ($self, $multiline) = @_;
$multiline =~ s/\015\012\z//s;
#
! # step 1: reduce all sequences of WSP within a line to a single
! # SP character
#
! $multiline =~ s/[ \t]+/ /g;
#
! # step 2: ignore all white space at the end of lines
#
! $multiline =~ s/[ \t]+(?=\015\012|\z)//g;
$multiline .= "\015\012";