
This queue is for tickets about the Mail-Box CPAN distribution.

Report information
The Basics
Id: 13963
Status: resolved
Priority: 0/
Queue: Mail-Box

People
Owner: Nobody in particular
Requestors: richmond [...] proofpoint.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 2.058
Fixed in: (no value)



Subject: Significant performance improvement for message parsing
I was able to extract a significant 15+% improvement in message parsing by a simple modification to Mail/Box/Parser/Perl.pm, where it searches for boundary lines. The only change is to remove $file->getpos() out of the most common loop for MIME email, around line 180:

    LINE:
    while(1) {
        my $line = $file->getline or last;
        foreach my $sep (@seps) {
            next if substr($line, 0, length $sep) ne $sep;
            next if $sep eq 'From ' && $line !~ m/ (?:19[789]\d|20[01]\d)/;
            $file->setpos($file->getpos - length $line);
            last LINE;
        }
        $line =~ s/\015$//;
        push @$lines, $line;
    }
    }
[guest - Mon Aug 1 15:30:59 2005]:
> I was able to extract a significant 15+% improvement in message
> parsing by a simple modification to Mail/Box/Parser/Perl.pm where
> it searches for boundary lines. The only change is to remove
> $file->getpos() out of the most common loop for MIME email around
> line 180
In the general case, that is not acceptable: it's platform dependent. See the C standard about this (you shouldn't do math on the file pointer).

Certainly with PerlIO, where the input file can have all kinds of layers (like gzip), getpos() may produce unexpected values.

In the non-general case (simple native files under UNIX/Linux), your optimization is correct. For speed, install Mail::Box::Parser::C.
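To make the trade-off in this exchange concrete, here is a minimal sketch, not the actual Mail::Box code. It compares two ways of rewinding to the start of a separator line: remembering the position before each read, and subtracting the line length afterwards. The sample data, the separator list, and the helper read_until_separator are hypothetical, and the built-in tell/seek on an in-memory filehandle stand in for getpos/setpos. On plain bytes with no layers the two variants should agree. The arithmetic variant relies on the length of the line equalling the number of bytes consumed, which a translating or decompressing layer need not guarantee.

```perl
use strict;
use warnings;

# Hypothetical sample input: two body lines followed by a boundary line.
my $data = "line one\nline two\n--boundary\nnext part\n";
my @seps = ('--boundary');    # hypothetical separator list

# Read lines until a separator is found, then rewind to the start of that
# separator line. $arith selects how the rewind position is determined.
sub read_until_separator {
    my ($arith) = @_;
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    my @lines;
  LINE:
    while (1) {
        # Portable variant: remember the position before each read.
        my $where = $arith ? undef : tell $fh;
        my $line  = <$fh>;
        last unless defined $line;
        foreach my $sep (@seps) {
            next if substr($line, 0, length $sep) ne $sep;
            # Arithmetic variant: derive the position from the line length.
            # Only valid when one character read equals one byte consumed.
            $where = tell($fh) - length $line if $arith;
            seek $fh, $where, 0;
            last LINE;
        }
        push @lines, $line;
    }
    my $pos = tell $fh;
    close $fh;
    return ($pos, scalar @lines);
}

my ($pos_saved, $n_saved) = read_until_separator(0);
my ($pos_arith, $n_arith) = read_until_separator(1);
print "saved: $pos_saved, arithmetic: $pos_arith\n";
```

The saved-position variant pays one tell per line, which is the cost the original report removes. The arithmetic variant pays it only when a separator is found, at the price of the portability concern raised above.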
I would love to use the C parser, but unfortunately it does not work with in-memory data (files only). Message parsing is currently the most expensive code my server performs by far (even more than virus scanning), followed by quoted-printable decoding.

Are you planning any modifications to the C parser in the future?

Thanks,
Todd

[MARKOV - Mon Aug 15 08:48:06 2005]:
> In the general case, that is not acceptable: it's platform dependent.
> See the C standard about this (you shouldn't do math on the file
> pointer)
>
> Certainly with PerlIO, where the input file can have all kinds of
> layers (like gzip), the getpos() may produce unexpected values.
>
> In the non-general case (simple native files under UNIX/Linux), your
> optimization is correct. For speed, install Mail::Box::Parser::C