Subject: | DateTime performance |
I'm using DateTime in a log file analyzer that consolidates log files
across multiple time zones. Performance was awful, and surprisingly
turns out to be DateTime - sorta.
I know people have complained about DateTime performance before - but
before you tune out, I have some analysis that points to some
straightforward fixes.
Context:
For a small test case, I get about 12 seconds of elapsed time on an
idle system.
If I simply read the file and count records, I spend about 2.5 seconds.
I call DateTime->new once per record with year, second, minute, hour,
day, month, time_zone [pre calculated from DateTime::TimeZone->new] and
locale [also pre-calculated from DateTime::Locale->load]. I then do a
single compare with a "recent" filter.
Profile:
dprofpp says that about 50 percent of the elapsed time is in
DateTime::new!
Finding # 1:
Params::Validate is the biggest offender. If I set
Params::Validate::NO_VALIDATION=1 in the caller (outside the main loop,
of course), elapsed time in DateTime is cut in half. Elapsed time
drops to 9 secs - a 25% improvment.
Finding # 2:
With NO_VALIDATION set, Params::Validate::_validate is still the
dominant consumer of time! 29%. DateTime::new uses 18%.
Finding # 3:
Since the majority of the records are rejected, I replaced the
DateTime compare with sprintf and a string compare. I also moved the
DateTime::new so it's only called on the selected records (to convert
to a display timezone.)
This reduces runtime to 3 seconds, for a total 75% reduction in elapsed
time. And confirms the bottlenecks. Excluding the time to read the
file, processing is now essentially zero.
Request # 1:
It would be useful if there were a DateTime "no-validate" function
or global. Hitting Params::Validate externally (or even knowing to)
breaks modularity, and may cause effects elsewhere. If you do nothing
else, this is a big win. Please document the call prominently.
Request # 2:
Consider replacing Params::Validate with something more efficient --
or have a production flag that selects Params::Validate for development
or a lightweight parameter fetcher/checker for production.
This is a bit more work, but it's still a big win!
Here is the dprofpp profile for the case with
Params::Validate::NO_VALIDATION=1:
Total Elapsed Time = 8.912796 Seconds
User+System Time = 8.512796 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
29.3 2.498 2.507 4817 0.0005 0.0005 Params::Validate::_validate
18.0 1.536 5.700 4349 0.0004 0.0013 DateTime::new
13.6 1.162 8.856 1 1.1620 8.8556 main::displayLog
6.47 0.551 0.605 4367 0.0001 0.0001
DateTime::_calc_local_components
6.21 0.529 0.729 17380 0.0000 0.0000 DateTime::Helpers::can
6.07 0.517 2.043 24 0.0215 0.0851 main::BEGIN
6.01 0.512 1.775 4345 0.0001 0.0004 DateTime::_compare
4.24 0.361 0.526 4366 0.0001 0.0001
DateTime::_handle_offset_modifier
4.17 0.355 0.355 8691 0.0000 0.0000 DateTime::utc_rd_values
2.82 0.240 0.928 4367 0.0001 0.0002 DateTime::_calc_local_rd
2.37 0.202 0.196 4352 0.0000 0.0000 DateTime::_calc_utc_rd
2.23 0.190 0.869 7 0.0271 0.1241 DateTime::Locale::BEGIN
2.17 0.185 0.182 4349 0.0000 0.0000 DateTime::_month_length
1.63 0.139 0.139 17395 0.0000 0.0000 UNIVERSAL::can
1.59 0.135 0.135 8693 0.0000 0.0000 DateTime::time_zone
I hope this analysis is useful and that it will enable you to make some
improvements.
Thanks!
Oh, Fedora Core 4 (Linux) 2.6.17-1.2142_FC4 #1, perl 5.8.8, i686