Skip Menu |

Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the DateTime CPAN distribution.

Report information
The Basics
Id: 41629
Status: resolved
Priority: 0/
Queue: DateTime

People
Owner: Nobody in particular
Requestors: tlhackque [...] yahoo.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: (no value)



Subject: DateTime performance
I'm using DateTime in a log file analyzer that consolidates log files across multiple time zones. Performance was awful, and surprisingly turns out to be DateTime - sorta. I know people have complained about DateTime performance before - but before you tune out, I have some analysis that points to some straightforward fixes. Context: For a small test case, I get about 12 seconds of elapsed time on an idle system. If I simply read the file and count records, I spend about 2.5 seconds. I call DateTime->new once per record with year, second, minute, hour, day, month, time_zone [pre calculated from DateTime::TimeZone->new] and locale [also pre-calculated from DateTime::Locale->load]. I then do a single compare with a "recent" filter. Profile: dprofpp says that about 50 percent of the elapsed time is in DateTime::new! Finding # 1: Params::Validate is the biggest offender. If I set Params::Validate::NO_VALIDATION=1 in the caller (outside the main loop, of course), elapsed time in DateTime is cut in half. Elapsed time drops to 9 secs - a 25% improvment. Finding # 2: With NO_VALIDATION set, Params::Validate::_validate is still the dominant consumer of time! 29%. DateTime::new uses 18%. Finding # 3: Since the majority of the records are rejected, I replaced the DateTime compare with sprintf and a string compare. I also moved the DateTime::new so it's only called on the selected records (to convert to a display timezone.) This reduces runtime to 3 seconds, for a total 75% reduction in elapsed time. And confirms the bottlenecks. Excluding the time to read the file, processing is now essentially zero. Request # 1: It would be useful if there were a DateTime "no-validate" function or global. Hitting Params::Validate externally (or even knowing to) breaks modularity, and may cause effects elsewhere. If you do nothing else, this is a big win. Please document the call prominently. Request # 2: Consider replacing Params::Validate with something more efficient -- or have a production flag that selects Params::Validate for development or a lightweight parameter fetcher/checker for production. This is a bit more work, but it's still a big win! Here is the dprofpp profile for the case with Params::Validate::NO_VALIDATION=1: Total Elapsed Time = 8.912796 Seconds User+System Time = 8.512796 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 29.3 2.498 2.507 4817 0.0005 0.0005 Params::Validate::_validate 18.0 1.536 5.700 4349 0.0004 0.0013 DateTime::new 13.6 1.162 8.856 1 1.1620 8.8556 main::displayLog 6.47 0.551 0.605 4367 0.0001 0.0001 DateTime::_calc_local_components 6.21 0.529 0.729 17380 0.0000 0.0000 DateTime::Helpers::can 6.07 0.517 2.043 24 0.0215 0.0851 main::BEGIN 6.01 0.512 1.775 4345 0.0001 0.0004 DateTime::_compare 4.24 0.361 0.526 4366 0.0001 0.0001 DateTime::_handle_offset_modifier 4.17 0.355 0.355 8691 0.0000 0.0000 DateTime::utc_rd_values 2.82 0.240 0.928 4367 0.0001 0.0002 DateTime::_calc_local_rd 2.37 0.202 0.196 4352 0.0000 0.0000 DateTime::_calc_utc_rd 2.23 0.190 0.869 7 0.0271 0.1241 DateTime::Locale::BEGIN 2.17 0.185 0.182 4349 0.0000 0.0000 DateTime::_month_length 1.63 0.139 0.139 17395 0.0000 0.0000 UNIVERSAL::can 1.59 0.135 0.135 8693 0.0000 0.0000 DateTime::time_zone I hope this analysis is useful and that it will enable you to make some improvements. Thanks! Oh, Fedora Core 4 (Linux) 2.6.17-1.2142_FC4 #1, perl 5.8.8, i686
On Thu Dec 11 09:48:40 2008, tlhackque wrote: Show quoted text
> I'm using DateTime in a log file analyzer that consolidates log files > across multiple time zones. Performance was awful, and surprisingly > turns out to be DateTime - sorta. > > I know people have complained about DateTime performance before - but > before you tune out, I have some analysis that points to some > straightforward fixes.
First of all, your proposed fixes aren't that straightfoward ("use something faster than Params::Validate" - it's in XS!) But firster, are you sure you have the XS version of Params::Validate (and DateTime) installed. To check PV, run this code: perl -MParams::Validate -MParams::ValidateXS -le 'print "OK"' If that doesn't print OK, then you're using the way slower pure Perl implementation.
From: tlhackque [...] yahoo.com
Thanks for the quick reply. $ perl -MParams::Validate -MParams::ValidateXS -le 'print "OK"' OK So yes, I'm running the XS version. OK, so "straightforward" is somewhat misleading. What I meant is that at least we know where the time is going; it's not fundamental algorithmic complexity of the conversions -- which seems to be the main explanation used in the past. It's an interface wrapper issue. Suggestion 1 really is straightforward and easy: add a package variable "DT_Dont_Validate" & a method to set it. If it's set, on function entry just do { local Params::Validate::NO_VALIDATION = DT_Dont_Validate; Params::Validate::validate( this argument list ); } That should be have a substantial ROI. On the second suggestion - perhaps Params::Validate::Dummy is lightweight enough to help. It allegedly just passes argument lists thru, though it may not be quite heavy enough - perhaps some ||= defaulting is required too. I don't know the internals of DateTime well enough to know how much would be involved in making suitable changes. But the performance data that I gathered does say that there's benefit there. I'm not against parameter validation - in fact, I use DateTime's to catch user input errors. It's wonderful. But not in an inner loop that's handling machine-generated data! There may of course be a middle performance ground, where argument checks are in-line (rather that Param::Validate's heavy table-driven but very general mechanism.) But that's obviously the most work. In any case, I appreciate your consideration.
On Fri Dec 12 19:51:49 2008, tlhackque wrote: Show quoted text
> Thanks for the quick reply. > > $ perl -MParams::Validate -MParams::ValidateXS -le 'print "OK"' > OK > So yes, I'm running the XS version. > > OK, so "straightforward" is somewhat misleading. What I meant is that > at least we know where the time is going; it's not fundamental > algorithmic complexity of the conversions -- which seems to be the main > explanation used in the past. It's an interface wrapper issue. > > Suggestion 1 really is straightforward and easy: add a package > variable "DT_Dont_Validate" & a method to set it. If it's set, on > function entry just do > > { > local Params::Validate::NO_VALIDATION = DT_Dont_Validate; > > Params::Validate::validate( this argument list ); > } > > That should be have a substantial ROI. > > On the second suggestion - perhaps Params::Validate::Dummy is > lightweight enough to help. It allegedly just passes argument lists > thru, though it may not be quite heavy enough - perhaps some ||= > defaulting is required too. > > I don't know the internals of DateTime well enough to know how much > would be involved in making suitable changes. But the performance data > that I gathered does say that there's benefit there. > > I'm not against parameter validation - in fact, I use DateTime's to > catch user input errors. It's wonderful. But not in an inner loop > that's handling machine-generated data! > > There may of course be a middle performance ground, where argument > checks are in-line (rather that Param::Validate's heavy table-driven > but very general mechanism.) But that's obviously the most work. > > In any case, I appreciate your consideration.
I'd be interested in seeing replacement validation code that did all the validation that PV is doing, but was faster. I don't know if that's possible. You suggest simply skipping validation, but in your case that wouldn't help, since you say that a majority of the data you pass is getting rejected. Also, that seriously skews the profile, since if the majority is getting rejected, then you'd expect that to be highest in the profile, since you never make it past Params::Validate into DateTime itself.