
This queue is for tickets about the Date-Manip CPAN distribution.

Report information
The Basics
Id: 61097
Status: resolved
Priority: 0/
Queue: Date-Manip

People
Owner: Nobody in particular
Requestors: rhesa [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 6.11
Fixed in: (no value)



Subject: regression in date/time parsing and possible DoS on 64bits platforms
We use Date::Manip to parse EXIF date formats. From the 50 million photos in our system, we found over 200 variants, and D::M used to be able to extract something meaningful from most of them, with the following snippet:

    my $exifdate = '2009:06:25 13:02:48-04:00';
    $exifdate =~ s/\D//g;
    print ParseDate($exifdate); # 2010090523:13:10

The new version 6 no longer parses these, which is unfortunate, but I can certainly understand your desire for some sanity in your parsing code. The big strength of Date::Manip used to be that I could throw any user input at it, and most of the time I'd end up with the date/time the user intended. I accept that this also means it could get it horribly wrong sometimes...

What is more severe is the current behavior with this input on a 64-bit perl: it uses up all available memory building the cache, which can cause a denial of service. I looked briefly under the debugger, and it keeps calling days_since_1BC() and adding entries to $self->{cache}. I assume this is because on 64-bit builds the string '200906251302480400' is a valid integer, and if you treat that as an epoch, it is quite far into the future.

I apologise for not providing a patch or a more detailed investigation of the underlying issue. Being under time pressure, I chose to bundle an older version of D::M with our app.
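The magnitude involved can be sanity-checked with a few lines of core Perl. This is only a rough sketch: it does not call Date::Manip at all, and the 365.25-day year is an approximation chosen for illustration. It assumes the stripped string ends up being read as a number of seconds.

```perl
use strict;
use warnings;

my $exifdate = '2009:06:25 13:02:48-04:00';

# Same stripping step as in the report: every non-digit is removed,
# including the '-' sign of the timezone offset.
(my $digits = $exifdate) =~ s/\D//g;

# The largest signed 64-bit integer has 19 digits, so any digit string
# of 18 digits or fewer fits in a native 64-bit integer.
my $fits_in_i64 = length($digits) <= 18 ? 1 : 0;

# Approximate length of a year in seconds (illustrative assumption).
my $seconds_per_year = 365.25 * 24 * 60 * 60;
my $years = $digits / $seconds_per_year;

printf "digits=%s fits_in_i64=%d years=%.3g\n", $digits, $fits_in_i64, $years;
```

Read as a count of seconds, the value corresponds to several billion years, which is consistent with a per-year cache growing without bound.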
1) To start with, I have added the EXIF format, so Date::Manip will now parse the form:

    2009:06:25 13:02:48-04:00

so there should be no need to strip out characters. I would be interested in seeing examples of the '200 variants' to see if there are any other formats I need to add. I will try to get a new release out next week, so perhaps the best solution to your problem would be to use the 6.12 release.

2) By stripping out the non-digit characters the way you do, you have introduced a bug into your program. The result can't be interpreted correctly (and in fact, it never was, even with an older version of Date::Manip).

First, the date (spaces added for readability):

    2009 06 25 13 02 48 04 00

is NOT the same as the date above, because the timezone changed from -04:00 to +04:00, so this is a bug.

Second, in the old version of Date::Manip (which had less strict parsing), you did match a format... but not the one you thought you were matching. You matched the format:

    YYYYMMDDHHMMSSssss

so your timezone '-0400' was getting treated as fractional seconds and was ignored (fractional seconds are quietly ignored). So, although it parsed... it wasn't correct.

Since Date::Manip 6.xx is now a bit more strict on parsing, and fractional seconds now have to be separated from the seconds using a decimal point (.) or comma (,), your string no longer matches that (incorrect) format. Instead it is now getting treated as a delta (200906251302480400 seconds from now), which is definitely not what you want either. Since that is a valid delta, Date::Manip merrily tries to calculate it.

For performance reasons, Date::Manip caches information that it might be able to reuse. When the calculations are within a few years of each other, it doesn't have to cache a lot, and the cache gives tremendous performance gains. However, when you're calculating a date 6.3 billion years in the future, it means caching a LOT of information (as you saw).

So, you need to feed Date::Manip (whether you use the old version or the new version) a valid format so that you will get what you want. I'd suggest changing the colons in the date (but NOT the time) to slashes, or something along those lines. Of course, since the next release will handle this format, you could just switch to that when I release it.

3) I'm looking at the possibility of filtering out deltas that land so many years in the past or future, since they can't possibly give valid results. It's not a trivial change, so I haven't done it yet, but I'll try to add it in a future release.
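The colon-to-slash suggestion in point 2 could be sketched roughly as follows. Both helper names (normalize_exif_date and looks_like_runaway_number) are hypothetical and not part of Date::Manip, and the 15-digit threshold is an assumption based on YYYYMMDDHHMMSS being 14 digits long. Whether a given Date::Manip release accepts the normalized string should be checked against that release.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite only the colons in the leading date part
# (YYYY:MM:DD) as slashes, leaving the time and timezone offset untouched.
sub normalize_exif_date {
    my ($raw) = @_;
    return unless defined $raw;
    (my $fixed = $raw) =~ s{^(\d{4}):(\d{2}):(\d{2})(?!\d)}{$1/$2/$3};
    return $fixed;
}

# Hypothetical guard: an all-digit string longer than YYYYMMDDHHMMSS
# (14 digits) is unlikely to be a real timestamp, so the caller can
# reject it instead of handing it to the parser.
sub looks_like_runaway_number {
    my ($s) = @_;
    return 0 unless defined $s;
    return $s =~ /^\d{15,}$/ ? 1 : 0;
}

my $input = normalize_exif_date('2009:06:25 13:02:48-04:00');
print "$input\n" unless looks_like_runaway_number($input);
# The normalized string would then be passed to ParseDate() as before.
```

Keeping the sign and separators of the timezone offset intact avoids the -04:00 to +04:00 change described in point 2, and the guard gives callers a cheap way to skip inputs that would otherwise be treated as an enormous delta.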