Subject: Strptime fails with whitespace/dashes in month/day names
Here is a script showing some failures related to
whitespace and dashes in month names and day names.
I have checked with DT::F::Strptime 1.04, DT::Locale 0.09
and Perl 5.8.0 for Linux.
The script below should print each string twice, but it does not.
-----------
#!/home/p80/perl/bin/perl
use strict;
use lib ('.');
use DateTime;
#use DateTime::Format::Strptime;
use Strptime_fixed;
sub control {
    my ($y, $m, $d, $h, $lo, $pat) = @_;
    my ($p, $d1, $s2, $d3, $s4);
    $d1 = DateTime->new(year => $y, month => $m, day => $d, hour => $h, locale => $lo);
    $s2 = $d1->strftime($pat);
    eval { $p = DateTime::Format::Strptime->new(
               pattern  => $pat,
               locale   => $lo,
               on_error => 'croak')};
    eval { $d3 = $p->parse_datetime($s2) } unless $@;
    $s4 = $@ ? $@ : $d3->strftime($pat);
    print "s2 $s2\ns4 $s4\n";
}
# Day name contains space --> fixed
control(2004, 6, 2, 0, 'ga', '%A %d %B %Y');
# Day name contains dash --> fixed
control(2004, 6, 2, 0, 'pt', '%A %d %B %Y');
# Month name contains space --> fixed
control(2004, 9, 2, 0, 'ga', '%Y-%m-%d %B aaa');
# AM marker does not contain 'a', PM marker contains 'a' --> fixed, maybe
control(2004, 6, 2, 2, 'de', '%A %d %B %Y %H:%M:%S -- %I %p');
control(2004, 6, 2, 14, 'de', '%A %d %B %Y %H:%M:%S -- %I %p');
# When the escape-char escapes itself --> not fixed
control(2004, 6, 2, 14, 'en', '%A %d %B %Y %%Y');
----
I attach a patch that fixes the whitespace and dash failures
but not the escaped-percent problem.
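The idea behind the whitespace and dash fix can be sketched on its own, outside the module. This is only an illustration under stated assumptions: the names are hard-coded samples (the patch reads them from the locale object), accents are dropped to keep the sketch ASCII-only, and first_name() is a hypothetical helper, not part of DT::F::Strptime.

```perl
use strict;
use warnings;

# Sample names, hard-coded for illustration (an assumption: the patch
# reads them from the locale object). Accents are dropped so the
# sketch stays ASCII-only.
my @names = ('quarta-feira', 'qua', 'Mean Fomhair', 'MFomh');

# Keep only the names containing a non-word character, longest first,
# and escape them so that '-' and ' ' are matched literally.
my $name_re = join '|',
    map  { quotemeta $_ }
    sort { length $b <=> length $a }
    grep { /\W/ } @names;

# Ordinary names still fall back to \w+.
my $token = qr/($name_re|\w+)/;

# Hypothetical helper: return the leading name of a string, where the
# name must be followed by a space.
sub first_name {
    my ($string) = @_;
    return $string =~ /^$token / ? $1 : undef;
}

print first_name('quarta-feira 02 junho 2004'), "\n";   # quarta-feira
print first_name('Mean Fomhair 02 2004'), "\n";         # Mean Fomhair
print first_name('Wednesday 02 June 2004'), "\n";       # Wednesday
```

With a plain \w+ the first two strings would not match, because the name would stop at the dash or at the inner space.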
The patch also corrects some documentation errors, but not the
main one: in the synopsis paragraph, you describe how to
override the default behavior "undef" with another behavior,
"undef". It would be more pedagogical to override the
default behavior with "croak".
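For instance, a synopsis along these lines would show a real change of behavior. It is only a sketch of what I have in mind, not a proposed wording: it assumes DateTime::Format::Strptime is installed, and the input string 'not a time' is an arbitrary value chosen to make the parse fail.

```perl
use strict;
use warnings;
use DateTime::Format::Strptime;

# Override the default on_error behavior ('undef') with 'croak'.
my $Strp = DateTime::Format::Strptime->new(
    pattern   => '%T',
    locale    => 'en_AU',
    time_zone => 'Australia/Melbourne',
    on_error  => 'croak',
);

# In 'croak' mode a failed parse dies, so trap it with eval.
my $dt = eval { $Strp->parse_datetime('not a time') };
print "parse failed: $@" unless defined $dt;
```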
Nevertheless, DT::F::Strptime is a good module.
Jean Forget
--- /home/p80/modules/DateTime-Format-Strptime-1.04/lib/DateTime/Format/Strptime.pm Sat Aug 9 16:16:35 2003
+++ /home/p80/bin/Strptime_fixed.pm Tue Aug 17 09:51:33 2004
@@ -257,6 +257,7 @@
# Variables for DateTime
my ( $Year, $Month, $Day,
$Hour, $Minute, $Second, $Nanosecond,
+ $Am, $Pm
) = ();
# Run the parser
@@ -462,12 +463,13 @@
$self->local_croak("$hour_24 is too large to be an hour of the day.") and return undef unless $hour_24 <= 23; #OK so leap seconds will break!
$self->local_croak("$hour_12 is too large to be an hour of the day.") and return undef unless $hour_12 <= 12;
$self->local_croak("You must specify am or pm for 12 hour clocks ($hour_12|$ampm).") and return undef if ($hour_12 && (! $ampm));
- if ($ampm=~/p/i) {
+ ($Am, $Pm) = @{$self->{_locale}->am_pms};
+ if (lc $ampm eq lc $Pm) {
if ($hour_12) {
$hour_12 += 12 if $hour_12 and $hour_12 != 12;
}
$self->local_croak("Your am/pm value ($ampm) does not match your hour ($hour_24)") and return undef if $hour_24 and $hour_24 < 12;
- } elsif ($ampm=~/a/i) {
+ } elsif (lc $ampm eq lc $Am) {
if ($hour_12) {
$hour_12 = 0 if $hour_12 == 12;
}
@@ -617,7 +619,7 @@
$field_list =~ s/%X/$default_time_format/eg;
# %x id the locale's default time format.
- # I'm absoutely certain there's a better way to do this:
+ # I'm absolutely certain there's a better way to do this:
$regex=~s|([\/\.\-])|\\$1|g;
$regex =~ s/%T/%H:%M:%S/g;
@@ -640,12 +642,16 @@
$field_list =~ s|%F|%Y%m%d|g;
#is the same as %Y-%m-%d - the ISO date format.
- $regex =~ s/%a/(\\w+)/gi;
+ my $day_re = join '|', map { quotemeta $_ } sort { length $b <=> length $a } grep /\W/,
+ @{$self->{_locale}->day_names}, @{$self->{_locale}->day_abbreviations};
+ $regex =~ s/%a/($day_re|\\w+)/gi;
$field_list =~ s/%a/#dow_name#/gi;
# %a is the day of the week, using the locale's weekday names; either the abbreviated or full name may be specified.
# %A is the same as %a.
- $regex =~ s/%[bBh]/([^\\s]+)/g;
+ my $month_re = join '|', map { quotemeta $_ } sort { length $b <=> length $a } grep /\s/,
+ @{$self->{_locale}->month_names}, @{$self->{_locale}->month_abbreviations};
+ $regex =~ s/%[bBh]/($month_re|[^\\s]+)/g;
$field_list =~ s/%[bBh]/#month_name#/g;
#is the month, using the locale's month names; either the abbreviated or full name may be specified.
# %B is the same as %b.
@@ -808,7 +814,7 @@
my $Strp = new DateTime::Format::Strptime(
pattern => '%T',
locale => 'en_AU',
- time_zone => 'Melbourne/Australia',
+ time_zone => 'Australia/Melbourne',
);
my $dt = $Strp->parse_datetime('23:16:42');
@@ -823,7 +829,7 @@
my $Strp = new DateTime::Format::Strptime(
pattern => '%T',
locale => 'en_AU',
- time_zone => 'Melbourne/Australia',
+ time_zone => 'Australia/Melbourne',
on_error => 'undef',
);
@@ -910,7 +916,7 @@
Given a C<DateTime> object, this methods returns a string formatted in
the object's format. This method is synonymous with C<DateTime>'s
-strptime method.
+strftime method.
=item * locale($locale)
=item * language($locale)
@@ -930,7 +936,7 @@
When given a pattern, this method sets the object's pattern. If the
pattern is invalid, the method will croak or return undef (depending on
-the value of $DateTime::Format::Strptime::CROAK)
+the value of the C<on_error> parameter)
If successful this method returns the current pattern. (After processing
as above)
@@ -942,9 +948,9 @@
returned by parse_datetime
If the time zone is invalid, the method will croak or return undef
-(depending on the value of $DateTime::Format::Strptime::CROAK)
+(depending on the value of the C<on_error> parameter)
-If successful this method returns the current pattern. (After processing
+If successful this method returns the current time zone. (After processing
as above)
=item * errmsg
@@ -952,7 +958,8 @@
If the on_error behavior of the object is 'undef', error messages with
this method so you can work out why things went wrong.
-This code emulates $DateTime::Format::Strptime::CROAK being true:
+This code emulates a C<DateTime::Format::Strptime> object with
+the C<on_error> parameter equal to C<'croak'>:
C<$Strp->pattern($pattern) or die $DateTime::Format::Strptime::errmsg>