This queue is for tickets about the Parse-Syslog-Mail CPAN distribution.

Report information
The Basics
Id: 12775
Status: resolved
Priority: 0/
Queue: Parse-Syslog-Mail

People
Owner: SAPER [...] cpan.org
Requestors: jthardy [...] uta.edu
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.03
Fixed in: 0.04



Subject: Odd number of elements in hashref because map and split bug
Parse-Syslog-Mail-0.03
v5.8.5 built for i386-linux-thread-multi
RHEL 4 WS, Linux 2.6.9-5.0.5.EL, i686

MTAs sometimes return messages with commas in the text (and sometimes results contain additional equal signs). Take a look at this common result:

    "stat=Deferred: 453 sorry, mailbox currently unavailable"

Splitting the text on commas then produces the elements:

    'stat=Deferred: 453 sorry'
    'mailbox currently unavailable'

When each element is then split on the equal sign, we get:

    {
        'stat' => 'Deferred: 453 sorry',
        'mailbox currently unavailable' => undef
    }

This patch finds each comma that is followed by a word and an equal sign, and replaces that comma with a tab character, so the text is cut just before that word. The fields are then produced by splitting on \t instead of ', '. The patch also limits the number of splits performed within each mapped element, so that a result string which itself contains an equal sign still yields exactly one key and one value and the hash stays balanced.
*** /tmp/Mail.pm  2005-05-11 17:11:26.000000000 -0500
--- Mail.pm  2005-05-12 10:07:44.571806522 -0500
***************
*** 177,187 ****
          $log->{text} =~ s/^\s*([^=]+)\s*$/status=$1/;
!         my @fields = split ', ', $log->{text};
          %mail = map {
              s/,$//; s/^ +//; s/ +$//; # cleaning spaces
              s/^stat=/status=/; # renaming 'stat' field to 'status'
!             split /=/
          } @fields;
  
          $mail{id} = $id;
          $mail{timestamp} = $log->{timestamp};
      }
--- 177,193 ----
          $log->{text} =~ s/^\s*([^=]+)\s*$/status=$1/;
!         $log->{text} =~ s/collect: /collect=/;
!         $log->{text} =~ s/([^\s]+),\s+([^\s]+)=/$1\t$2=/g;
!
!         my @fields = split '\t', $log->{text};
          %mail = map {
              s/,$//; s/^ +//; s/ +$//; # cleaning spaces
              s/^stat=/status=/; # renaming 'stat' field to 'status'
!             split(/=/,$_,2);
          } @fields;
  
          $mail{id} = $id;
+         $mail{host} = $log->{host};
+         $mail{program} = $log->{program};
+         $mail{text} = $log->{text};
          $mail{timestamp} = $log->{timestamp};
      }
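To make the effect of the patch easier to see, here is a minimal standalone sketch that runs the sample line from the report through both splitting strategies. It is not the module's code: the helper names clean, parse_before and parse_after are made up for this illustration, and the second sample line (with an extra equal sign) is hypothetical.

```perl
use strict;
use warnings;

# Field cleanup shared by both versions (same substitutions as in the patch).
sub clean {
    my ($field) = @_;
    $field =~ s/,$//;
    $field =~ s/^ +//;
    $field =~ s/ +$//;
    $field =~ s/^stat=/status=/;    # rename 'stat' to 'status'
    return $field;
}

# Before the patch: fields are cut on ', ' and every '=' splits a field.
sub parse_before {
    my ($text) = @_;
    return map { my $f = clean($_); split /=/, $f } split /, /, $text;
}

# After the patch: only a comma followed by "word=" becomes a separator (tab),
# and each field is split into at most two parts.
sub parse_after {
    my ($text) = @_;
    $text =~ s/([^\s]+),\s+([^\s]+)=/$1\t$2=/g;
    return map { my $f = clean($_); split /=/, $f, 2 } split /\t/, $text;
}

# Sample line taken from the report.
my $sample = 'stat=Deferred: 453 sorry, mailbox currently unavailable';

my @before = parse_before($sample);    # odd-sized list: key, value, stray key
my @after  = parse_after($sample);     # one key and one value

printf "before: %d elements, after: %d elements\n",
    scalar @before, scalar @after;

# Hypothetical line with an extra '=' inside the value.
my @extra = parse_after('stat=Sent (id=42 ok)');
print "value kept whole: $extra[1]\n";
```

Assigning the odd-sized list returned by parse_before to a hash under "use warnings" is what produces the "Odd number of elements" warning named in the subject of this ticket.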
Date: Fri, 13 May 2005 02:16:41 +0200
Subject: Re: [cpan #12775] Odd number of elements in hashref because map and split bug
From: Sébastien Aperghis-Tramoni <sebastien [...] aperghis.net>
To: bug-Parse-Syslog-Mail [...] rt.cpan.org
RT-Send-Cc:
Hello,

> MTAs sometimes return messages with commas in the text, (and sometimes
> results contain additional equal signs)

Indeed, I knew that could happen, but as I didn't have such messages in the logs I used for my tests, I preferred to postpone this to a later release :-)

> This patch finds commas with a word immediately followed by an equal
> sign and chops the map up before that word by replacing the comma with
> a tab character. The map is then produced using \t instead of ', '.
> The patch also limits the amount of splits that happen within each
> mapped element, in the case of a result string also containing an
> equal sign, to ensure equal hash elements.

Good idea. I've applied the patch. I was pondering whether to keep the host and program parameters, but it seems you want them. As for the text, I'm wondering what it will look like, but I understand it can be useful.

I still have to write new tests to cover these cases, then I'll release the new version on CPAN, probably tomorrow night.

Thanks for the patch.

Regards,
Sébastien Aperghis-Tramoni

--
- --- -- - -- - --- -- - --- -- - --[ http://maddingue.org ]
Close the world, txEn eht nepO
From: jthardy [...] uta.edu
I've been using this one for the past two weeks now, and haven't encountered an issue that it doesn't resolve. I've added an additional check to see whether the line contains separated tokens instead of assuming that every line will. This takes care of DSN messages that are formatted oddly (i.e. do not contain commas or equal-separated tokens).
--- Mail.pm  2005-05-11 17:11:26.000000000 -0500
+++ /home/jthardy/Mail.pm  2005-06-01 13:58:19.368943144 -0500
@@ -170,19 +170,28 @@
         redo unless $log->{program} =~ /^(?:sendmail|postfix)/;
         redo if $log->{text} =~ /^(?:NOQUEUE|STARTTLS)/;
 
-        $log->{text} =~ s/^(\w+):// and my $id = $1; # gather the MTA unique id
+        $log->{text} =~ s/^(\w+):// and my $id = $1; # gather the MTA unique id
         redo unless $id;
-        redo if $log->{text} =~ /^\s*(?:Milter|SYSERR)/; # we don't treat these
+        redo if $log->{text} =~ /^\s*(?:Milter|SYSERR)/; # we don't treat these
 
-        $log->{text} =~ s/^\s*([^=]+)\s*$/status=$1/;
-        my @fields = split ', ', $log->{text};
-        %mail = map {
-            s/,$//; s/^ +//; s/ +$//; # cleaning spaces
-            s/^stat=/status=/; # renaming 'stat' field to 'status'
-            split /=/
-        } @fields;
+        $log->{text} =~ s/^\s*([^=]+)\s*$/status=$1/; # format status messages
+        $log->{text} =~ s/collect: /collect=/; # treat collect messages as field identifiers
+        $log->{text} =~ s/([^\s]+),\s+([^\s]+)=/$1\t$2=/g; # replace field seperators with tab characters
+
+        my @fields = split '\t', $log->{text};
+        if ( $log->{text} =~ /[:,]\s+[^\s]+=/ ) {
+            %mail = map {
+                s/,$//; s/^ +//; s/ +$//; # cleaning spaces
+                s/^stat=/status=/; # renaming 'stat' field to 'status'
+                s/.*\s+([^\s]+=)/$1/; # cleaning up field names
+                split(/=/,$_,2); # dont split into more than 2 elements for each tab character
+            } @fields;
+        }
 
         $mail{id} = $id;
+        $mail{host} = $log->{host};
+        $mail{program} = $log->{program};
+        $mail{text} = $log->{text};
         $mail{timestamp} = $log->{timestamp};
     }
 
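The idea of checking for tokens before building the hash can be sketched in isolation as follows. This is only an illustration under stated assumptions: the helper name fields_if_tokenized is made up, the guard is a simplified stand-in rather than the regex used in the patch, the fields are cut with a lookahead instead of a tab substitution, and both sample lines are invented (the requestor's real samples are not part of this ticket).

```perl
use strict;
use warnings;

# Hypothetical helper: returns a flat key/value list, or an empty list when
# the text holds no "name=value" token at all.
sub fields_if_tokenized {
    my ($text) = @_;

    # Simplified guard (an assumption, not the regex from the patch).
    return () unless $text =~ /\S+=/;

    my @pairs;
    # Cut only at a comma that is followed by "word=".
    for my $field (split /,\s+(?=\S+=)/, $text) {
        $field =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
        next unless $field =~ /=/;         # skip fragments without '='
        push @pairs, split /=/, $field, 2; # exactly one key and one value
    }
    return @pairs;
}

# Invented sample lines, for illustration only.
my %mail = fields_if_tokenized(' to=<user@example.org>, stat=Sent');
my %none = fields_if_tokenized(' q1235: DSN: User unknown');

printf "tokenized line: %d fields, untokenized line: %d fields\n",
    scalar(keys %mail), scalar(keys %none);
```

Because every fragment that reaches the split contains an equal sign and the limit is 2, each fragment contributes exactly two elements, so the returned list always has an even length.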
[guest - Wed Jun 1 15:05:45 2005]:

> I've been using this one for the past two weeks now, and haven't
> encountered an issue that it doesn't resolve. I've added an additional
> check to see whether the line contains separated tokens instead of assuming
> that every line will. This takes care of DSN messages that are formatted
> oddly (i.e. do not contain commas or equal-separated tokens).

Thanks, I've integrated your patch, but could you send me some examples of such logs so that I can test whether everything works as expected with them? I'll upload this new version as soon as PAUSE is operational.

Regards

--
Close the world, txEn eht nepO.
[jthardy@uta.edu - Thu Jun 2 12:45:16 2005]:

> Here are a few examples of odd messages:

Thanks! I'll include these in the test suite, although I'm still not quite sure of the best (read: laziest) way to test all of this. Anyway, I'll release the new version tomorrow. Thanks again for your patches and your samples.

--
Close the world, txEn eht nepO.

Your sample logs proved to be VERY useful: after writing t/10fields.t, I found another bug thanks to them :-) I just uploaded version 0.04 to CPAN, so it should hit your nearest mirror in a few hours.

Regards

--
Close the world, txEn eht nepO.