Subject: Regex for parsing log messages
Date: Tue, 28 Dec 2010 15:27:37 +0100
To: bug-Net-Syslogd [...] rt.cpan.org
From: Dave Stafford <Dave.Stafford [...] globis.net>
Hi,
the regex for parsing log messages fails with the following
(valid) log message from host IMAC:
<189>Dec 28 14:06:00 IMAC logger: this is a test message
It parses the date incorrectly as: "Dec 28 14:06:00 IMA"
This is because the regex on line 204 also tries to parse a timezone,
as found in Cisco-formatted log messages, and takes the first three
letters of the host name ("IMA") for one.
my $regex = '<(\d{1,3})>[\d{1,}: \*]*((?:[JFMASONDjfmasond]\w\w) {1,2}(?:\d+)(?: \d{4})* (?:\d{2}:\d{2}:\d{2}[\.\d{1,3}]*)(?: [A-Z]{1,3})*)?:*\s*(?:((?:[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})|(?:[a-zA-Z\-]+)) )?(.*)';
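The failure can be reproduced outside the module with a short script. This is only a sketch: it assumes the line breaks in the mailed regex stand for single spaces, and the variable names are illustrative rather than taken from Net::Syslogd.

```perl
use strict;
use warnings;

# Original pattern from this report, joined onto one line (assumption: the
# mail client wrapped it at single spaces).
my $regex = '<(\d{1,3})>[\d{1,}: \*]*((?:[JFMASONDjfmasond]\w\w) {1,2}(?:\d+)(?: \d{4})* (?:\d{2}:\d{2}:\d{2}[\.\d{1,3}]*)(?: [A-Z]{1,3})*)?:*\s*(?:((?:[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})|(?:[a-zA-Z\-]+)) )?(.*)';

# The log message from host IMAC quoted above.
my $line = '<189>Dec 28 14:06:00 IMAC logger: this is a test message';

# Capture groups in order: priority, timestamp, host, remaining message.
my ($pri, $time, $host, $msg) = $line =~ /$regex/;

print "time=[$time] host=[$host] msg=[$msg]\n";
# prints: time=[Dec 28 14:06:00 IMA] host=[C] msg=[logger: this is a test message]
```

The timezone group (?: [A-Z]{1,3})* is greedy, so it takes " IMA" and leaves only "C" for the host group.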
The problem is that the regex treats the space or colon after the
timestamp part as optional. A better solution would be to recognise
that the timestamp part is always followed by either a space or, in a
Cisco-formatted message, a colon.
I changed the regex on my system to the following, which seems to work for me:
my $regex = '<(\d{1,3})>[\d{1,}: \*]*((?:[JFMASONDjfmasond]\w\w) {1,2}(?:\d+)(?: \d{4})* (?:\d{2}:\d{2}:\d{2}[\.\d{1,3}]*)(?: [A-Z]{1,3})*)?[:|\s](?:((?:[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})|(?:[a-zA-Z\-]+)) )?(.*)';
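A quick check of the modified pattern, again as a sketch only. It assumes the mailed line breaks stand for single spaces; parse_line is a hypothetical helper, not part of Net::Syslogd, and the Cisco-style line is a made-up sample rather than captured traffic.

```perl
use strict;
use warnings;

# Modified pattern from this report, joined onto one line. The only change
# from the original is [:|\s] in place of :*\s* after the timestamp group.
my $regex = '<(\d{1,3})>[\d{1,}: \*]*((?:[JFMASONDjfmasond]\w\w) {1,2}(?:\d+)(?: \d{4})* (?:\d{2}:\d{2}:\d{2}[\.\d{1,3}]*)(?: [A-Z]{1,3})*)?[:|\s](?:((?:[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})|(?:[a-zA-Z\-]+)) )?(.*)';

# Hypothetical helper: returns (priority, timestamp, host, message).
sub parse_line {
    my ($line) = @_;
    my ($pri, $time, $host, $msg) = $line =~ /$regex/;
    return ($pri, $time, $host, $msg);
}

# The message from this report, and an invented Cisco-style message that
# carries a sequence number, milliseconds and a timezone.
my @bsd   = parse_line('<189>Dec 28 14:06:00 IMAC logger: this is a test message');
my @cisco = parse_line('<189>31: *Mar 1 00:00:30.123 UTC: %SYS-5-CONFIG_I: Configured from console');

for my $r (\@bsd, \@cisco) {
    # The host group may not take part in the match, so default undef to ''.
    printf "time=[%s] host=[%s] msg=[%s]\n", map { $_ // '' } @{$r}[1 .. 3];
}
```

With this pattern the first message yields the timestamp "Dec 28 14:06:00" and the host "IMAC", while the Cisco-style sample still keeps "UTC" in its timestamp. Two details may be worth a look before adopting it: inside a character class the | is a literal pipe, so [:\s] would be the narrower choice, and because only one separator character is consumed, the Cisco-style message text starts with a leading space.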
Dave