Subject: "Malformed" lsof output
Date: Tue, 12 May 2009 14:31:08 -0700
To: bug-Unix-Lsof [...] rt.cpan.org
From: Mike Dillon <dillonm [...] yahoo-inc.com>
I was checking out Unix::Lsof and saw this:
This warning probably shows a bug in your lsof installation, since
it reports a malformed lsof output. To my knowledge this has so far
only been experienced on CentOS 5.2 with the RedHat build of lsof
4.78, if you experience it with any other combination of OS or lsof
version I'd appreciate if you could tell me about it. C<Unix::Lsof>
tries to work around this bug but it is possible that the results it
returns may be wrong.
I'm pretty sure this analysis of the situation is not correct. I
encountered a problem along these lines using what is likely the same
build of lsof under RHEL 5. I believe the real issue is that the parsing
code in Unix::Lsof does not match the format described in the lsof man
page:
When the NUL (000) field terminator has been selected with the 0
(zero) field identifier character, lsof ends each process and file
set with a NL (012) character.
What this says to me is that NUL is the normal "field terminator" in
-F0 mode, but that the last field in a "process set" or "file set" will
additionally have a trailing NL after the NUL. I believe the problem is
that the Unix::Lsof code first assumes that this is a line-delimited
format and splits the output into lines (by calling IPC::Run3 with an
array ref), then goes through and attempts to split each "line" on
NUL.
What it should be doing is something like this:
    my $output;
    ...
    run3( ..., \$output, ...);
    ...
    my @fields = split /\000\012?/, $output;
The only information being lost here is whether the field terminator was
followed by the process/file set terminator. I'm not sure that matters,
but you could capture the delimiter and adjust the code if that is
important. As long as a NUL is never followed by a newline in any case
except the last field in a set, this code should be pretty robust.
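If the set boundaries do turn out to matter, the delimiter-capturing variant mentioned above could look roughly like the sketch below. The sample string is made up for illustration (it is not real lsof output), and the names @parts, @sets and @current are hypothetical. It relies only on the documented behaviour of split, which returns captured delimiters in between the fields.

```perl
use strict;
use warnings;

# Hypothetical stand-in for "lsof -F0" output: every field ends in NUL,
# and the last field of a process or file set is followed by NL.
my $output = "p1234\0cperl\0\n" . "f3\0tREG\0n/tmp/a\0\n";

# The capturing parentheses make split return each delimiter as well,
# so the list alternates: field, delimiter, field, delimiter, ...
my @parts = split /(\000\012?)/, $output;

my (@sets, @current);
while (@parts) {
    my $field = shift @parts;
    my $delim = shift @parts;
    $delim = '' unless defined $delim;   # input that lacks a final terminator

    push @current, $field;

    # A delimiter ending in NL closes the current process or file set.
    if ($delim =~ /\012\z/) {
        push @sets, [@current];
        @current = ();
    }
}

# Keep any fields left over if the output did not end with NL.
push @sets, [@current] if @current;
```

With this sample, @sets ends up holding one array ref per set, so the code that currently works "line" by "line" could iterate over sets instead.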
I've attached a proof-of-concept of this parsing approach.
-md