Subject: File::Listing::apache parsing of HTMLTable indexes
File::Listing version 5.814 (and others) doesn't properly parse
directory listings that Apache 2.2 generates using mod_autoindex with
IndexOptions HTMLTable turned on.
I tried turning HTMLTable off in IndexOptions, but couldn't get it to
work. I might figure it out someday, but in the meantime I looked at how
to get File::Listing::apache to properly parse the listing.
The non-HTMLTable output from mod_autoindex would look like this:
<a href="file.ext">file.ext</a> 22-May-2009 22:35 1.0M
With HTMLTable turned on, it looks like this:
<tr><td valign="top"><img src="/icons/compressed.gif" alt="[
]"></td><td><a href="file.ext">file.ext</a></td><td
align="right">22-May-2009 22:35 </td><td align="right">1.0M</td></tr>
The row is wrapped in a bunch of td and tr tags. The problem is that
there is a set of tags between the HH:MM and the file size, while the
regex in File::Listing::apache expects only whitespace between HH:MM and
the file size.
I tried working up a regex to deal with all the extra HTML tags, but in
the end I figured it would be easier just to strip the tags out:
s/\<\/?(tr|th|td|img|font)[^\<]*\>//ig
Could you please add this stripping regex prior to the match regex? I
think that it should work transparently. I don't know all the flavors
of index listings that mod_autoindex can create, but hopefully this will
let the parser handle more of them.
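
Here is a rough sketch of the idea, in case it helps. It applies the
substitution above to the sample row and then matches the result.
Note that strip_table_tags and $entry_re are names I made up for this
illustration; $entry_re is only a simplified stand-in for the real
matching regex in File::Listing::apache, not a copy of it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Listing): removes table, image
# and font markup using the substitution proposed above.
sub strip_table_tags {
    my ($line) = @_;
    $line =~ s/\<\/?(tr|th|td|img|font)[^\<]*\>//ig;
    return $line;
}

# Simplified stand-in for the module's matching regex (an assumption):
# a link, then DD-Mon-YYYY, HH:MM and the size, separated by whitespace.
my $entry_re = qr{<a href="([^"]+)">[^<]*</a>.*?(\d+-\w+-\d+)\s+(\d+:\d+)\s+([\d.]+[kMG]?|-)}i;

# The HTMLTable sample row from this report, joined onto one line.
my $row = '<tr><td valign="top"><img src="/icons/compressed.gif" alt="[ ]"></td>'
        . '<td><a href="file.ext">file.ext</a></td>'
        . '<td align="right">22-May-2009 22:35 </td>'
        . '<td align="right">1.0M</td></tr>';

# Unstripped, the tags between HH:MM and the size prevent a match.
print "raw row does not match\n" unless $row =~ $entry_re;

# Stripped, the row reduces to roughly the non-HTMLTable form.
my $clean = strip_table_tags($row);
if (my ($name, $date, $time, $size) = $clean =~ $entry_re) {
    print "$name $date $time $size\n";
}
```

One thing to watch: after stripping, there is no space left between
the closing </a> and the date, so the matching regex has to tolerate
that (the stand-in above uses .*? there).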
Thanks,
Fred