This queue is for tickets about the Filesys-DiskUsage CPAN distribution.

Report information
The Basics
Id: 123154
Status: new
Priority: 0/
Queue: Filesys-DiskUsage

People
Owner: Nobody in particular
Requestors: payerle [...] umd.edu
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Bug/feature request: better handling of hard links
Date: Thu, 28 Sep 2017 17:22:27 -0400 (EDT)
To: bug-Filesys-DiskUsage [...] rt.cpan.org
From: "Thomas M. Payerle" <payerle [...] umd.edu>
It appears that Filesys::DiskUsage (as of 0.04) does not properly handle hard links on Unix file systems. For example, suppose rootdir has two subdirectories, A and B; A contains a 10 GB file bigfile.zip, and B contains a hard link to A/bigfile.zip. In this case, Filesys::DiskUsage will report both rootdir/A and rootdir/B as 10 GB (not unreasonable) and rootdir as 20 GB. But rootdir is only consuming 10 GB of space: A/bigfile.zip and B/bigfile.zip share the same data blocks on disk, so although there are two paths to the file, there is only one file and only 10 GB of space is consumed.

The standard Unix du command (at least the GNU version distributed with recent Linux distributions) correctly reports 10 GB for rootdir (as well as for rootdir/A and rootdir/B). I glanced briefly at the GNU code, and it appears to record the device and inode numbers of every file it traverses, using that information to avoid counting the space consumed by any file more than once. That is, du rootdir/A gives 10 GB (as A/bigfile.zip is there and consumes 10 GB), and du rootdir/B likewise gives 10 GB. But du rootdir detects that A/bigfile.zip and B/bigfile.zip have the same device and inode numbers and counts them only once, so du rootdir also gives 10 GB.

While a similar strategy might be useful for Filesys::DiskUsage, I see it as potentially problematic: storing such a table in Perl is less memory-efficient than in C, which could cause excessive memory consumption on large filesystems. A more performance-friendly option would be to divide the size of each regular file by its number of links. (Only regular files should be treated this way; a directory normally has links for itself, its parent, and each child, whereas a regular file will only have a link count greater than one if there are hard links to it.) This should be an option, e.g. 'divide-among-hardlinks'.

Such a change, in our previous example (and assuming A/bigfile.zip has only 2 links), would result in rootdir reporting 10 GB, and rootdir/A and rootdir/B each reporting 5 GB. This disagrees with the standard du command in its interpretation of the usage of rootdir/A and rootdir/B, but there is merit to both interpretations (a number of questions on the web ask, in essence, why du rootdir != (du rootdir/A) + (du rootdir/B)). There will always be some ambiguity when computing the disk usage of a tree that does NOT contain all hard links to all of its files.

Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads  payerle@umd.edu
5825 University Research Court  (301) 405-6135
University of Maryland  College Park, MD 20740-3831
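For illustration, here is a minimal sketch (in Python rather than Perl, and not the module's actual API) of the two strategies discussed above: du-style deduplication by (device, inode) pair, and the proposed divide-by-link-count option. Directory sizes are ignored for clarity, and the function names are hypothetical.

```python
import os
import stat
import tempfile

def du_dedup(path, seen=None):
    """Sum regular-file sizes, counting each (st_dev, st_ino) pair at
    most once -- the strategy GNU du uses to handle hard links."""
    if seen is None:
        seen = set()
    st = os.lstat(path)
    if stat.S_ISDIR(st.st_mode):
        return sum(du_dedup(os.path.join(path, name), seen)
                   for name in os.listdir(path))
    if not stat.S_ISREG(st.st_mode):
        return 0
    key = (st.st_dev, st.st_ino)
    if key in seen:          # already counted via another hard link
        return 0
    seen.add(key)
    return st.st_size

def du_divide(path):
    """Sum regular-file sizes with each file's size divided by its hard
    link count -- the proposed 'divide-among-hardlinks' behavior.  Only
    regular files are divided; directory link counts reflect structure,
    not duplicated data."""
    st = os.lstat(path)
    if stat.S_ISDIR(st.st_mode):
        return sum(du_divide(os.path.join(path, name))
                   for name in os.listdir(path))
    if not stat.S_ISREG(st.st_mode):
        return 0
    return st.st_size / st.st_nlink

# Reproduce the example tree (1000 bytes standing in for 10 GB):
# rootdir/A/bigfile.zip and a hard link rootdir/B/bigfile.zip.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "A"))
os.mkdir(os.path.join(root, "B"))
big = os.path.join(root, "A", "bigfile.zip")
with open(big, "wb") as f:
    f.write(b"\0" * 1000)
os.link(big, os.path.join(root, "B", "bigfile.zip"))

print(du_dedup(root))                        # 1000: the file counts once
print(du_dedup(os.path.join(root, "A")))     # 1000
print(du_divide(root))                       # 1000.0: halves sum back up
print(du_divide(os.path.join(root, "A")))    # 500.0: size / nlink
```

Note that both strategies agree on rootdir (the file's data is counted exactly once in total), and differ only in how that total is attributed to rootdir/A and rootdir/B, which is exactly the ambiguity described above.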