Bug #48593 for File-Find-Rule: Document Result-Order being arbitrary and not-predictable

Mon Aug 10 12:03:53 2009 KENTNL [...] cpan.org - Ticket created

Subject:

Document Result-Order being arbitrary and not-predictable

This is not very important, but having it documented somewhere would be helpful to users. The underlying filesystem mechanisms return directory entries in a fairly arbitrary order. At present this arbitrariness starts at the filesystem driver level and propagates through the entire call chain: through the kernel, through libc's readdir(), through Perl's readdir() and through File::Find, ending up in File::Find::Rule.

This usually doesn't matter to anyone, but it does matter when people compare directories. The following is not guaranteed to pass on any filesystem:

    use Test::More;
    use File::Find::Rule;

    # Copy the contents of a/ into x/ and y/
    system('rsync -avp a/ x/');
    system('rsync -avp a/ y/');

    # relative() strips the leading "a/", "x/" or "y/" so the lists are comparable
    is_deeply( [ File::Find::Rule->relative->in("a") ],
               [ File::Find::Rule->relative->in("x") ], "Source vs Copy x" );
    is_deeply( [ File::Find::Rule->relative->in("a") ],
               [ File::Find::Rule->relative->in("y") ], "Source vs Copy y" );
    is_deeply( [ File::Find::Rule->relative->in("x") ],
               [ File::Find::Rule->relative->in("y") ], "Copy x vs Copy y" );
    done_testing();

You could get very lucky and it /could/ pass. However, your chances are much worse when you're using two different filesystems, or comparing an internally generated ordering against any real filesystem. People tend to assume the ordering is alphabetical, since that is what 'ls' shows. A good example of how to start a mess is as follows:

1. Filesystem X: JFS
   * JFS returns readdir() entries in alphabetical order.
2. Filesystem Y: TMPFS
   * TMPFS returns readdir() entries in REVERSE INSERTION ORDER.

In this case, directly comparing the listing from one with the listing from the other will almost /certainly/ not work.

Thus it should be stated that wherever one is comparing directory structures, or anything that could be derived from a directory structure (for example: http://www.nntp.perl.org/group/perl.cpan.testers/2009/08/msg4940109.html), the results obtained from File::Find::Rule /must/ be sorted before doing anything practical with them:
    use Test::More;
    use File::Find::Rule;

    system('rsync -avp a/ x/');
    system('rsync -avp a/ y/');

    # Sorting both sides removes any dependence on readdir() order
    is_deeply( [ sort { $a cmp $b } File::Find::Rule->relative->in("a") ],
               [ sort { $a cmp $b } File::Find::Rule->relative->in("x") ], "Source vs Copy x" );
    is_deeply( [ sort { $a cmp $b } File::Find::Rule->relative->in("a") ],
               [ sort { $a cmp $b } File::Find::Rule->relative->in("y") ], "Source vs Copy y" );
    is_deeply( [ sort { $a cmp $b } File::Find::Rule->relative->in("x") ],
               [ sort { $a cmp $b } File::Find::Rule->relative->in("y") ], "Copy x vs Copy y" );
    done_testing();

For maximum user-friendliness I would have suggested sorting by default, because it's what people tend to expect, but the cost of doing so is too high and it isn't needed in 90% of cases.

Thanks.
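For tree comparisons that don't need File::Find::Rule's filtering, the same sort-before-compare idea can be sketched with core File::Find alone. This is a minimal sketch assuming Unix-style '/' separators; the helper names sorted_tree and same_tree are hypothetical and are not part of File::Find or File::Find::Rule.

```perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper: list every path below $root, relative to $root,
# sorted so the result does not depend on readdir() order.
sub sorted_tree {
    my ($root) = @_;
    $root =~ s{/+\z}{} unless $root eq '/';    # normalise a trailing slash
    my @paths;
    File::Find::find(
        {
            no_chdir => 1,    # don't chdir; $File::Find::name is the full path
            wanted   => sub {
                ( my $rel = $File::Find::name ) =~ s{^\Q$root\E/?}{};
                push @paths, $rel if length $rel;    # skip $root itself
            },
        },
        $root,
    );
    return sort { $a cmp $b } @paths;
}

# Hypothetical helper: true if both trees contain the same relative paths.
sub same_tree {
    my @x = sorted_tree( $_[0] );
    my @y = sorted_tree( $_[1] );
    return 0 unless @x == @y;
    for my $i ( 0 .. $#x ) {
        return 0 unless $x[$i] eq $y[$i];
    }
    return 1;
}
```

With Test::More, each of the three checks above could then be written as is_deeply([ sorted_tree('a') ], [ sorted_tree('x') ], 'Source vs Copy x'), which compares only names, not file contents or permissions.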

Fri Nov 27 19:33:31 2009 RCLAMP [...] cpan.org - Correspondence added

If this needs documenting or changing anywhere, it's in File::Find. If there is an option for File::Find to walk things in a different order, then a user of File::Find::Rule can pass it through using the extras method.

-- Richard Clamp <richardc@unixbeard.net>
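File::Find does document such a hook: the preprocess option, which is called with the names read from each directory and must return the list to process (it is documented as a no-op when follow or follow_fast is in effect). The sketch below uses core File::Find directly; walk_sorted is a hypothetical name. Assuming File::Find::Rule passes its extras hash through to File::Find unchanged, as its extras documentation describes, the same hook should be usable as File::Find::Rule->extras({ preprocess => sub { sort @_ } }); that pass-through is not exercised here.

```perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper: return paths in the order File::Find visits them,
# with each directory's entries sorted before they are processed.
sub walk_sorted {
    my ($root) = @_;
    my @visited;
    File::Find::find(
        {
            # preprocess receives the raw readdir() names of the current
            # directory and returns the list File::Find should process;
            # sorting here removes the dependence on on-disk order.
            preprocess => sub { return sort { $a cmp $b } @_ },
            wanted     => sub { push @visited, $File::Find::name },
        },
        $root,
    );
    return @visited;
}
```

Sorting inside each directory makes the visiting order repeatable for a given tree, but it is not necessarily identical to sorting the complete path list afterwards, so sort-before-compare remains the simpler choice for comparing trees.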

Fri Nov 27 19:33:32 2009 The RT System itself - Status changed from 'new' to 'open'

Fri Nov 27 19:33:32 2009 RCLAMP [...] cpan.org - Status changed from 'open' to 'resolved'