Subject: Problem in globbing dotdirectories.
Date: Tue, 4 Mar 2008 02:20:54 +0000
To: "File::Find::Rule Bugtraq" <bug-File-Find-Rule [...] rt.cpan.org>
From: "Shalom Bhooshi" <s.bhooshi [...] gmail.com>
When globbing for dotfiles and dotdirectories (.*), the top directory is
included too, because $_ is set to '.' for it.
e.g.
$ perl -MFile::Find::Rule -Wle 'my $f=File::Find::Rule->new; print for
$f->name(".*")->in("/tmp")'
/tmp
/tmp/foo
/tmp/bar
...
While this might not be much of a problem when aggregating results, it
causes serious, hard-to-find problems when pruning and discarding results:
the top directory itself matches, so the entire directory tree is likely to
be pruned and no results are returned. This goes against DWIM and is quite
surprising.
#!/usr/bin/perl -Wl
use strict;
use File::Find::Rule;

my @filters;
push @filters, File::Find::Rule    # don't descend into dotdirs
    ->directory
    ->name( ".*" )
    ->prune->discard;
push @filters, File::Find::Rule->new;    # process everything else

print join "\n",
    my @files = File::Find::Rule
        ->any( @filters )
        ->in("/tmp");
The above script will usually return nothing. After some debugging, it
turns out that $_ (or $shortname) is set to '.' for the root directory in
all rule methods, and subsequent matches are done against $_. I believe this
behaviour is present in, and inherited from, File::Find.
$ perl -MFile::Find::Rule -Wle 'my$f=File::Find::Rule->new;print for
$f->exec(sub{print join " | ", @_})->in("/tmp")'
. | /tmp | /tmp
foo | /tmp | /tmp/foo
bar | /tmp | /tmp/bar
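The same behaviour can be checked against plain File::Find, without File::Find::Rule in the picture. The following is only a minimal sketch: the helper name visited_entries and the scratch tree (built with File::Temp so nothing real is touched) are my own illustration, not part of either module.

```perl
use strict;
use warnings;
use File::Find ();
use File::Temp ();

# Collect [$_, $File::Find::dir, $File::Find::name] for every visited entry.
# (visited_entries is a hypothetical helper name used for this illustration.)
sub visited_entries {
    my ($top) = @_;
    my @seen;
    File::Find::find(
        sub { push @seen, [ $_, $File::Find::dir, $File::Find::name ] },
        $top,
    );
    return @seen;
}

# Hypothetical scratch tree: <tmp>/.hidden and <tmp>/visible
my $top = File::Temp::tempdir( CLEANUP => 1 );
mkdir "$top/$_" or die "mkdir $top/$_: $!" for qw(.hidden visible);

# One line per entry, in the same "shortname | dir | fullname" layout as above.
print join( " | ", @$_ ), "\n" for visited_entries($top);
```

With the default options (no no_chdir, no bydepth), the first line printed should be the top directory with '.' as its short name, which is exactly the value a ".*" glob then matches.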
You can circumvent this 'problem' by changing the glob to something like
name( '.*?' ) (or, with a regex, name( qr/^\..+/ )), but this goes against
how globs normally work (on Unix at least) and is not DWIM: you have to
know that $_ is set differently (and inconsistently) for the top directory,
and therefore glob differently.
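To illustrate why the regex form of the workaround helps, here is a small sketch that applies qr/^\..+/ to a few short names. The helper is_dot_entry and the sample names are hypothetical and only serve the illustration; no File::Find::Rule internals are assumed.

```perl
use strict;
use warnings;

# Regex from the workaround: a dot followed by at least one more character.
my $dot_entry = qr/^\..+/;

# True for dot-names such as ".foo", false for the lone "." placeholder.
# (is_dot_entry is a hypothetical helper name used for this illustration.)
sub is_dot_entry {
    my ($shortname) = @_;
    return $shortname =~ $dot_entry ? 1 : 0;
}

printf "%-6s %s\n", $_, is_dot_entry($_) ? "matches" : "does not match"
    for ( '.', '.foo', '.cache', 'bar' );
```

Note that this pattern would also match '..', but File::Find does not hand '..' to the callback, so that should not matter in practice.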
Consider the following GNU find command, which tries to achieve the same as
the above script and works as expected.
e.g.
$ find /tmp \( -type d -name ".*" -prune \) -o \( -print \)
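For comparison, roughly the same traversal can be sketched with plain File::Find by explicitly refusing to prune when $_ is '.'. This is only an approximation under my own assumptions: find_skipping_dotdirs and the scratch tree are hypothetical, and -d follows symlinks where find's -type d does not.

```perl
use strict;
use warnings;
use File::Find ();
use File::Temp ();

# Return every path under $top, skipping dot-directories and their contents,
# roughly like: find $top \( -type d -name ".*" -prune \) -o -print
# (find_skipping_dotdirs is a hypothetical helper name.)
sub find_skipping_dotdirs {
    my ($top) = @_;
    my @found;
    File::Find::find(
        sub {
            # $_ is '.' for the top directory itself; never prune on that.
            if ( $_ ne '.' && -d $_ && /^\./ ) {
                $File::Find::prune = 1;
                return;
            }
            push @found, $File::Find::name;
        },
        $top,
    );
    return @found;
}

# Hypothetical scratch tree used only for this illustration.
my $top = File::Temp::tempdir( CLEANUP => 1 );
mkdir "$top/$_" or die "mkdir $top/$_: $!" for qw(.hidden .hidden/inner visible);

print "$_\n" for find_skipping_dotdirs($top);
```

The explicit `$_ ne '.'` guard is the piece a user currently has to know about; without it the top directory is pruned and nothing is returned.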
I'm not entirely sure that $_ within the various methods ought to be changed
from within File::Find (and derived packages) itself. There might be a
reason for the current behaviour that I am not aware of, and changing it
might cause problems with legacy code. (Personally, I think it should be set
to basename($topdir), or at least undef.) I do think a remedy is necessary
for the methods of File::Find::Rule at least, for the following reasons.
1. Follow accepted glob 'standards' and, more importantly, maintain DWIM (Do
What I Mean), because in most instances it's the
non-perl-file-find-rule-savvy user that is left baffled.
2. Bugs arising from this behaviour can be hard to find, especially when
chaining (complex) rules (a subjective reason, but compelling nonetheless).
$ perl -MFile::Find::Rule -Wle 'print $File::Find::Rule::VERSION'
0.30
$ perl -V
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
Platform:
osname=linux, osvers=2.6.15.7, archname=i486-linux-gnu-thread-multi
uname='linux terranova 2.6.15.7 #1 smp thu jul 12 14:27:56 utc 2007 i686
gnulinux '
config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN
-Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr
-Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8
-Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5
-Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.8
-Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1
-Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1
-Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl
-Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib
-Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS
-DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64',
optimize='-O2',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN
-fno-strict-aliasing -pipe -I/usr/local/include'
ccversion='', gccversion='4.1.3 20070929 (prerelease) (Ubuntu
4.1.2-16ubuntu2)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
perllibs=-ldl -lm -lpthread -lc -lcrypt
libc=/lib/libc-2.6.1.so, so=so, useshrplib=true, libperl=
libperl.so.5.8.8
gnulibc_version='2.6.1'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
Characteristics of this binary (from libperl):
Compile-time options: MULTIPLICITY PERL_IMPLICIT_CONTEXT
PERL_MALLOC_WRAP THREADS_HAVE_PIDS USE_ITHREADS
USE_LARGE_FILES USE_PERLIO USE_REENTRANT_API
Built under linux
Compiled at Dec 4 2007 08:56:39
@INC:
/etc/perl
/usr/local/lib/perl/5.8.8
/usr/local/share/perl/5.8.8
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.8
/usr/share/perl/5.8
/usr/local/lib/site_perl
.