This queue is for tickets about the File-Find-Duplicates CPAN distribution.

Report information
The Basics
Id: 9813
Status: resolved
Priority: 0/
Queue: File-Find-Duplicates

People
Owner: Nobody in particular
Requestors: nothingmuch [...] woobling.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.05
Fixed in: 1.00



Subject: File::Find::Duplicates checks for MD5 but its report doesn't show it
File::Find::Duplicates constructs a duplicate list keyed by size only, and so, even if files have a unique MD5, which is accounted for in the actual traversal, the user can't extract that info without recomputing the MD5 themselves.

    push @{$dupes{$size}}, @{$md5{$hash}} if (@{$md5{$hash}} > 1);

should be changed into

    push @{$dupes{$size}{$md5}}, @{$md5{$hash}} if (@{$md5{$hash}} > 1);

IMHO. But this breaks backward compatibility.
From: nothingmuch [...] woobling.org
[guest - Mon Jan 17 04:04:13 2005]:
> push @{$dupes{$size}{$md5}}, @{$md5{$hash}}
err, s/md5/hash/ in hash key.
Date: Mon, 17 Jan 2005 10:45:08 +0000
From: Tony Bowden <tony [...] kasei.com>
To: Guest via RT <bug-File-Find-Duplicates [...] rt.cpan.org>
Subject: Re: [cpan #9813] File::Find::Duplicates checks for MD5 but its report doesn't show it
RT-Send-Cc:
On Mon, Jan 17, 2005 at 04:04:13AM -0500, Guest via RT wrote:
> File::Find::Duplicates constructs a duplicate list keyed by size only,
> and so, even if files have a unique MD5, which is accounted for in the
> actual traversal, the user can't extract that info without recomputing
> the MD5 themselves.

I don't understand the problem here. Can you try explaining it a different way?

Thanks,

Tony
From: nothingmuch [...] woobling.org
[tony@kasei.com - Mon Jan 17 06:11:35 2005]:
> I don't understand the problem here. Can you try explaining it a
> different way?
$ mkdir tmp
$ cd tmp
$ echo foo | tee foo1 foo2
$ echo bar | tee bar1 bar2
$ perl -MFile::Find::Duplicates -MData::Dumper -e 'print Dumper({ find_duplicate_files(@ARGV) })' .
$VAR1 = {
          '4' => [
                   './bar1',
                   './bar2',
                   './foo1',
                   './foo2'
                 ]
        };

but bar* and foo* are not the same...

On the other hand, with that proposed fix, you get:

$VAR1 = {
          '4' => {
                   'c157a79031e1c40f85931829bc5fc552' => [
                                                           './bar1',
                                                           './bar2'
                                                         ],
                   'd3b07384d113edec49eaa6238ad5ff00' => [
                                                           './foo1',
                                                           './foo2'
                                                         ]
                 }
        };

which shows that bar* and foo* are indeed different.
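
For anyone who needs the two-level structure without patching the module, here is a minimal standalone sketch of the same idea: group files by size first, then by MD5 within each size. It is an illustration only, not the module's actual code; the function name find_dupes_by_size_and_md5 is hypothetical, and it uses just the core modules File::Find and Digest::MD5.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Find;

# Hypothetical helper, NOT part of File::Find::Duplicates.
# Returns a hash of the form: size => { md5 => [ files... ] }
sub find_dupes_by_size_and_md5 {
    my @dirs = @_;

    # Pass 1: bucket every plain file by its size.
    my %by_size;
    find({
        no_chdir => 1,    # keep $_ as the full path
        wanted   => sub {
            return unless -f $_;
            my $size = -s $_;
            push @{ $by_size{$size} }, $_;
        },
    }, @dirs);

    # Pass 2: hash only the files that share a size with another file.
    my %dupes;
    for my $size (keys %by_size) {
        my @same_size = @{ $by_size{$size} };
        next unless @same_size > 1;

        my %by_md5;
        for my $file (@same_size) {
            open my $fh, '<', $file or next;    # skip unreadable files
            binmode $fh;
            my $hash = Digest::MD5->new->addfile($fh)->hexdigest;
            close $fh;
            push @{ $by_md5{$hash} }, $file;
        }

        # Keep only the groups that really contain duplicates.
        for my $hash (keys %by_md5) {
            next unless @{ $by_md5{$hash} } > 1;
            $dupes{$size}{$hash} = [ sort @{ $by_md5{$hash} } ];
        }
    }
    return %dupes;
}
```

Calling it the same way as in the session above, e.g. print Dumper({ find_dupes_by_size_and_md5('.') }), should keep foo* and bar* under separate MD5 keys beneath the shared size key.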
Date: Mon, 17 Jan 2005 23:21:13 +0000
From: Tony Bowden <tony [...] kasei.com>
To: Guest via RT <bug-File-Find-Duplicates [...] rt.cpan.org>
Subject: Re: [cpan #9813] File::Find::Duplicates checks for MD5 but its report doesn't show it
RT-Send-Cc:
On Mon, Jan 17, 2005 at 03:43:54PM -0500, Guest via RT wrote:
> $ echo foo | tee foo1 foo2
> $ echo bar | tee bar1 bar2
> $ perl -MFile::Find::Duplicates -MData::Dumper -e 'print Dumper({
> find_duplicate_files(@ARGV) })' .

Ah yes, now I see the problem.

I'll probably want to solve this a slightly different way, though. Let me think about it for a while. If I can't come up with something in a week or so hassle me again and I'll just apply this patch :)

Thanks,

Tony
> > $ echo foo | tee foo1 foo2
> > $ echo bar | tee bar1 bar2
> > $ perl -MFile::Find::Duplicates -MData::Dumper -e 'print Dumper({
> > find_duplicate_files(@ARGV) })' .

> Ah yes, now I see the problem.
> I'll probably want to solve this a slightly different way, though. Let
> me think about it for a while. If I can't come up with something in a
> week or so hassle me again and I'll just apply this patch :)

Sorry for taking so long to get around to this. I've gone with returning a completely different data structure, but hopefully it should solve this for you. New version on CPAN now.

Thanks,

Tony