CodeWiki : checkdupes


Script Information


This script checks the files specified on the command line for duplicates by comparing their MD5 checksums. Version 0.2a added recursive checking with the -a flag. Version 0.3a also reports files that share a name but reside in different directories. See the documentation below.
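As an illustration of checksum-based duplicate detection, here is a minimal sketch that groups files by MD5 digest using the core Digest::MD5 module rather than the external md5sum program the script itself calls. The helper names md5_of_file and find_duplicates are hypothetical and do not appear in checkdupes.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: return the hex MD5 digest of one file's contents.
sub md5_of_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    binmode($fh);
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

# Hypothetical helper: map each digest to the files that produced it,
# keeping only digests shared by two or more files. Non-files are skipped.
sub find_duplicates {
    my (@paths) = @_;
    my %by_sum;
    push(@{ $by_sum{ md5_of_file($_) } }, $_) for grep { -f $_ } @paths;
    delete $by_sum{$_} for grep { @{ $by_sum{$_} } < 2 } keys %by_sum;
    return \%by_sum;
}

# Print one line per group of identical files named on the command line.
my $dupes = find_duplicates(@ARGV);
print "$_: @{ $dupes->{$_} }\n" for sort keys %$dupes;
```

Two files land in the same group only when their contents hash to the same digest, which is the same criterion the script applies to the output of md5sum.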



Documentation


Usage: ./checkdupes.pl <-a|files...>
At least two files must be specified. Any folders
specified will be skipped. Wildcards will work, for example:
./checkdupes.pl *
will check all files in the current folder. Likewise:
./checkdupes.pl folder/* folder-2/*
will check all files in both those folders. It will not
recurse on its own.

Alternatively (as of 0.2a), you can pass the -a flag:
./checkdupes.pl -a
This will check all files, recursively, starting from the
current working directory.

As of version 0.3a, it will also find files with duplicate
names that reside in different directories. No flag is
necessary to enable this functionality.

./checkdupes.pl 0.3a by cmantito <http://code.cmantito.com>
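The recursive mode and the duplicate-name report described above can be sketched with core modules alone. This is an illustration under stated assumptions, not the script's own method: checkdupes.pl shells out to find, whereas this sketch uses File::Find and File::Basename, and the helper names list_files and duplicate_names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);

# Hypothetical helper: list every regular file below $root, recursively.
sub list_files {
    my ($root) = @_;
    my @files;
    # With no_chdir set, $_ holds the full path of each entry visited.
    find({ wanted => sub { push(@files, $_) if -f $_ }, no_chdir => 1 }, $root);
    @files = sort @files;
    return @files;
}

# Hypothetical helper: map each bare file name to the paths that carry it,
# keeping only names that occur at two or more paths.
sub duplicate_names {
    my (@paths) = @_;
    my %by_name;
    push(@{ $by_name{ basename($_) } }, $_) for @paths;
    delete $by_name{$_} for grep { @{ $by_name{$_} } < 2 } keys %by_name;
    return \%by_name;
}

# Only runs when a starting directory is given on the command line.
if (@ARGV) {
    my $names = duplicate_names(list_files($ARGV[0]));
    print "$_: @{ $names->{$_} }\n" for sort keys %$names;
}
```

Running the sketch with "." as its argument walks the current working directory, which is the starting point the -a flag uses.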




Script Source


checkdupes.pl
#!/usr/bin/perl

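# -a given as the only argument: build the file list recursively with find.
# The pattern is anchored so that names merely containing "-a" do not match.
if(@ARGV == 1 && $ARGV[0] =~ /^-a$/i){
    # POSIX redirection; "&>" is a bashism that plain sh treats as "run in background".
    system("find --version >/dev/null 2>&1");
    if($? != 0){
        print STDERR "FATAL: find not in \$PATH (".$ENV{'PATH'}.") - aborting.\n";
        exit(2);
    }

    # Directories are listed too; the checksum loop skips them.
    my $flist = `find ./`;
    @ARGV = split(/\n/, $flist);
}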

# Show usage when fewer than two files are given, when help is requested,
# or when -a is combined with other arguments.
if(@ARGV < 2 || $ARGV[0] =~ /^-(?:a|h|v|\?)$/i){
    print STDERR "\nUsage: ".$0." <-a|files...>\n";
    print STDERR "\tAt least two files must be specified. Any folders\n";
    print STDERR "\tspecified will be skipped. Wildcards will work, for example:\n";
    print STDERR "\t\t".$0." *\n";
    print STDERR "\twill check all files in the current folder. Likewise:\n";
    print STDERR "\t\t".$0." folder/* folder-2/*\n";
    print STDERR "\twill check all files in both those folders. It will not\n";
    print STDERR "\trecurse on its own.\n\n";
    print STDERR "\tAlternatively (as of 0.2a), you can pass the -a flag:\n";
    print STDERR "\t\t$0 -a\n";
    print STDERR "\tThis will check all files, recursively, starting from the\n";
    print STDERR "\tcurrent working directory.\n\n";
    print STDERR "\tAs of version 0.3a, it will also find files with duplicate\n";
    print STDERR "\tnames that reside in different directories. No flag is\n";
    print STDERR "\tnecessary to enable this functionality.\n\n";
    print STDERR $0." 0.3a by cmantito <http://code.cmantito.com>\n\n";
    exit(1);
}

# POSIX redirection; "&>" is a bashism that plain sh treats as "run in background".
system("md5sum --version >/dev/null 2>&1");
if($? != 0){
    print STDERR "FATAL: md5sum not in \$PATH (".$ENV{'PATH'}.") - aborting.\n";
    exit(2);
}

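# Map each checksum, and each bare file name, to the list of matching paths.
my (%files_by_sum, %files_by_name);
my $total = scalar(@ARGV);
my $progress = 0;
foreach my $file (@ARGV){
    $progress++;
    print "Processing ".$progress."/".$total."...\n";
    if(!-e $file){
        print STDERR "FATAL: Couldn't open file: ".$file." - aborting.\n";
        exit(1);
    }
    next if -d $file;    # folders are skipped

    # List-form pipe open runs md5sum without a shell, so the file name needs
    # no escaping; "--" stops names starting with "-" being read as options.
    my $md5;
    if(!open($md5, '-|', 'md5sum', '--', $file)){
        print STDERR "FATAL: Couldn't run md5sum: ".$!." - aborting.\n";
        exit(2);
    }
    my $line = <$md5>;
    close($md5);
    # md5sum prefixes the line with "\" when it has to escape the file name.
    if(!defined($line) || $line !~ /^\\?([0-9a-f]{32})\s/){
        print STDERR "FATAL: Couldn't checksum file: ".$file." - aborting.\n";
        exit(1);
    }
    my $sum = $1;
    push(@{ $files_by_sum{$sum} }, $file);

    # Strip leading directories to get the bare file name.
    (my $filename = $file) =~ s{^.*/}{}s;
    push(@{ $files_by_name{$filename} }, $file);
}

# Keys are sorted so the report order is stable between runs.
print "\n--- DUPLICATE FILES ---\n";
foreach my $sum (sort keys %files_by_sum){
    my @files = @{ $files_by_sum{$sum} };
    print $sum.": ".join(" ", @files)."\n" if @files > 1;
}
print "\n--- DUPLICATE NAMES ---\n";
foreach my $filename (sort keys %files_by_name){
    my @files = @{ $files_by_name{$filename} };
    print $filename.": ".join(" ", @files)."\n" if @files > 1;
}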

print "\n";
exit(0);



Categories: CategoryAlpha