While doing some housekeeping I found an old hard disk at the bottom of a drawer. I suspect that most of what is on it I’ve kept elsewhere, but some files may be unique, so I want a copy of those before scrubbing the disk and disposing of it.
The files have different names and folder locations, e.g. I may have a photo in:
$DRIVE/media/photos/2003/04/16/123-4567_img.jpg
that I’ve already got a copy of in:
$HOME/media/photos/2003/04/15/Originals/123-4567_IMG.JPEG_original
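There were too many variations to eyeball them. A simple find and compare against the matching folders got rid of most of them, e.g.:
# run from $DRIVE; quoting and -exec keep filenames containing spaces intact
find "$dir" -type f -exec sh -c 'cmp -s "$1" "$HOME/$1" && rm "$1"' sh {} \;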
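Before deleting anything it can be reassuring to do a dry run first. The following is only a minimal sketch of that idea: `list_identical` is a hypothetical helper name (not part of any tool), and it assumes it is run from the top of the old drive with the home copy passed as the second argument. It prints the files that would be removed and touches nothing.

```shell
# Dry run: print every file under "$1" (relative to the current directory)
# that is byte-identical to the file at the same relative path under "$2".
# Nothing is deleted. "list_identical" is a made-up name for this sketch.
list_identical() {
    src=$1
    ref=$2
    find "$src" -type f -exec sh -c '
        ref=$1
        shift
        for f in "$@"; do
            # cmp -s is silent and only sets the exit status
            if cmp -s "$f" "$ref/$f" 2>/dev/null; then
                printf "%s\n" "$f"
            fi
        done
    ' sh "$ref" {} +
}
```

For example, `cd "$DRIVE" && list_identical "$dir" "$HOME"` lists the candidates; once the list looks right, the deleting version can be run.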
Then came the harder job of searching for anything with a matching checksum. It is possible to do this with fdupes(1), but that is manual and time-consuming, hence the time-saver below:
# get an MD5 checksum of every file
cd $DRIVE/$dir; md5deep -lr . |sort > ~/tmp/external-${dir}.lst
cd $HOME/$dir; md5deep -lr . |sort > ~/tmp/internal-${dir}.lst
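# delete files on the external drive whose checksum also exists on the home drive
# (cd back to the drive first, the listed paths are relative to $DRIVE/$dir; -o lets rm -i prompt on the terminal, GNU xargs 4.7+)
cd $DRIVE/$dir; join -o1.2 ~/tmp/external-${dir}.lst ~/tmp/internal-${dir}.lst | sort -u | xargs -o -d '\n' rm -i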
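One caveat with join(1) is that it splits fields on blanks, so a path containing spaces comes out truncated. Below is a minimal sketch of the same checksum match done with awk instead. `matching_paths` is a hypothetical helper name, and it assumes both lists are in the usual md5deep/md5sum layout of checksum, two spaces, then path. The lists do not need to be sorted for this.

```shell
# Print the paths from list "$1" whose checksum also appears in list "$2".
# Assumed line format in both lists: <checksum><two spaces><path>
# "matching_paths" is a made-up name for this sketch.
matching_paths() {
    awk '
        # first file read (the reference list): remember every checksum
        NR == FNR { known[$1] = 1; next }
        # second file read: strip the checksum and print the whole path
        $1 in known { sub(/^[^ ]+  /, ""); print }
    ' "$2" "$1"
}
```

Usage would look something like `matching_paths ~/tmp/external-${dir}.lst ~/tmp/internal-${dir}.lst`, with the output checked by eye before it is piped to anything that deletes.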
Later, clearly showing that I really should learn to read the fine manuals a little more carefully, I discovered that md5deep(1) can do much of this itself:
# create an MD5 checksum of all files on home drive
cd $HOME/$dir; md5deep -lr . > ~/tmp/internal-${dir}.lst
# use md5deep's "-m" option to list files whose checksum matches one in the list (already on the home drive), then delete them
cd $DRIVE/$dir; md5deep -m ~/tmp/internal-${dir}.lst -r . | xargs -d '\n' rm
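Since the point of the exercise is to keep the unique files, the opposite question is also worth asking: what is on the drive that is not on the home drive? md5deep's "-x" option does negative matching, i.e. it lists only files whose checksum is NOT in the list. A minimal sketch, where `list_unique` is a hypothetical helper name and the list file is assumed to be given as an absolute path (the function changes directory before reading it):

```shell
# List files under directory "$1" whose MD5 checksum does not appear in
# the known-hash list "$2" (an absolute path to a list made by md5deep).
# Nothing is deleted. "list_unique" is a made-up name for this sketch.
list_unique() {
    ( cd "$1" && md5deep -x "$2" -lr . )
}
```

Something like `list_unique "$DRIVE/$dir" ~/tmp/internal-${dir}.lst` then shows what still needs copying before the disk is scrubbed.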