I’m trying to sort 20 years of our family’s pictures and movies.
After scavenging everything I could find on old drives, in iPhoto libraries and whatnot, and renaming (thank you, muCommander) all the files to arbitrary unique strings, I came up with the following logic:
In Finder, if two files have the same creation date, the same extension, the same size and the same physical size, they are extremely likely to be duplicates.
So, my idea is:
- create a name that gives a reasonable idea of all those attributes
- rename the file with that name
- if a file with that name already exists:
  - create a folder with that name (minus the extension)
  - move the file there (no need to rename it)
and loop over the ~125,000 files that I have gathered (~500GB).
I tested that with 537 files and ended up with 138 “unique” files. Which is not bad.
I have a backup of the ~125,000 files so I can run the script without worrying too much, but I’d love it if some of you could check the (very small and almost trivial) script for errors.
It took about 1.5 minutes to run the script over the small test set, and I am not using anything fancy so I guess that it would take north of 6 hours to complete the task, unless you teach me how to use black magic so that the thing runs super fast.
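As a sanity check on that estimate, here is the straight linear extrapolation from the test run. It assumes the cost per file stays constant, which Finder scripting may not honor on a folder this large, so the real figure could well be higher.

```python
# Linear extrapolation from the 537-file test run to the full set.
test_files = 537
test_minutes = 1.5
total_files = 125_000

seconds_per_file = test_minutes * 60 / test_files
estimated_hours = seconds_per_file * total_files / 3600

print(f"{seconds_per_file:.3f} s per file, about {estimated_hours:.1f} hours in total")
# prints "0.168 s per file, about 5.8 hours in total"
```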
In fact, since the script does not use anything that’s not available to a shell script (except for the difference between size and physical size, which I added just to make sure, although I’m not even sure what the difference is), I guess running that as a shell script would make the thing tremendously faster…
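On the size question: as far as I understand it, Finder's "size" is the number of bytes of data in the file, while "physical size" is the space allocated for it on disk, rounded up to whole allocation blocks. Both are available outside Finder. The sketch below reads them through Python's `os.stat`; the function name is made up for illustration, and `st_blocks` is assumed to be present (it is on macOS and Linux, not on Windows).

```python
import os


def logical_and_physical_size(path):
    """Return (bytes of data, bytes allocated on disk) for one file."""
    st = os.stat(path)
    logical = st.st_size
    # POSIX reports st_blocks in 512-byte units, whatever the volume's block size.
    physical = st.st_blocks * 512
    return logical, physical
```

Since the physical size depends on the volume's block size rather than on the file's content, it probably adds little to duplicate detection once all the files sit on the same disk.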
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use JC : script "JC_conversions"

tell application "Finder"
	set myTargetFolder to target of window 1
	-- Only files: the bare "entire contents" also returns folders, which would get renamed too
	set myFiles to every file of entire contents of myTargetFolder
	repeat with myFile in myFiles
		-- Date parts, zero-padded to two digits
		set myDate to creation date of myFile
		set myYear to (year of myDate) as text
		set myMonth to (JC's AddFront0To1Digit:(month of myDate as integer)) as text
		set myDay to (JC's AddFront0To1Digit:(day of myDate)) as text
		set myHour to (JC's AddFront0To1Digit:(hours of myDate)) as text
		set myMinutes to (JC's AddFront0To1Digit:(minutes of myDate)) as text
		set mySeconds to (JC's AddFront0To1Digit:(seconds of myDate)) as text
		set mySize to size of myFile
		set myPhysicalSize to physical size of myFile
		set myExtension to name extension of myFile
		-- AppleScript text comparison ignores case by default, so this also matches "JPEG"
		if myExtension is "jpeg" then set myExtension to "JPG"
		-- Folder name for duplicates, then the same name plus the extension for the file
		set myDuplicateName to {myYear, myMonth, myDay, "-", myHour, myMinutes, mySeconds, "-", mySize, "-", myPhysicalSize} as string
		set myDateName to myDuplicateName & "." & myExtension
		try
			set name of myFile to myDateName
		on error
			-- Any rename error lands here; the expected cause is a name collision, i.e. a likely duplicate
			try
				make new folder at myTargetFolder with properties {name:myDuplicateName}
			end try
			-- Address the folder through myTargetFolder rather than through window 1
			move myFile to folder myDuplicateName of myTargetFolder
		end try
	end repeat
end tell
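For comparison, here is a minimal sketch of the same rename-or-set-aside logic done directly on the file system, outside Finder. It is written in Python rather than shell, it is not a drop-in replacement for the script above, and it has not been timed against it. All function names are made up for illustration. Assumptions that differ from the AppleScript version: every uniquely named file ends up at the top level of the target folder, files without an extension are skipped, the creation date comes from `st_birthtime` (available on macOS, with a fallback to the modification time elsewhere), and an existing file in a duplicates folder is never overwritten.

```python
import shutil
from datetime import datetime
from pathlib import Path


def signature(path: Path) -> str:
    """Name stem built from creation time, logical size and physical size."""
    st = path.stat()
    # Assumption: st_birthtime exists on macOS; fall back to mtime elsewhere.
    created = getattr(st, "st_birthtime", st.st_mtime)
    stamp = datetime.fromtimestamp(created).strftime("%Y%m%d-%H%M%S")
    physical = getattr(st, "st_blocks", 0) * 512  # bytes allocated on disk
    return f"{stamp}-{st.st_size}-{physical}"


def normalized_extension(path: Path) -> str:
    """Extension without the dot, with "jpeg" in any case mapped to "JPG"."""
    ext = path.suffix.lstrip(".")
    return "JPG" if ext.lower() == "jpeg" else ext


def dedupe(root: Path) -> int:
    """Rename files to their signature; set likely duplicates aside.

    Returns the number of files moved into duplicate folders.
    """
    moved = 0
    # Snapshot the list first so folders created below are not revisited.
    files = [p for p in root.rglob("*") if p.is_file()]
    for path in files:
        ext = normalized_extension(path)
        if not ext:
            continue  # simplification: leave extension-less files alone
        stem = signature(path)
        target = root / f"{stem}.{ext}"
        if path == target:
            continue  # already renamed by an earlier run
        if not target.exists():
            path.rename(target)
            continue
        # Name collision: same date, sizes and extension, so a likely duplicate.
        dup_dir = root / stem
        dest = dup_dir / path.name
        if path == dest:
            continue  # already set aside by an earlier run
        dup_dir.mkdir(exist_ok=True)
        if dest.exists():
            raise FileExistsError(dest)  # never overwrite silently
        shutil.move(str(path), str(dest))
        moved += 1
    return moved
```

Calling `dedupe(Path("/path/to/a/copy"))` on a copy of the test folder first would show whether it produces the same groups as the AppleScript run before trusting it with the full set.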