What's the best way to find duplicates in an NSArray?

I can’t remember where in the Apple reference documentation I came across it, but this seems to be the most direct method to achieve this:

use framework "Foundation"
use scripting additions

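set arrayA to current application's NSArray's arrayWithArray:{1, 2, 3, 4, 3, 5, 3}
-- de-duplicate while keeping the original order
set arrayB to (current application's NSOrderedSet's orderedSetWithArray:arrayA)'s allObjects()
-- the removals in the diff are the extra occurrences in arrayA (differenceFromArray: needs macOS 10.15 or later)
set theDiffs to (arrayB's differenceFromArray:arrayA)'s removals()
if theDiffs as list = {} then return {}
-- return the duplicated objects together with their positions
return (theDiffs's valuesForKeys:{"object", "index"}) as record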

Shane taught us the following:

-- Created 2017-11-07 by Takaaki Naganoya
-- 2017 Piyomaru Software
use AppleScript version "2.4"
use scripting additions
use framework "Foundation"

property NSCountedSet : a reference to current application's NSCountedSet

set aList to {1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 8, 10, -2}
set aRes to returnDuplicatesOnly(aList) of me
-->	{1, 2, 3, 4}

on returnDuplicatesOnly(aList as list)
	set aSet to NSCountedSet's alloc()'s initWithArray:aList
	set bList to (aSet's allObjects()) as list
	
	set dupList to {}
	repeat with i in bList
		set aRes to (aSet's countForObject:i)
		if aRes > 1 then
			set the end of dupList to (contents of i)
		end if
	end repeat
	
	return dupList
end returnDuplicatesOnly

Another NSCountedSet suggestion:

use framework "Foundation"
use scripting additions

set arrayOne to current application's NSArray's arrayWithArray:{1, 2, 3, 4, 6, 3, 5, 3, 4, 6}
set setOne to current application's NSCountedSet's alloc()'s initWithArray:arrayOne
set arrayTwo to (arrayOne's valueForKeyPath:"@distinctUnionOfObjects.self")
set setTwo to current application's NSCountedSet's alloc()'s initWithArray:(arrayTwo)
setOne's minusSet:setTwo
return setOne's allObjects() as list -->{3, 6, 4}

I ran a timing test with a list that contained 792 items, and my script took 1 millisecond to run. Jonas’ script took 8 milliseconds to run, but that’s to be expected as it returns significantly more information.
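For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a timing harness. It is an assumption on my part, not the method actually used for the figures above: the test list is hypothetical (792 integers with repeats), the timed section is simply the NSCountedSet script shown earlier, and the elapsed time is taken from NSDate.

```applescript
use framework "Foundation"
use scripting additions

-- hypothetical test data: 792 integers, most of them repeated
set theList to {}
repeat with i from 1 to 792
	set end of theList to (i mod 100)
end repeat

-- note the start time
set startDate to current application's NSDate's |date|()

-- the code being timed (the NSCountedSet approach from the earlier post)
set arrayOne to current application's NSArray's arrayWithArray:theList
set setOne to current application's NSCountedSet's alloc()'s initWithArray:arrayOne
set arrayTwo to (arrayOne's valueForKeyPath:"@distinctUnionOfObjects.self")
set setTwo to current application's NSCountedSet's alloc()'s initWithArray:arrayTwo
setOne's minusSet:setTwo
set theDuplicates to setOne's allObjects() as list

-- timeIntervalSinceNow is negative for a date in the past, so flip the sign and convert to milliseconds
set elapsedMilliseconds to (startDate's timeIntervalSinceNow()) * -1000
return elapsedMilliseconds
```

A single run like this is noisy, so averaging over many iterations, as Script Geek does, gives a fairer comparison.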

Script Geek says peavine’s script is about 20% faster than mine.

Mac14,15,  macOS Version 15.3 (Build 24D60),  1000 iterations
         First Run   Total Time    Average     Median    Maximum    Minimum   Std.Dev.
First       0.0016       0.3052     0.0003     0.0003     0.0008     0.0002     0.0000
Second      0.0012       0.2571     0.0003     0.0003     0.0006     0.0002     0.0000
Ratio (excluding first run): 1.19:1   Ratio of medians: 1.18:1

I’ll use this.


Like ionah’s script, this returns both the duplicates and their indices. It’s not quite as fast as ionah’s, but it should work on systems older than macOS 10.15 and it returns 1-based indices. It can easily be adapted for 0-based indices if required.

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

on findDuplicates(aList)
	script o
		property indexList : aList's items
	end script
	repeat with i from 1 to (count o's indexList)
		set o's indexList's item i to i
	end repeat
	set indexSet to current application's NSMutableIndexSet's indexSetWithIndexesInRange:({0, count o's indexList})
	set anArray to current application's NSArray's arrayWithArray:(aList)
	set firstInstanceIndices to (current application's NSDictionary's dictionaryWithObjects:(o's indexList) forKeys:(anArray))'s allValues()
	repeat with i in firstInstanceIndices
		set i to i as integer
		set o's indexList's item i to missing value
		(indexSet's removeIndex:(i - 1))
	end repeat
	if (indexSet's |count|() = 0) then return {}
	
	return {|index|:o's indexList's integers, object:(anArray's objectsAtIndexes:(indexSet)) as list}
end findDuplicates

set aList to {1, 2, 3, 4, 3, 5, 3}
findDuplicates(aList)

Thanks guys, but I think I’ll stick with my first try.
The advantage here is that it returns the position of the duplicates.

And if someone wants the duplicates only, here is a variation:

use framework "Foundation"
use scripting additions

set theList to {1, 2, 3, 4, 7, 2, 3, 4, 5, 6, 7, 3, 4, 1}

set arrayA to current application's NSArray's arrayWithArray:theList
set arrayB to (current application's NSOrderedSet's orderedSetWithArray:arrayA)'s allObjects()
set theDiff to (arrayB's differenceFromArray:arrayA)
if not (theDiff's hasChanges()) then return {}

-- get duplicates only 
set resultList to ((theDiff's valueForKeyPath:"removals.object")'s allObjects()) as list

-- get duplicates & positions 
set resultRecord to (theDiff's removals()'s valuesForKeys:{"object", "index"}) as record

@NigelGarvey: you posted while I was responding…
@ShaneStanley & @NigelGarvey: I think your likes mean I’m on the right track…


Update: modified the “get duplicates only” line where there was a bad copy-paste.

A vanilla AppleScript version is twice as fast as peavine’s.

http://piyocast.com/as/archives/17236
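To give an idea of what a vanilla approach can look like without any ASObjC, here is a minimal sketch. It is not the code from the linked page, and the handler name duplicatesOf is hypothetical. It keeps one list of items already seen and another of items found more than once.

```applescript
-- hypothetical handler, not taken from the linked page
on duplicatesOf(aList)
	set seenList to {}
	set dupList to {}
	repeat with anItem in aList
		-- dereference the loop reference before comparing
		set anItem to contents of anItem
		if seenList contains {anItem} then
			-- record each duplicated value only once
			if dupList does not contain {anItem} then set end of dupList to anItem
		else
			set end of seenList to anItem
		end if
	end repeat
	return dupList
end duplicatesOf

duplicatesOf({1, 2, 3, 4, 3, 5, 3}) --> {3}
```

The "contains" tests make this quadratic in the worst case, so I wouldn't assume it stays ahead of the Foundation versions on long lists.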

My earlier script returns a list of duplicate items in an array. As part of its work, the script calculates the number of duplicates (not including the original), and I’ve modified my script to return that additional information as a record.

use framework "Foundation"
use scripting additions

--get a list of duplicates
set arrayOne to current application's NSArray's arrayWithArray:{"aa", "bb", "cc", "aa", "bb", "aa", "dd", "ee", "aa"}
set setOne to current application's NSCountedSet's alloc()'s initWithArray:arrayOne
set arrayTwo to (arrayOne's valueForKeyPath:"@distinctUnionOfObjects.self")
set setTwo to current application's NSCountedSet's alloc()'s initWithArray:(arrayTwo)
setOne's minusSet:setTwo
set theDuplicates to setOne's allObjects()
--return theDuplicates as list --enable if a list of duplicates is all that's needed.

--make a record with the duplicate items as keys and the duplicate counts as values
set theDictionary to (current application's NSMutableDictionary's new())
repeat with aValue in theDuplicates
	set duplicatesCount to (setOne's countForObject:aValue) as integer
	(theDictionary's setValue:duplicatesCount forKey:aValue)
end repeat
set theRecord to theDictionary as record -->{aa:3, bb:1}

Here’s a slightly optimised version of my script above which also allows the index base to be specified as a parameter. This version should also work on any macOS system from 10.10 onwards. More thorough testing this morning shows that @ionah’s original generally has the advantage for speed, except where the proportion of duplicates in a large list or array is particularly high:

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

on findDuplicates(aList, indexBase)
	script o
		property indexList : aList's items
	end script
	set indexBaseOffset to 1 - indexBase
	repeat with i from 1 to (count o's indexList)
		set o's indexList's item i to i - indexBaseOffset
	end repeat
	set indexSet to current application's NSMutableIndexSet's indexSetWithIndexesInRange:({indexBase, count o's indexList})
	set anArray to current application's NSArray's arrayWithArray:(aList)
	set firstInstanceIndices to (current application's NSDictionary's dictionaryWithObjects:(o's indexList) forKeys:(anArray))'s allValues()
	repeat with i in firstInstanceIndices
		(indexSet's removeIndex:(i))
	end repeat
	if (indexSet's |count|() = 0) then return {}
	
	indexSet's shiftIndexesStartingAtIndex:(indexBase) |by|:(-indexBase)
	return {|index|:((current application's NSArray's arrayWithArray:(o's indexList))'s objectsAtIndexes:(indexSet)) as list, object:(anArray's objectsAtIndexes:(indexSet)) as list}
end findDuplicates

set aList to {1, 2, 3, 4, 3, 5, 3}
set indexBase to 1
findDuplicates(aList, indexBase)