Need Help Sorting a List of Strings Based on Its Substring

Interesting. My ageing iMac gives similar times with Nigel’s, but is considerably slower with mine:

 iMac15,1,  macOS Version 10.15.4 (Build 19E287),  100 iterations
          First Run   Total Time    Average     Median    Maximum    Minimum   Std.Dev.
 First       0.0003       0.0216     0.0002     0.0002     0.0004     0.0002     0.0000
 Second      0.0067       0.5375     0.0054     0.0054     0.0068     0.0047     0.0004
 Ratio (excluding first run): 1:24.88   Ratio of medians: 1:26.37

Out of curiosity I changed it to reverse of paragraphs, and it made about a 1% difference. So not much.

Running the test script below, I’m finding that Shane’s original begins to overtake my modification of it somewhere between 20 and 25 “boards” — ie. between 100 and 125 strings. But if my modification’s further modified to loop through the original list in a script object, then it’s faster even than my vanilla sort. There’s still only a few thousandths of a second in it, though. :slight_smile:

use AppleScript version "2.5" -- OS X 10.12 (Sierra, not El Capitan) or later
use framework "Foundation"
use framework "GameplayKit" -- for shuffledArray()
use sorter : script "Custom Iterative Ternary Merge Sort"
use scripting additions

on createList(numberOfBoards)
	script o
		property template : {"5e", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "__Board ", "", " | ", ""}
		property hex : characters of "0123456789abcdef"
		property cats : {"Done", "List 1", "List 2", "To Do", "Test for Attachments"}
		property listOfStrings : {}
	end script
	
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ""
	repeat with i from 1 to numberOfBoards
		set item -3 of o's template to i
		repeat with j from 1 to 5
			repeat with k from 2 to 23
				set item k of o's template to some item of o's hex
			end repeat
			set item -1 of o's template to item j of o's cats
			set end of o's listOfStrings to o's template as text
		end repeat
	end repeat
	set AppleScript's text item delimiters to astid
	return (current application's class "NSArray"'s arrayWithArray:(o's listOfStrings))'s shuffledArray() as list
end createList

on timeVanillaSort(listOfStrings) -- NG.
	set startTime to current application's class "NSDate"'s new()
	script onTextItem2
		on isGreater(a, b)
			return (text item 2 of a > text item 2 of b)
		end isGreater
	end script
	
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "__"
	considering numeric strings
		tell sorter to sort(listOfStrings, 1, -1, {comparer:onTextItem2})
	end considering
	set AppleScript's text item delimiters to astid
	return {VanillaTime:-(startTime's timeIntervalSinceNow())}
end timeVanillaSort

on timeOriginalASObjCSort(listOfStrings) -- Shane original.
	set startTime to current application's class "NSDate"'s new()
	set listOfStrings to current application's NSArray's arrayWithArray:listOfStrings
	set theDict to current application's NSMutableDictionary's dictionary()
	repeat with aString in listOfStrings
		(theDict's setObject:((aString's componentsSeparatedByString:"__")'s lastObject()) forKey:aString)
	end repeat
	set theResult to (theDict's keysSortedByValueUsingSelector:"localizedStandardCompare:") as list
	return {originalASObjCTime:-(startTime's timeIntervalSinceNow())}
end timeOriginalASObjCSort

on timeModifiedASObjCSort(listOfStrings) -- Shane, modified by NG.
	set startTime to current application's class "NSDate"'s new()
	set theDict to current application's NSMutableDictionary's dictionary()
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "__"
	repeat with aString in listOfStrings
		(theDict's setObject:(aString's last text item) forKey:aString)
	end repeat
	set AppleScript's text item delimiters to astid
	set theResult to (theDict's keysSortedByValueUsingSelector:"localizedStandardCompare:") as list
	return {modifiedASObjCTime:-(startTime's timeIntervalSinceNow())}
end timeModifiedASObjCSort

on timeTwiceModifiedASObjCSort(listOfStrings) -- Shane, modified twice by NG.
	set startTime to current application's class "NSDate"'s new()
	script o
		property lst : listOfStrings
	end script
	set theDict to current application's NSMutableDictionary's dictionary()
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "__"
	repeat with i from 1 to (count listOfStrings)
		set aString to item i of o's lst
		(theDict's setObject:(aString's last text item) forKey:aString)
	end repeat
	set AppleScript's text item delimiters to astid
	set theResult to (theDict's keysSortedByValueUsingSelector:"localizedStandardCompare:") as list
	return {twiceModifiedASObjCTime:-(startTime's timeIntervalSinceNow())}
end timeTwiceModifiedASObjCSort

on test(numberOfBoards)
	set listOfStrings to createList(numberOfBoards)
	copy listOfStrings to duplicateListOfStrings
	copy listOfStrings to duplicateListOfStrings2
	copy listOfStrings to duplicateListOfStrings3
	
	set numberOfStrings to {numberOfStrings:numberOfBoards * 5}
	numberOfStrings & timeVanillaSort(listOfStrings) & ¬
		timeOriginalASObjCSort(duplicateListOfStrings) & ¬
		timeModifiedASObjCSort(duplicateListOfStrings2) & ¬
		timeTwiceModifiedASObjCSort(duplicateListOfStrings3)
end test

test(125) -- Param: number of boards. (5 strings per board.)
1 Like

@NigelGarvey, what version of macOS are you running? Your version is still much faster here, and I’m wondering if it’s an OS thing, or I need to pass the hat around to buy a new Mac :wink:.

Probably I’m missing something here, and I haven’t done anything with sorting string lists for years.
My approach would have been

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theTx to "5e9204033debb66ef9d06cbf__Board 1 | To Do
5e9210c43823383ffec282dd__Board 1 | Done
5e91dfae766168544765167f__Board 2 | Test for Attachments
5e9203e3b964487863f7b4ed__Board 2 | Done
5e91dec34033e73893cad871__Board 1 | List 2
5e9203d1918b11356052f065__Board 2 | To Do
5e91deb9b5477c7bf6831ab9__Board 1 | List 1"

set sortScript to "echo " & quoted form of theTx & " |  sort  -t '_' -k 2 "
set res to do shell script sortScript

I have similar results than Nigel’s:

{
numberOfStrings:125,
A_VanillaTime:0.008211016655,
B_originalASObjCTime:0.02586710453,
C_modifiedASObjCTime:0.021585941315,
D_twiceModifiedASObjCTime:0.007339954376
}

{
numberOfStrings:625,
A_VanillaTime:0.051606059074,
B_originalASObjCTime:0.123377084732,
C_modifiedASObjCTime:0.378246903419,
D_twiceModifiedASObjCTime:0.034955024719
}

It’s going to be slower, sort won’t consider numeric strings, and it’s not Unicode-aware.

1 Like

Latter is true :smile:

But it just did a little test with a list of 1000 entries and somehow modified script to sort by name and following number:

set sortScript to "echo " & quoted form of theTx & " | sort -t '_' -k 2 -t ' ' -k 3 -n"
Just to get rid of a sort order like “Board 1, Board 10, Board 100 …”
It took 0.07 (displayed in Script Debugger) on my ancient 2012 MBP

But as said it’s more than a decade ago I did things like that.
The dict way is better.

10.14.6.

Oh … er … I’m sure it must be an OS thing! :wink:

These hotrod clubs are wonderful – they always spend many orders of magnitude more time in optimisation than is actually gained in production :slight_smile:

Time to garage the old VW and just switch to JS ?

(Array.sort is more than good enough. Speed is an easily-grasped, endearing, and clearly addictive proxy for quality, but https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good, and hotrodding is not exactly famous for its contribution to reliability, or efficient use of human time …)

Nigel, I am totally amazed at your many skills: programming, tenacity, problem solving.
What a great script! Thank you for all the analysis and tools.

Here’s my very minor contribution:

BTW, to get the results into Excel, I had to use another one of your great tools:
on recordToString

Shane, the times I show above are for:
Script Debugger 7.0.12 (7A112) on macOS 10.14.6 (Mojave)
2019 iMac-27 512GB SSD, 3.6Ghz i9 CPU, 40GB RAM

So no Catalina users?

Hi, I am very interested in this solution to sort a list of strings based on a substring - particularly the use of a Mutable dictionary. I copied the solution proposed on Post 15#. But I get an error at “set theDict to current application’s NSMutableDictionary’s dictionary {}”. I noted that when I copied the script the SD interpreter does not display the last part of the command in red: current’s application’s NSMutableDictionary’s dictionary (). the last word dictionary should be in RED but it is displayed as BLUE.
As a result I get an AppleScript error:


applescriptErrorNSMutableDictionary
Could some of you explain what I am doing wrong?

I’m guessing you’re using a version of macOS that still allows third-party scripting additions, and one of them defines the term dictionary. Try changing the code to:

current’s application's NSMutableDictionary's |dictionary|()

Thank you very much for the prompt reply. And YES your correction did the trick! I am using Mac OS X High Sierra 10.13.6.
I will know play with mutable dictionaries to reorder text files.

Hey, Nigel, I’m trying to make a generic handler using your sorting library, but I think there’s something I’m not understanding.

Here’s what
I am trying:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

use sorter : script "Custom Iterative Ternary Merge Sort"


set listOfStrings to paragraphs of "6:00 am|(BEIN1)|SPORTS|Footvolley|2021 Miami Beach World Series, Part 3|||||||||1 hr.|
6:00 am|(BSW)|SPORTS|NHL Hockey|Tampa Bay Lightning at Los Angeles Kings|From Staples Center in Los Angeles.||||||||2 hrs.|
6:00 am|(ESPNU)|SPORTS|College Basketball|Iowa State at Texas Tech|From United Supermarkets Arena in Lubbock, Texas.||||||||2 hrs.|
7:30 am|(NBATV)|SPORTS|NBA Basketball|Detroit Pistons at Golden State Warriors|From Chase Center in San Francisco.||||||||2 hrs. 30 mins.|
8:30 am|(GOLTV)|SPORTS|Fútbol Ecuatoriano Primera División||Final, partido de vuelta. Emelec enfrenta a Independiente del Valle en el estadio Banco del Pacífico Capwell. En el primer encuentro de la serie definitoria, disputado en Sangolquí, los rayados se impusieron a los eléctricos por 3-1.||||||||2 hrs. 30 mins.|"
set AppleScript's text item delimiters to "|"
repeat with x from 1 to count of listOfStrings
  set item x of listOfStrings to text items of item x of listOfStrings
end repeat
my SortAListByItem(listOfStrings, 4)

on SortAListByItem(listOfStringLists, itemToSortOn)
  script sortByItem
     on isGreater(a, b)
        return (item itemToSortOn of a > item itemToSortOn of b)
     end isGreater
  end script
  
  tell sorter to sort(listOfStringLists, 1, -1, {comparer:SortAListByItem})
  
  return listOfStrings
end SortAListByItem

Is this even doable? Am I on the right track?

EDIT: Included the full script with data

Nevermind… after a bit more trial and error I figured out what my mistakes were. Here’s a script containing the handlers I’m using.

The first two handlers do the bulk of my appleScript sorting, and over the years I’ve used several different ways to do the same thing. Moving forward, as I update old scripts I’ll replace the old handlers with these.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

use sorter : script "Custom Iterative Ternary Merge Sort"


set listOfStrings to paragraphs of "6:00 am|(BEIN1)|SPORTS|Footvolley|2021 Miami Beach World Series, Part 3|||||||||1 hr.|
6:00 am|(BSW)|SPORTS|NHL Hockey|Tampa Bay Lightning at Los Angeles Kings|From Staples Center in Los Angeles.||||||||2 hrs.|
6:00 am|(ESPNU)|SPORTS|College Basketball|Iowa State at Texas Tech|From United Supermarkets Arena in Lubbock, Texas.||||||||2 hrs.|
7:30 am|(NBATV)|SPORTS|NBA Basketball|Detroit Pistons at Golden State Warriors|From Chase Center in San Francisco.||||||||2 hrs. 30 mins.|
6:00 am|(BEIN1)|SPORTS|Footvolley|2021 Miami Beach World Series, Part 3|||||||||1 hr.|
6:00 am|(BSW)|SPORTS|NHL Hockey|Tampa Bay Lightning at Los Angeles Kings|From Staples Center in Los Angeles.||||||||2 hrs.|
6:00 am|(ESPNU)|SPORTS|College Basketball|Iowa State at Texas Tech|From United Supermarkets Arena in Lubbock, Texas.||||||||2 hrs.|
7:30 am|(NBATV)|SPORTS|NBA Basketball|Detroit Pistons at Golden State Warriors|From Chase Center in San Francisco.||||||||2 hrs. 30 mins.|
6:00 am|(BEIN1)|SPORTS|Footvolley|2021 Miami Beach World Series, Part 3|||||||||1 hr.|
6:00 am|(BSW)|SPORTS|NHL Hockey|Tampa Bay Lightning at Los Angeles Kings|From Staples Center in Los Angeles.||||||||2 hrs.|
6:00 am|(ESPNU)|SPORTS|College Basketball|Iowa State at Texas Tech|From United Supermarkets Arena in Lubbock, Texas.||||||||2 hrs.|
7:30 am|(NBATV)|SPORTS|NBA Basketball|Detroit Pistons at Golden State Warriors|From Chase Center in San Francisco.||||||||2 hrs. 30 mins.|
8:30 am|(GOLTV)|SPORTS|Fútbol Ecuatoriano Primera División||Final, partido de vuelta. Emelec enfrenta a Independiente del Valle en el estadio Banco del Pacífico Capwell. En el primer encuentro de la serie definitoria, disputado en Sangolquí, los rayados se impusieron a los eléctricos por 3-1.||||||||2 hrs. 30 mins.|"
set AppleScript's text item delimiters to "|"
set listOfLists to {}

repeat with x from 1 to count of listOfStrings
   set the end of listOfLists to text items of item x of listOfStrings
end repeat

set sortedTextList to my sortTextList(listOfStrings)

set sortedTextListDescending to my sortTextListDescending(listOfStrings)

set sortedListsOfLists to my SortAListOfListsByItem(listOfLists, 4)

return {sortedTextList, sortedTextListDescending, sortedListsOfLists}

on SortAListOfListsByItem(listOfLists, itemToSortOn)
   script sortByItem
      on isGreater(a, b)
         return (item itemToSortOn of a > item itemToSortOn of b)
      end isGreater
   end script
      
   tell sorter to sort(listOfLists, 1, -1, {comparer:sortByItem})
   return listOfLists
end SortAListOfListsByItem

on sortTextList(listToSort)
   script descending
      on isGreater(a, b)
         return (a < b)
      end isGreater
   end script
   
   tell sorter to sort(listToSort, 1, -1, {})
   return listToSort
end sortTextList

on sortTextListDescending(listToSort)
   script descending
      on isGreater(a, b)
         return (a < b)
      end isGreater
   end script
   
   tell sorter to sort(listToSort, 1, -1, {comparer:descending})
   return listToSort
end sortTextListDescending

Hi Ed.

I was just on the point of posting my observations, but you got there first. :slight_smile: Glad the problem’s solved.

Thanks, Nigel, I’m finding your sorting script very handy!

If you’re around, I’ve got another question.

We want to sort these events by sport and time.

So all the NBA Basketball is together, in order of air time.

I’ve added a routine that converts the show time to a number (number of minutes since midnight) and do that source first, but when I sort by sport it changes loses the order.

Suggestions?

Hi Ed.

That should work. :thinking: It does for me. Sort listOfLists on the times, then sort it again on the sports. Merge sorts are stable, so the times should wind up in order within the associated sport groupings. You then have to convert the list of lists back to text if that’s what you need.

An alternative to sorting twice would be to sort once on both criteria simultaneously. This can be slightly faster if the list’s based on a large number of paragraphs:

set sortedListsOfLists to my SortAListOfListsBy2Items(listOfLists, 4, 1)

on SortAListOfListsBy2Items(listOfLists, sortItem, subsortItem)
	script sortByItem
		on isGreater(a, b)
			if (item sortItem of a = item sortItem of b) then return (item subsortItem of a > item subsortItem of b)
			return (item sortItem of a > item sortItem of b)
		end isGreater
	end script
	
	tell sorter to sort(listOfLists, 1, -1, {comparer:sortByItem})
	return listOfLists
end SortAListOfListsBy2Items

(By the way, your sortTextList() handler doesn’t need the script object.)