Sorting a list of strings by the length of each item


(Ed Stockly) #1

I have a script that I’ve used for years that needs to produce a list of strings sorted by length, from shortest to longest.

The way I’ve done this is to step through each item of the list, get its length, then store {length, content} in a new list of lists. I sort that list by the first item, then go through the list again, extracting the strings and putting those back in the original list in their new order.
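For reference, the approach described above can be sketched in vanilla AppleScript roughly as follows. This is an illustrative reconstruction, not the original script: the handler name sortByLength, the variable names and the choice of a simple insertion sort are all assumptions.

```applescript
-- Hypothetical reconstruction of the decorate / sort / undecorate approach.
on sortByLength(theList)
	-- Decorate: pair each string with its length.
	set pairList to {}
	repeat with anItem in theList
		set theString to contents of anItem
		set end of pairList to {length of theString, theString}
	end repeat
	-- Sort the pairs by their first item (simple stable insertion sort).
	repeat with i from 2 to (count pairList)
		set thisPair to item i of pairList
		set j to i - 1
		repeat while j > 0 and (item 1 of item j of pairList) > (item 1 of thisPair)
			set item (j + 1) of pairList to item j of pairList
			set j to j - 1
		end repeat
		set item (j + 1) of pairList to thisPair
	end repeat
	-- Undecorate: pull the strings back out in their new order.
	set sortedList to {}
	repeat with aPair in pairList
		set end of sortedList to item 2 of aPair
	end repeat
	return sortedList
end sortByLength

sortByLength({"aaa", "aa", "aaaaa", "a"}) -- {"a", "aa", "aaa", "aaaaa"}
```

An insertion sort is fine for short lists but slows down quickly as the list grows, which is presumably part of the motivation for the question.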

I’m just wondering with all the advances in AppleScript and ASObjC, if there’s not a better, faster, stronger way?

Any suggestions?


(CK) #2

A shell script can be a fast way of sorting. This uses awk to prepend each line with its character length, then sorts the lines numerically, finally stripping off the leading digits of each line.

set str to "A simple line
This one is quite a long line
A Word
The quick brown fox jumped over the lazy dogs"

do shell script ¬
	"awk '{ print length, $0 }' <<<" & ¬
	str's quoted form & ¬
	" | sort -n -s | cut -d ' ' -f 2-"

Result:

"A Word
A simple line
This one is quite a long line
The quick brown fox jumped over the lazy dogs"

So, much the same method as yours, but would perhaps fare better in a speed test…?


(letterman) #3

Something like this?

use framework "Foundation"
use scripting additions

set aList to current application's NSArray's arrayWithArray:{"aaa", "aa", "aaaaa", "a"}

set sortedList to aList's sortedArrayUsingDescriptors:{current application's NSSortDescriptor's sortDescriptorWithKey:"length" ascending:true}

https://developer.apple.com/documentation/foundation/nssortdescriptor?language=objc


(Shane Stanley) #4

@letterman has given you a fast alternative, and this takes it a step further by sorting equal-length strings alphabetically:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set aList to current application's NSArray's arrayWithArray:{"bb", "ccc", "D", "A", "aaa", "aa", "aaaaa", "a"}
set desc1 to current application's NSSortDescriptor's sortDescriptorWithKey:"length" ascending:true
set desc2 to current application's NSSortDescriptor's sortDescriptorWithKey:"lowercaseString" ascending:true
set sortedList to (aList's sortedArrayUsingDescriptors:{desc1, desc2}) as list

Edit: A better second sort descriptor would be this:

set desc2 to current application's NSSortDescriptor's sortDescriptorWithKey:"self" ascending:true selector:"caseInsensitiveCompare:"

(Shane Stanley) #5

@CK: the shell-script approach has some limitations, though: it will sort any strings containing non-ASCII characters incorrectly. (As will the ASObjC versions for characters outside the Basic Multilingual Plane, but they’re fairly rare.)

It’s also fairly slow compared with ASObjC, by more than a couple of orders of magnitude.


(CK) #6

Useful learning points. Thanks.


(Nigel Garvey) #7

But don’t forget that a string’s ‘length’ in ASObjC is in 16-bit units, whereas in AS it’s in characters. In some situations, you might be better off with an AS sort.
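A minimal sketch of the difference described above, using a character from outside the Basic Multilingual Plane. The variable names are illustrative. The NSString length for this character is 2 (it takes a surrogate pair in UTF-16); per the point above, the AppleScript length is expected to be 1.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theString to "𐀀" -- U+10000, outside the Basic Multilingual Plane
-- AppleScript's own length: counted in characters.
set asLength to length of theString
-- NSString's length: counted in 16-bit (UTF-16) units.
set nsString to current application's NSString's stringWithString:theString
set nsLength to (nsString's |length|()) as integer
{asLength, nsLength}
```

If the two numbers differ for any string in a list, a sort keyed on "length" in ASObjC can order that string differently from a sort keyed on the AppleScript length.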


(Ed Stockly) #8

Thanks, all.

That could well be an issue, Nigel!


(Shane Stanley) #9

Or at least an AS character count. This is probably slower than a good fully-AS sort, but still reasonably brisk:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set aList to {"bb", "ccc", "D", "A", "aaa", "aa", "aaaaa", "a"}
set countList to {}
repeat with aString in aList
	set end of countList to {len:length of aString, val:aString}
end repeat
set countList to current application's NSArray's arrayWithArray:countList
set desc1 to current application's NSSortDescriptor's sortDescriptorWithKey:"len" ascending:true
set desc2 to current application's NSSortDescriptor's sortDescriptorWithKey:"val" ascending:true selector:"caseInsensitiveCompare:"
set sortedList to ((countList's sortedArrayUsingDescriptors:{desc1, desc2})'s valueForKey:"val") as list

(Ed Stockly) #10

So now this is getting pie in the sky, a bit, but what would be really valuable would be if I could give a weighted value to each character in calculating the length.

For example, say an “I” or an “i” or an “l” (lower case “L”) were each worth 1. An “e” or an “a” were each worth 3. A “w” or an “m” 5.

I’d need to be able to assign a value to each letter (actually, since the final product is two different character styles I’d need at least two different values).
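A rough sketch of what such a weighted length might look like in vanilla AppleScript, using only the example weights given above. The handler name weightedLength and the default weight of 3 for every unlisted character are assumptions made for illustration, and only one table of weights is shown; a second character style would need its own table.

```applescript
-- Hypothetical weighted-length handler using the example weights from the post.
on weightedLength(theText)
	set narrowChars to "Iil" -- each worth 1
	set mediumChars to "ea" -- each worth 3
	set wideChars to "wm" -- each worth 5
	set defaultWeight to 3 -- assumption: weight for any character not listed
	set total to 0
	considering case
		repeat with aChar in (characters of theText)
			set c to contents of aChar
			if narrowChars contains c then
				set total to total + 1
			else if mediumChars contains c then
				set total to total + 3
			else if wideChars contains c then
				set total to total + 5
			else
				set total to total + defaultWeight
			end if
		end repeat
	end considering
	return total
end weightedLength

weightedLength("will") -- 5 + 1 + 1 + 1 = 8
```

The returned value could be stored in place of the character count in the len field of the record-based sort shown in post #9.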

Doesn’t matter how long this would take (it would save a user about half an hour any time it ran, so even if it took longer and was fully automated, that would still be more than worthwhile).

After taking this up again, it dawned on me that taking this one step further would resolve multiple problems.


(Shane Stanley) #11

So what you’re really after is the length in units, not character count. Of course that depends on the font, but this will give you the value using some default font (Helvetica or San Francisco, I suspect).

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

set aList to {"bb", "ccc", "D", "A", "aaa", "aa", "aaaaa", "a"}
set countList to {}
repeat with aString in aList
	set atrString to (current application's NSAttributedString's alloc()'s initWithString:aString)
	set theSize to atrString's |size|()
	set end of countList to {len:width of theSize, val:aString}
end repeat
set countList to current application's NSArray's arrayWithArray:countList
set desc1 to current application's NSSortDescriptor's sortDescriptorWithKey:"len" ascending:true
set desc2 to current application's NSSortDescriptor's sortDescriptorWithKey:"val" ascending:true selector:"caseInsensitiveCompare:"
set sortedList to ((countList's sortedArrayUsingDescriptors:{desc1, desc2})'s valueForKey:"val") as list

If you know the font, you could do this:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

set aList to {"bb", "ccc", "D", "A", "aaa", "aa", "aaaaa", "a"}
set countList to {}
set theFont to current application's NSFont's fontWithName:"GillSans-Bold" |size|:14.0
set theAtts to current application's NSMutableDictionary's dictionaryWithObject:theFont forKey:(current application's NSFontAttributeName)
repeat with aString in aList
	set atrString to (current application's NSAttributedString's alloc()'s initWithString:aString attributes:theAtts)
	set theSize to atrString's |size|()
	set end of countList to {len:width of theSize, val:aString}
end repeat
set countList to current application's NSArray's arrayWithArray:countList
set desc1 to current application's NSSortDescriptor's sortDescriptorWithKey:"len" ascending:true
set desc2 to current application's NSSortDescriptor's sortDescriptorWithKey:"val" ascending:true selector:"caseInsensitiveCompare:"
set sortedList to ((countList's sortedArrayUsingDescriptors:{desc1, desc2})'s valueForKey:"val") as list

It’s not going to match something like InDesign perfectly, but it should get you very close.

(And that second sort descriptor is probably moot in this case, given the precision of the lengths.)


(Shane Stanley) #12

For kicks I tried the above in InDesign. The lengths match to nearly three decimal places.


(Nigel Garvey) #13

:sunglasses:

I’m finding that with a list of 1,536 items, it’s just over twice as fast to start with countList as a mutable array and to populate it directly with dictionaries:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set aList to {"bb", "ccc", "𐀀", "A", "aaa", "aa", "aaaaa", "a"} & words of "Now is the time for all good men to come to the aid of the party"
set aList to aList & aList
set aList to aList & aList
set aList to aList & aList
set aList to aList & aList
set aList to aList & aList
set aList to aList & aList
set countList to current application's NSMutableArray's new()
set theKeys to current application's NSArray's arrayWithArray:{"len", "val"}
repeat with aString in aList
	tell countList to addObject:(current application's NSDictionary's dictionaryWithObjects:{aString's length, aString's contents} forKeys:theKeys)
end repeat
set desc1 to current application's NSSortDescriptor's sortDescriptorWithKey:"len" ascending:true
set desc2 to current application's NSSortDescriptor's sortDescriptorWithKey:"val" ascending:true selector:"caseInsensitiveCompare:"
tell countList to sortUsingDescriptors:{desc1, desc2}
set sortedList to (countList's valueForKey:"val") as list

With an identical list, and using my customisable dual-pivot quicksort as a library script, this vanilla code is about three times as fast again:

use sorter : script "Custom Iterative Dual-pivot Quicksort"
use scripting additions

set aList to {"bb", "ccc", "𐀀", "A", "aaa", "aa", "aaaaa", "a"} & words of "Now is the time for all good men to come to the aid of the party"
set aList to aList & aList
set aList to aList & aList
set aList to aList & aList
set aList to aList & aList
set aList to aList & aList
set aList to aList & aList

script byLength
	on isGreater(a, b)
		set lenA to a's length
		set lenB to b's length
		if (lenA = lenB) then
			return (a > b)
		else
			return (lenA > lenB)
		end if
	end isGreater
end script

-- sort items 1 thru -1 of aList in place using the above script object for the comparisons.
tell sorter to sort(aList, 1, -1, {comparer:byLength})

aList
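The library script used above is not included in this thread. For anyone wanting to try the comparer idea without it, here is a small stand-in: a plain insertion sort that takes a script object with an isGreater handler. It is only a sketch of the technique, not Nigel’s quicksort, and it will be far slower on long lists. The handler name insertionSort is an assumption.

```applescript
script byLength
	on isGreater(a, b)
		set lenA to a's length
		set lenB to b's length
		if (lenA = lenB) then
			return (a > b)
		else
			return (lenA > lenB)
		end if
	end isGreater
end script

-- Hypothetical stand-in for the library: sorts theList in place,
-- delegating every comparison to the comparer's isGreater handler.
on insertionSort(theList, comparer)
	repeat with i from 2 to (count theList)
		set thisItem to item i of theList
		set j to i - 1
		repeat while j > 0
			if not (comparer's isGreater(item j of theList, thisItem)) then exit repeat
			set item (j + 1) of theList to item j of theList
			set j to j - 1
		end repeat
		set item (j + 1) of theList to thisItem
	end repeat
	return theList
end insertionSort

set aList to {"bb", "ccc", "aaa", "aa", "aaaaa", "a"}
insertionSort(aList, byLength)
```

Keeping the comparison in a separate script object means the same sort handler can be reused with a different ordering rule, which is the design the library call above relies on.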

It’s just a pity that it’s not what Ed apparently wants. :wink:


(Shane Stanley) #14

Yep. OTOH, if he’s after what I suspect, it’s still pretty fast — I make it only about 2.5x as long as getting character counts. And that’s using quite a bit longer strings than above, to increase the work required. So I dare say it will fit comfortably in Ed’s 30-minute budget :ok_hand:


(Ed Stockly) #15

Are these in points or mm or something else?


(Shane Stanley) #16

They’re returned in points.
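If millimetres are needed instead, the conversion is simple arithmetic: a point here is 1/72 inch and an inch is 25.4 mm. A minimal sketch, with a hypothetical handler name:

```applescript
-- Hypothetical helper: convert a width in points to millimetres.
on pointsToMillimetres(thePoints)
	return thePoints / 72 * 25.4
end pointsToMillimetres

pointsToMillimetres(72) -- 72 pt is one inch, so 25.4
```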