As a matter of best practice, bookList should really be structured as a list of records, not a list of lists, although without knowing where this data is coming from it’s all speculation anyway.
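To make the difference concrete, here is a minimal sketch of the same book held both ways. The sample values and the variable names are assumptions for illustration only, not the original poster's data:

```applescript
-- Hypothetical sample data; values and variable names are for illustration only.
set bookAsList to {"A200", "Prescott Revealed", "P. Peabody"}
set bookAsRecord to {ISBN:"A200", Title:"Prescott Revealed", Author:"P. Peabody"}

-- Positional access: the reader has to know that item 3 is the author.
set authorFromList to item 3 of bookAsList

-- Named access: the field documents itself.
set authorFromRecord to Author of bookAsRecord
```

Both lookups give the same value; the record version just says what it means at the point of use.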
It may be that a list of lists is inevitable as input, e.g. it was pulled from a tab-delimited plain text file using quick-n-dirty TIDs (text item delimiters, ugh), in which case I wouldn’t waste time restructuring it in AppleScript. However, I would define the index of each field as a named constant:
property ISBN_FIELD : 1
property TITLE_FIELD : 2
property AUTHOR_FIELD : 3
…
and always use those when referring to a particular field of a book record:
set theAuthor to item AUTHOR_FIELD of bookRecord
instead of spreading “magic numbers” throughout the code:
set theAuthor to item 3 of bookRecord -- unclear
–
As for updating each book record’s fields, this is where using dictionaries as lookup tables makes a big difference, because looking up a key in a list of key-value pairs is O(n) linear time whereas looking up a key in a dictionary is O(1) constant time. Again, without knowing where this data is coming from or what format it’s in we can only speculate, so for demonstration purposes we’ll just mock it:
set replacementAuthors to current application's NSMutableDictionary's dictionary()
replacementAuthors's setObject:"Peavine Peabody" forKey:"Peavine"
…
While we could stuff the book list into an NSMutableArray (the book records will convert to NSDictionary automatically) and iterate that, this probably won’t give us a speed advantage over the script object kludge. Either approach should be O(n) efficiency† though, and it’s the efficiency of an algorithm that really determines speed as the number of items increases.
At any rate, the optimum algorithmic efficiency of the update script should be O(n), where n is the number of items in bookList. This is in contrast to the naive unoptimized AppleScript, which is O(n² * m²), where m is the number of items in the key-value lookup list (i.e. performance quickly goes in the toilet as the number of items increases).
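For comparison, a naive version might look something like the sketch below. This is a guess at its general shape, not the script under discussion: the replacementPairs list of {old name, new name} pairs and the trimmed-down book records are assumptions made up for the example.

```applescript
-- Assumed shape of the naive lookup table: a list of {oldName, newName} pairs.
set replacementPairs to {{"P. Peabody", "Peavine Peabody"}}
set bookList to {¬
	{ISBN:"A100", Title:"Everyday AppleScriptObj", Author:"Shane Stanley"}, ¬
	{ISBN:"A200", Title:"Prescott Revealed", Author:"P. Peabody"}}

repeat with i from 1 to count of bookList -- one pass per book (n)
	set bookRecord to item i of bookList -- plain list indexing, no script object kludge
	repeat with j from 1 to count of replacementPairs -- linear search per book (m)
		set thePair to item j of replacementPairs
		if Author of bookRecord is item 1 of thePair then
			set Author of bookRecord to item 2 of thePair -- updates the record in place
			exit repeat
		end if
	end repeat
end repeat
```

The nested repeat loop is what multiplies the cost by m for every book; swapping the inner loop for a dictionary lookup removes that factor.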
We can check the script’s efficiency by using a timing command to compare its running times as n increases. If each 2× increase in n yields approximately a 2× increase in running time, we know the algorithm has O(n) efficiency, which is the best that can be achieved using these particular data structures‡. Whereas if the running time increases 4× (or worse), we know the algorithm is sub-optimal.
use framework "Foundation"
use scripting additions
use script "Objects" -- `timer object` (https://github.com/hhas/applescript-stdlib)
to updateBookList(bookList, replacementAuthors)
    script o -- ugly kludge to work around AS lists’ lousy efficiency
        property _items : bookList
    end script
    --set u to 0
    repeat with i from 1 to o's _items's length
        set bookRecord to o's _items's item i
        set newAuthor to (replacementAuthors's objectForKey:(bookRecord's Author)) as any
        if newAuthor is not missing value then
            --set u to u + 1
            set bookRecord's Author to newAuthor
        end if
    end repeat
    --log {"number of records updated:", u}
end updateBookList
-- mock book data
property _bookList : {¬
{ISBN:"A100", Title:"Everyday AppleScriptObj", Author:"Shane Stanley", Publisher:"Myriad Communications Ply Ltd", Comments:"The definitive work"}, ¬
{ISBN:"A200", Title:"Prescott Revealed", Author:"P. Peabody", Publisher:"Peavine Publishing", Comments:"A Must Visit"}}
-- mock substitutions
set replacementAuthors to current application's NSMutableDictionary's dictionary()
replacementAuthors's setObject:"Peavine Peabody" forKey:"P. Peabody"
-- TEST PERFORMANCE
to makeTestList(aList, n) -- generate a large list of unique records (this is quite slow)
    copy aList to listCopy
    repeat n times
        copy {listCopy, listCopy} to {a, b}
        set listCopy to a & b
    end repeat
    return listCopy
end makeTestList
repeat with n from 7 to 14 -- 2 mock records doubled n times: 256 to 32768 records
    set bookList to makeTestList(_bookList, n)
    set t to (timer object)'s startTimer()
    updateBookList(bookList, replacementAuthors)
    log {bookList's length, t's stopTimer()}
end repeat
-- RESULTS: (*size of bookList, time in seconds*)
(*256, 0.013669013977*)
(*512, 0.024824976921*)
(*1024, 0.049461007118*)
(*2048, 0.083796024323*)
(*4096, 0.166869044304*)
(*8192, 0.345046043396*)
(*16384, 0.661942958832*)
(*32768, 1.322294950485*)
-- efficiency = O(n), yay!
return bookList's items 1 thru 2
(*
{ISBN:"A100", Title:"Everyday AppleScriptObj", Author:"Shane Stanley", Publisher:"Myriad Communications Ply Ltd", Comments:"The definitive work"},
{ISBN:"A200", Title:"Prescott Revealed", Author:"Peavine Peabody", Publisher:"Peavine Publishing", Comments:"A Must Visit"} -- updated Author
*)
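To check the doubling claim against the logged numbers rather than by eye, a small sketch like the following divides each timing by the previous one. The timings are copied from the RESULTS above and rounded to four decimal places; the variable names are made up for the example.

```applescript
-- Timings copied from the RESULTS above (rounded to 4 decimal places), in seconds.
set timings to {0.0137, 0.0248, 0.0495, 0.0838, 0.1669, 0.345, 0.6619, 1.3223}

-- Each list size is double the previous one, so compare consecutive timings.
set ratios to {}
repeat with i from 2 to count of timings
	set end of ratios to (item i of timings) / (item (i - 1) of timings)
end repeat
-- ratios near 2 suggest O(n); ratios near 4 (or worse) suggest something slower
```

Run on your own timings, this gives a quick sanity check whenever the script or the data structures change.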
† It might be a bit less than that in practice, depending on how the dictionaries are implemented internally (e.g. if it’s a balanced B-tree, lookups will be O(log m); if it’s a naive fixed-size hash table then it will start to degrade from O(1) towards O(m) if the number of items significantly exceeds the number of “buckets” in the table). But unless the dictionaries contain a lot of data, we can ignore the minor variances, as what overwhelmingly decides overall efficiency is the cost of iterating the [large] book list, which (ignoring AppleScript’s own internal pathologies) should be O(n).
‡ It is possible to achieve better efficiency, e.g. if the book list is held in a SQL database with fully normalized tables and indexed fields. Updating an author’s name there would be an O(1) constant-time operation since authors are stored in their own pre-indexed table which is joined to the book table by a many-to-many relationship; no repeated data and fast lookups.
(Though the main reason for using a real relational database is to ensure data integrity and a single source of truth; any performance gains are just a bonus.)
As a pro automator, outside of application scripting most of your problems/needs have already been solved by people much smarter and more knowledgeable than us; there’s no need to reinvent all those wheels [amateurly, badly, from scratch]. This is why it is a good idea to familiarize yourself with some basic CS concepts. You don’t have to be a programming expert; you just need to have some idea of what you don’t [yet] know, so when a problem/need arises, you know the right questions to ask yourself so you can go find the right answers.
A high school-level CS textbook is an excellent investment as your scripts become non-trivial. And, once you reach the limits of that text, get a copy of McConnell’s Code Complete (ideal for hunt-n-peck learning; grasping coupling and cohesion alone will make you a better programmer than a lot of college-educated turnips). Yes, there’s a learning curve associated, but you will save yourself a lot of time over the long run. That’s real Best Practice, regardless of the programming language you use or the problem domain you’re in.