Replacing items in a list

estockly · May 11, 2022, 8:05pm

I’m looking for best practices here.

I have list of lists that includes BookISBN number, titles, authors, publishers, descriptions.

I have separate lists lists that have pairs like this:

 {rawTitle, replacementTitle} 
 {rawAuthor, replacementAuthor}
 {rawPublisher, replacementPublisher}

So what I’m looking for is the fastest, most reliable way to go through the book list and replace the titles, authors and publishers that need to be replaced.

Currently I’m doing it via repeat loops and as the lists get longer it’s taking too long.

What are my appleScript options, what would be the best practice for getting this done?

peavine · May 12, 2022, 4:49pm

Ed. I understand that you want to update the book list, but I don’t understand what test you are going to employ to decide what to replace with what. However, when working with large lists, script objects are normally very fast. The following is just an example–you should insert your own script.

set bookList to {{"A100", "Everyday AppleScriptObj", "Shane Stanley", "Myriad Communitcations Ply Ltd", "The definitive work"}, {"A200", "Prescott Revealed", "Peavine", "Peavine Publishing", "A Must Visit"}}
set bookListEdits to {{"A100", "Everyday AppleScriptObj Revised", "", "Stanley Publishiing"}, {"A200", "", "Peavine Peabody", ""}}

set updatedBookList to updateBookList(bookList, bookListEdits)
updatedBookList --> {{"A100", "Everyday AppleScriptObj Revised", "Shane Stanley", "Stanley Publishiing", "The definitive work"}, {"A200", "Prescott Revealed", "Peavine Peabody", "Peavine Publishing", "A Must Visit"}}

on updateBookList(aBookList, aBookListEdits)
	script o
		property bookList : aBookList
		property bookListEdits : aBookListEdits
	end script
	
	repeat with i from 1 to (count o's bookListEdits)
		repeat with j from 1 to (count o's bookList)
			if item 1 of item i of o's bookListEdits = item 1 of item j of o's bookList then
				repeat with k from 2 to 4
					if item k of item i of o's bookListEdits > "" then set item k of item j of o's bookList to item k of item i of o's bookListEdits
				end repeat
				exit repeat
			end if
		end repeat
	end repeat
	
	return o's bookList
end updateBookList

hhas01 · May 12, 2022, 8:23pm

I assume these lists came from somewhere else originally. In which case, best practice would be to correct/update the data in the database, or whatever your single source of truth is.

If there is some legitimate reason for applying the changes directly to these lists then, as peavine says, you need to use a script object kludge so you can iterate AppleScript lists in linear O(n), not quadratic O(n*n) time†. Or, if it’s simple data compatible with ASOC, put it in a NSArray/NSMutableArray which doesn’t have the performance problem native lists have. Also, put the raw/replacement lookup table in an NSDictionary, so each lookup is more or less constant O(1) time.

Honestly, AppleScript is a poor choice for any non-trivial data processing: slow, underpowered, and painfully inefficient. It really is the wrong tool for the job. Consider installing Node.js and learning some JavaScript. The language is a sloppy mess, but it’s fast and has excellent library and tool support. It’s also what Adobe is using to build its new UXP scripting support so if you’re an Adobe CC automator it’s a good future investment.

–

† Google “big-O notation” if you’re not familiar with it. Knowing the difference between speed and efficiency is key to crafting code that doesn’t bog down as the amount of data increases.

peavine · May 14, 2022, 12:57am

An ASObjC suggestion.

use framework "Foundation"
use scripting additions

-- Create an array of dictionaries from a list of lists. 
set bookList to {{"A100", "Everyday AppleScriptObj", "Shane Stanley", "Myriad Communitcations Ply Ltd", "The definitive work"}, {"A200", "Prescott Revealed", "Peavine", "Peavine Publishing", "A Must Visit"}}
set theKeys to {"ISBN", "Title", "Author", "Publisher", "Comments"}
set bookArray to current application's NSMutableArray's new()
repeat with aList in bookList
	set aDictionary to (current application's NSMutableDictionary's dictionaryWithObjects:aList forKeys:theKeys)
	(bookArray's addObject:aDictionary)
end repeat

-- Add a dictionary
set newList to {"A300", "The Principals of Government", "John Adams", "Peavine Publishing", "The views of a Founding Father"}
set theKeys to {"ISBN", "Title", "Author", "Publisher", "Comments"}
set newDictionary to (current application's NSMutableDictionary's dictionaryWithObjects:(newList) forKeys:theKeys)
(bookArray's addObject:newDictionary)

-- Modfy a dictionary identified by key value (ISBN in this case)
set searchKey to "ISBN"
set searchValue to "A200"
set revisedData to {Author:"Peavine Peabody"}
repeat with aDictionary in bookArray
	if ((aDictionary's valueForKey:searchKey)'s isEqual:searchValue) as boolean is true then
		(aDictionary's addEntriesFromDictionary:revisedData)
		exit repeat
	end if
end repeat

-- Get a dictionary identified by key value (Publisher in this case)
set searchKey to "Publisher"
set searchValue to "Peavine Publishing"
set thePredicate to current application's NSPredicate's predicateWithFormat:"%K == %@" argumentArray:{searchKey, searchValue} -- from Shane
set bookArrayMatches to (bookArray's filteredArrayUsingPredicate:thePredicate) as list

hhas01 · May 14, 2022, 9:47am

As a matter of best practice, bookList should really be structured as a list of records, not a list of lists; although without knowing where this data is coming from it’s all speculation anyway.

It may be a list of lists is inevitable as input, e.g. it was pulled from a tab-delimited plain text file using quick-n-dirty TIDs (ugh), in which case I wouldn’t waste time restructuring it in AppleScript. However, I would define the indexes of each field as named constants:

property ISBN_FIELD : 1
property TITLE_FIELD : 2
property AUTHOR_FIELD : 3
…

and always use those when referring to a particular field of a book record:

set theAuthor to item AUTHOR_FIELD of bookRecord

instead of spreading “magic numbers” throughout the code:

set theAuthor to item 3 of bookRecord -- unclear

–

As for updating each book record’s fields, this is where using dictionaries as lookup tables makes a big difference, because looking up a key in a list of key-value pairs is O(n) linear time whereas looking up a key in a dictionary is O(1) constant time. Again, without knowing where this data is coming from or what format it’s in we can only speculate, so for demonstration purposes we’ll just mock it:

set replacementAuthors to current application's NSMutableDictionary's dictionary()
replacementAuthors's setObject:"Peavine Peabody" forKey:"Peavine"
…

While we could stuff the book list into an NSMutableArray (the book records will convert to NSDictionary automatically) and iterate that, this probably won’t give us a speed advantage over the script object kludge. Either approach should be O(n) efficiency† though, and it’s the efficiency of an algorithm that really determines speed as the number of items increases.

At any rate, the optimum algorithmic efficiency of the update script should be O(n), where n is the number of items in bookList. This is in contrast to the naive unoptimized AppleScript, which is O(n² * m²), where m is the number of items in the key-value lookup list (i.e. performance quickly goes in the toilet as number of items increases).

We can check the script’s efficiency by using a timing command to compare its running times as n increases. If each 2× increase in n yields approximately a 2× increase in running time, we know the algorithm has O(n) efficiency, which is the best that can be achieved using these particular data structures‡. Whereas if the running time increases 4× (or worse), we know the algorithm is sub-optimal.

use framework "Foundation"
use scripting additions
use script "Objects" -- `timer object` (https://github.com/hhas/applescript-stdlib)

to updateBookList(bookList, replacementAuthors)
	script o -- ugly kludge to work around AS lists’ lousy efficiency
		property _items : bookList
	end script
	--set u to 0
	repeat with i from 1 to o's _items's length
		set bookRecord to o's _items's item i
		set newAuthor to (replacementAuthors's objectForKey:(bookRecord's Author)) as any
		if newAuthor is not missing value then
			--set u to u + 1
			set bookRecord's Author to newAuthor
		end if
	end repeat
	--log {"number of records updated:", u}
end updateBookList


-- mock book data
property _bookList : {¬
	{ISBN:"A100", Title:"Everyday AppleScriptObj", Author:"Shane Stanley", Publisher:"Myriad Communications Ply Ltd", Comments:"The definitive work"}, ¬
	{ISBN:"A200", Title:"Prescott Revealed", Author:"P. Peabody", Publisher:"Peavine Publishing", Comments:"A Must Visit"}}

-- mock substitutions
set replacementAuthors to current application's NSMutableDictionary's dictionary()
replacementAuthors's setObject:"Peavine Peabody" forKey:"P. Peabody"



-- TEST PERFORMANCE

to makeTestList(aList, n) -- generate a large list of unique records (this is quite slow)
	copy aList to listCopy
	repeat n times
		copy {listCopy, listCopy} to {a, b}
		set listCopy to a & b
	end repeat
	return listCopy
end makeTestList


repeat with n from 8 to 14
	set bookList to makeTestList(_bookList, n)
	set t to (timer object)'s startTimer()
	updateBookList(bookList, replacementAuthors)
	log {bookList's length, t's stopTimer()}
end repeat

-- RESULTS: (*size of bookList, time in seconds*)
(*256, 0.013669013977*)
(*512, 0.024824976921*)
(*1024, 0.049461007118*)
(*2048, 0.083796024323*)
(*4096, 0.166869044304*)
(*8192, 0.345046043396*)
(*16384, 0.661942958832*)
(*32768, 1.322294950485*)
-- efficiency = O(n), yay!

return bookList's items 1 thru 2
(*
      {ISBN:"A100", Title:"Everyday AppleScriptObj", Author:"Shane Stanley", Publisher:"Myriad Communications Ply Ltd", Comments:"The definitive work"}, 
      {ISBN:"A200", Title:"Prescott Revealed", Author:"Peavine Peabody", Publisher:"Peavine Publishing", Comments:"A Must Visit"} -- updated Author
*)

† It might be a bit less than that in practice, depending on how the dictionaries are implemented internally (e.g. if it’s a balanced B-tree, lookups will be O(log m); if it’s a naive fixed-size hash table then it will start to degrade from O(1) towards O(n) if the number of items significantly exceeds the number of “buckets” in the table). But unless the dictionaries contain a lot of data, then we can ignore the minor variances, as what overwhelmingly decides overall efficiency is the cost of iterating the [large] book list, which (ignoring AppleScript’s own internal pathologies) should be O(n).

.

‡ It is possible to achieve better efficiency, e.g. if the book list is held in a SQL database with fully normalized tables and indexed fields. Updating an author’s name there would be an O(1) constant-time operation since authors are stored in their own pre-indexed table which is joined to the book table by a many-to-many relationship; no repeated data and fast lookups.

(Though the main reason for using a real relational database is to ensure data integrity and a single source of truth; any performance gains are just a bonus.)

As pro automator, outside of application scripting most of your problems/needs you have are already solved by people much smarter and more knowlegeable than us; there’s no need to reinvent all those wheels [amateurly, badly, from scratch]. This is why it is a good idea to familiarize yourself with some basic CS concepts. You don’t have to be a programming expert; you just need to have some idea of what you don’t [yet] know, so when a problem/need arises, you know the right questions to ask yourself so you can go find the right answers.

A high school-level CS textbook is an excellent investment as your scripts become non-trivial. And, once you reach the limits of that text, a copy of McConnell’s Code Complete (ideal for hunt-n-peck learning; grasping coupling and cohesion alone will make you a better programmer than a lot of college-educated turnips). Yes there’s a learning curve associated, but you will save yourself a lot of time over the long run. That’s real Best Practice, regardless of the programming language you use or the problem domain you’re in.

estockly · May 14, 2022, 1:51pm

The data is provided in an excel spreadsheet by a booksellers organization and lists the top ten best sellers in 6 different categories each week.

We edit the titles, authors and publishers as needed to match our editorial style, and we write a brief description for each before inputting them into our system. The list itself isn’t large but the replacement lists get long over time, and that’s what’s slowing things down.

I’m thinking an SQL db might be the best approach.

Thanks everyone for the advice, this has been very helpful.

peavine · May 14, 2022, 2:34pm

Ed. I’m glad you’ve arrived at a good solution. This is outside my area of expertise, but it sure sounds like you need something more than a simple AppleScript.

Just to satisfy my personal curiosity, I decided to run some timing tests with my two suggestions. I used a timing script from elsewhere in this forum, and, for the basic AppleScript script, created a list of lists with 2,000 and 10,000 sublists. For my ASObjC suggestion, I included the conversion of the list of lists to an array of dictionaries in untimed code, since this would presumably only need to be created once. The timing results were:

With 2,000 sublists

Basic AppleScript with Script Object - 5 milliseconds
ASObjC Script - 155 milliseconds

With 10,000 sublists

Basic AppleScript with Script Object - 19 milliseconds
ASObjC Script - 790 milliseconds

hhas01 · May 15, 2022, 9:48am

“The data is provided in an excel spreadsheet by a booksellers organization … each week”

My sympathies. Data is king, and ensuring the integrity and availability of that data ought to be everyone’s #1 priority. Alas, there’s a lot of companies whose standard practice is to put an Excel file on a shared server that employees edit directly, and email copies of that file to suppliers. No data integrity, inefficient, lots of opportunities for human error; just a lousy ad-hoc process. But it’s what they’re used to and feel comfortable with. Ugh.

Still, it would be worth asking them if their book data in stored in a SQL database. Such systems often export to XLSX on the assumption that the data is going to a human user, but may also be able to export to automation-friendly XML/JSON if you ask for it.*

…

Assuming weekly emailed Excel files are what you’re stuck with, I doubt there’s a benefit to building a SQL database to hold it yourself, since you’d just be throwing out all the previous data and refilling it each week†.

You could consider doing all the data cleanup in Excel so the customer-supplied XLSX file remains your single “truth copy”. Your own organization can keep its style guide substitutions in a separate Excel file, which is easy for a human user to work with. Whether you can apply your style guide substitutions to the customer’s XLSX using Excel macros or will need to use some VBA/JavaScript I cannot say; you’d need to ask an Excel expert‡. Either way, it should blast through it in no time and spit out finished data for your AppleScript automation to drop straight into your artwork; no extra processing required.

Not the perfect system, but pragmatic. Separates the data preparation step from the artwork production step, with editorial style guide managed separately too, so you’ve a nice clean paper trail of which system and user did what. Speed is nice, but when architecting a system I always think of its responsibilities and liabilities first; especially when mixing manual steps with automated steps.

–

XLSX is a bad data interchange format for anything other than genuine spreadsheets, not least because of Excel’s habit of “helpfully” reformatting/rewriting the inputted data. (I’ve been caught out myself with fun stuff like ISBNs and numeric reference codes mysteriously losing their leading zeroes because Excel has treated them as real numbers. Worst automation data format except for raw PDF.)

† The exception would be if the customer currently uses ad-hoc Excel files but would like to move their business to a more robust database solution. But that’s a much bigger discussion.

‡ This assumes there’s a business case for improving the overall workflow, as opposed to doing the minimum work needed to make your existing AS automation go faster. (It is easy in automation to disappear up the proverbial; personally satisfying but negative ROI.) You did ask for “best practice” advice though, so I’d be remiss not to suggest general process improvements too. (It’s also easy to get hung up on details while missing big picture.)

estockly · May 15, 2022, 6:36pm

That’s close to what we’re doing now. And the data itself is very reliable and accurate, it just doesn’t meet our style for publication.

It’s just slowing down as the number of titles and changes build up.

hhas01 · May 16, 2022, 8:26am

My Excel-fu is weak, but this is probably what you want:

I would create an Excel file containing a couple of worksheets: one containing your lookup tables and another sheet containing macros which pull the data from the customer’s file and apply the substitutions automatically. You can then export that sheet as tab-delimited text for your AppleScript to place into the artwork.

estockly · May 18, 2022, 4:45pm

OK, this is sending me back down a rabbit hole I looked at before.

With this specific data, this wouldn’t help, but I’m getting more and more info from various sources and I’m either web scraping, or sending users to a URL in the browser and they’re copy/pasting the info.

But, from the looks of it, if I could access from AppleScript (or JavaScript from inside an appleScript; or AppleScriptObjC; or shell scripting) public APIs via this platform (or any other) that would be supercool.

https://www.postman.com

kerflooey · May 18, 2022, 9:25pm

I don’t know if its appropriate in this case, but I do a lot of XLS → Filemaker database (FM imports Excel spreadsheets) → Filemaker calculations to parse/auto-reformat to pub styles → Applescript for output to InDesign.

FM and Applescript work very well together. FM also has its own simple native scripting system, and can display font styles/colors/sizes/etc as well as graphics to reproduce and preview the target InDesign output. It can access or import most online data sources directly, and has a “web viewer” to access and display web pages (among other things). JSON and Javascript are also now supported.