Efficient way to reorganize lists

hhas01 · September 4, 2022, 7:07am

Is this what you’re wanting:

use script "List"

transpose list {{1, 2, 3}, {"A", "B", "C"}}
--> {{1, "A"}, {2, "B"}, {3, "C"}}

An AppleScript list is implemented as a vector array so accessing a list item should have O(1) efficiency (constant time), the same as a Python list or ObjC NSArray (both vectors too).

However, there is some dumb implementation in AS that means each lookup also involves incidentally iterating the entire list (IIRC to check for circular containment where a list value has been added to itself, but that’s not important) that brings that efficiency down to O(n) (linear time); no better than the linked list implementation AS originally had.

Thus, iterating the entire list, which should be O(n) efficiency, becomes O(n*n) (quadratic) and drops AS performance in the toilet as soon as lists become large. (And that’s even before we consider most ASers don’t know Computer Science, so will often write/use inefficient algorithms themselves, degrading performance even further.)
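To make the arithmetic concrete, here is a rough Python model of the two cases. It does not reproduce AppleScript's internals: the whole-list scan on each lookup is simulated with an explicit inner loop, and the function names are made up for illustration.

```python
def iterate_constant_lookup(items):
    """Walk the list when each item lookup costs one step (O(1) per lookup)."""
    steps = 0
    for i in range(len(items)):
        _ = items[i]          # one step per lookup
        steps += 1
    return steps


def iterate_linear_lookup(items):
    """Walk the list when each lookup also scans the whole list (O(n) per lookup)."""
    steps = 0
    for i in range(len(items)):
        for _ in items:       # simulated whole-list scan on every lookup
            steps += 1
        _ = items[i]
    return steps


data = list(range(1000))
print(iterate_constant_lookup(data))  # 1000 steps: linear overall
print(iterate_linear_lookup(data))    # 1000000 steps: quadratic overall
```

Ten times more items means ten times more work in the first case, but a hundred times more in the second.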

What the script object kludge does is tickle AS’s list item access implementation in such a way that it bypasses the pathological iteration, bringing efficiency back to O(1). (It’s actually possible to crash AS if you do this in a certain way, but that’s not really important either.)

This WONTFIX stuff is all very tired and all very boring, and none of it is healthy or useful knowledge for programmers or non-programmers. Just one more Stupid that hapless users must wade through cos some developer didn’t have their thinking brain on the day that they wrote it. And one more reason AS needs to be gracefully retired in favor of a new language that learns from its good aspects (AS looks attractive and “easy” to non-programmers) while also learning from all its mistakes and not repeating them.

suzume · September 4, 2022, 9:32am

That’s exactly what I want. And indeed, I wrote a Python script that handled HUGE tables in mere seconds when even with BridgePlus it took orders of magnitude more time to deal with fractions of such tables.

Not that I need to handle such huge data sets, but I was shocked at Python’s efficiency.
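For anyone curious, a transpose in Python can be sketched in a couple of lines. This is only a minimal illustration, not the actual script mentioned above; the function name `transpose` is my own and it assumes every row has the same length.

```python
def transpose(rows):
    """Turn a list of rows into a list of columns (assumes equal-length rows)."""
    # zip(*rows) pairs up the 1st items, then the 2nd items, and so on.
    return [list(column) for column in zip(*rows)]


print(transpose([[1, 2, 3], ["A", "B", "C"]]))
# → [[1, 'A'], [2, 'B'], [3, 'C']]
```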

(You replied there too:
Splitting processes between AS and other languages)

CJK · September 4, 2022, 9:36am

I always wondered what caused the quirk in behaviour around access times. The same speed benefit is conferred at the top-level by way of my, so I figured it might have something to do with whether or not the list is being treated referentially, but you’re saying that the normal implementation involves evaluating the entire thing prematurely?

I’ve gotten it to crash in certain situations where I declare the list as the script object’s parent. I think it tends to be fine if one only reads from it, but chokes sometimes when one tries to update its contents. It seems inconsistent (by which I mean, I haven’t yet discerned a pattern sufficient for me to predict confidently when it will or won’t balk).

hhas01 · September 4, 2022, 10:21am

CJK: “The same speed benefit is conferred at the top-level by way of my”

To be clear, “That Script Object Kludge” is just the common title for this “technique”. It’s the extra referencing that provides the speed-up:

item N of listVar of someObj -- O(1)

instead of:

item N of listVar -- O(n)

And since the reference can only point to a script object property, that means either a top-level property or a property of a script object that’s local to the handler. The latter is a couple more lines of code but avoids creating coupling between handler and global state, so from a software design perspective:

to foo()
  script kludgeObj
    property p : {...}
  end script
  ...
  -- do stuff to `p of kludgeObj` here
end foo

is preferable to:

property kludgeProp : {...}

to foo()
  ...
  -- do stuff to `kludgeProp of me` here
end foo

kludgeObj and me are both script objects; the difference is purely in the objects’ scopes. With the local script object, you know at a glance that only foo uses its property, not any other handler; whereas with a global script object you have to think about what other handlers might interact with that property, so it’s harder for humans to reason about. (The machine doesn’t care either way, of course.)

Like I say, AS is just ghastly crap all the way down, making what should be genuinely simple to learn look simple on the surface but an absolute obfuscated insanely complicated mess underneath. We use AS not cos it’s the best tool for the job but the de facto only tool available.†

–

† Not 100% true: Python 3 appscript, SwiftAutomation, nodeautomation are all fully capable of replacing AS from a technical standpoint, giving still-awful but less-awful-than-AS languages the same access to “AppleScriptable” apps that AS has always enjoyed. But there’s a reason I stopped providing [free] support for those alternatives years ago; so, as I say, AS is the only real option in practice unless you’re extremely brave/foolhardy/or have a support contract with me for whatever solution I built for you that uses it.

hhas01 · September 4, 2022, 10:37am

Indeed. Python is one of the slower languages too, just to rub it in. That’s the advantage years of hard optimization work on the Python language’s core gets you.

That said, an inefficient algorithm will begin to crawl whatever the language as the amount of data increases. e.g. A quicksort in AppleScript would outpace a bubblesort in Rust on large-enough arbitrary† lists. However, because those other languages typically come with large numbers of expertly written and highly optimized standard libraries, they should always blast an AS script out of the water on a generic task such as sorting a list. Even the sort list in my old List library, which is pretty snappy by typical AS standards, is incredibly primitive and naive compared to the far more complex sort algorithm Python employs.

–

† Conversely, bubblesort is very fast in one particular special case: on almost or completely sorted lists it runs close to O(n), while quicksort’s efficiency drops toward O(n*n) on pathological lists. But in the common case of a well-randomized list, where quicksort approaches O(n * log n) efficiency, QS wins out by a mile.
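The footnote can be checked by counting comparisons. The Python sketch below uses deliberately naive textbook versions of both sorts (an early-exit bubblesort and a quicksort that always picks the first element as pivot), so an already-sorted list is the pathological input for this particular quicksort. Both functions are illustrative only, not what any real library uses.

```python
import random


def bubble_sort(items):
    """Bubble sort with early exit; returns (sorted list, comparison count)."""
    data = list(items)
    comparisons = 0
    n = len(data)
    while True:
        swapped = False
        for i in range(n - 1):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:       # a pass with no swaps means the list is sorted
            return data, comparisons
        n -= 1                # the largest remaining item is now in place


def quick_sort(items):
    """Naive first-element-pivot quicksort; returns (sorted list, comparison count)."""
    data = list(items)
    if len(data) <= 1:
        return data, 0
    pivot, rest = data[0], data[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    left, left_count = quick_sort(smaller)
    right, right_count = quick_sort(larger)
    # Count one comparison per element partitioned against the pivot.
    return left + [pivot] + right, len(rest) + left_count + right_count


already_sorted = list(range(200))
shuffled = already_sorted[:]
random.Random(0).shuffle(shuffled)   # fixed seed so runs are repeatable

for label, data in (("sorted", already_sorted), ("shuffled", shuffled)):
    _, bubble_count = bubble_sort(data)
    _, quick_count = quick_sort(data)
    print(label, "bubble:", bubble_count, "quick:", quick_count)
```

On the sorted input the bubblesort finishes after a single pass while this quicksort goes quadratic; on the shuffled input the positions reverse.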

ShaneStanley · September 4, 2022, 12:03pm

Have you tried seeing what the error is?

set {myTMData, theError} to current application's SMSForder's colsToRowsIn:myRawData |error|:(reference)
if myTMData = missing value then error (theError's localizedDescription() as text)

FWIW, it’s working fine here under the latest version of macOS.

suzume · September 5, 2022, 12:38pm

The more I think about it, the more it looks like something is wrong with my system.

I had a mid-2015 MBP running Big Sur that I could not upgrade to Monterey because of a Samsung SSD, so I bought a “new” mid-2015 MBP with a 1TB Apple SSD that had Monterey pre-installed. I migrated my user from a Time Capsule backup, and then things started to not work as usual.

And I have no idea what’s going on. But it looks less and less like an AS issue.

CJK · September 7, 2022, 2:55am

Yes, that’s what I was getting at. And the rest follows on from this. But thanks for confirming. I like your level of detail.