Efficient line processing


(Jean Christophe Helary) #1

If I had a few thousand lines of tab-separated values that I wanted neatly packed into lists (one list per line), would there be a way to do that in ASObjC? I find that the “standard” way, looping over the data set and splitting each line into its own list with tab as the delimiter, is a bit slow…


(Shane Stanley) #2

There’s no single method for that, other than the one in my BridgePlus framework/library:

https://www.macosxautomation.com/applescript/apps/BridgePlus_Manual/Pages/arrayFromTSV_.html


(Jean Christophe Helary) #3

I’m putting my test data in the script because it’s easier. I’d rather change this code to a few lines of ASObjC than use an external library, because it’s just a few lines to replace and I’d learn more ASObjC 🙂 So here is my attempt:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set myString to "20180715~160806	M	0	EN-US	digital communities	PT-BR	comunidades digitais
"

set myNSString to current application's NSString's stringWithString:myString
set myNSList to myNSString's componentsSeparatedByString:"
"
repeat with i from 1 to count of myNSList
	try
		set myNSItem to ((item i of myNSList)'s componentsSeparatedByString:"	")
		set item i of myNSList to (current application's NSArray's arrayWithObjects:{(myNSItem's objectAtIndex:4), (myNSItem's objectAtIndex:6)})
	on error
		log
	end try
end repeat

My problem here is that I get this:

(NSArray) {
	{
		{
			"digital communities",
			"comunidades digitais"
		}
	},
	""
}

Instead of this:

(NSArray) {
	{
		"digital communities",
		"comunidades digitais"
	},
	""
}

and I don’t understand why I get that extra level of {}. (I do understand where the “” comes from; I need to fine-tune that part, but that’s not an issue.)


(Shane Stanley) #4

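There are a few issues there. In particular, arrayWithObjects: doesn’t take a list in the strict sense, but rather a list in the sense of what you normally pass to a handler. The AppleScript list you passed was therefore treated as a single object, which is where the extra level of {} comes from.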

Here’s a more typical approach:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set myString to "20180715~160806	M	0	EN-US	digital communities	PT-BR	comunidades digitais
"

set myNSString to current application's NSString's stringWithString:myString
set myNSList to myNSString's componentsSeparatedByCharactersInSet:(current application's NSCharacterSet's newlineCharacterSet())

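set newList to current application's NSMutableArray's array()
repeat with aLine in myNSList
	set myNSItem to (aLine's componentsSeparatedByString:tab)
	-- skip lines with too few fields (e.g. the empty string left by a trailing line break)
	if (myNSItem's |count|()) > 6 then
		(newList's addObject:{(myNSItem's objectAtIndex:4), (myNSItem's objectAtIndex:6)})
	end if
end repeat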

But I suspect traditional AS will be quicker.


(Jean Christophe Helary) #5

I actually tried traditional AS: first I got “every text item” with a line break as the delimiter, then I looped over that list and split each item further on tabs. It took ages to process a 4000-line data set, while this one took just a few seconds.

But my “traditional” AS may not be coded very efficiently. Would you mind showing what you think could be faster?
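For comparison, here is a minimal plain-AppleScript sketch of the approach described above (split on line breaks first, then on tabs) using text item delimiters. It is not code from this thread: the handler name `parseTSV` is hypothetical, and it assumes every non-empty line should become one list of fields. Whether it beats the ASObjC version will depend on the data, so it is worth timing on your own set.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

-- Hypothetical handler: turn tab-separated text into a list of lists,
-- one list of fields per line, using only text item delimiters.
on parseTSV(theText)
	set savedTIDs to AppleScript's text item delimiters
	set lineList to paragraphs of theText -- splits on LF, CR or CRLF
	set AppleScript's text item delimiters to tab
	set tableList to {}
	repeat with aLine in lineList
		set thisLine to contents of aLine
		-- skip empty lines, e.g. one left by a trailing line break
		if thisLine is not "" then set end of tableList to text items of thisLine
	end repeat
	set AppleScript's text item delimiters to savedTIDs
	return tableList
end parseTSV

parseTSV("a" & tab & "b" & linefeed & "c" & tab & "d" & linefeed)
-- → {{"a", "b"}, {"c", "d"}}
```

Saving and restoring the delimiters keeps the handler from changing the caller's text item delimiters.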


(Ed Stockly) #6

I’m following this conversation too, as I’m dealing with lists that now have over 1000 entries, and the user has to sit and wait for my script to finish. (That was no problem when there were only a few hundred items.)

I’m also investigating using script objects for all the list manipulation. That seems to speed things up on large lists.
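A minimal sketch of the script-object idiom mentioned above, assuming the slowdown comes from item-by-item access to a large list: referring to the list through a script object's property (`o's theList`) rather than a plain local variable is a well-known way to speed up such access in AppleScript. The `sumList` handler and the summing task are hypothetical, chosen only to keep the example short.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

-- Hypothetical handler: add up a (possibly large) list of numbers.
on sumList(theNumbers)
	-- Wrap the list in a script object so that item access goes
	-- through o's theList instead of the local variable directly.
	script o
		property theList : theNumbers
	end script
	set total to 0
	repeat with i from 1 to (count o's theList)
		set total to total + (item i of o's theList)
	end repeat
	return total
end sumList

set bigList to {}
repeat with i from 1 to 10000
	set end of bigList to i
end repeat
sumList(bigList) -- → 50005000
```

The same wrapper can be placed around any loop that reads or writes many items of one list.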

Also, Has has a “list” script library that has some useful commands.


(Jean Christophe Helary) #7

Thank you for the code. The “addObject” is what I was looking for…

I compared both scripts on a data set of 3,500 lines, 160,000 words and 7 fields, and I consistently find that mine is slightly faster than yours: 1.33 s vs 1.50 s.

It doesn’t really matter, since either is fast enough for me 🙂, but do you have any idea why that might be?
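One way to get timings like these is to read NSDate before and after the code under test. This is a minimal sketch, not the harness used for the numbers above; the generated test data (3,500 copies of the sample line) is a hypothetical stand-in for a real data set, and the single split stands in for whichever script is being measured.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical test data: 3,500 copies of the sample line, joined by linefeeds.
set oneLine to "20180715~160806" & tab & "M" & tab & "0" & tab & "EN-US" & tab & "digital communities" & tab & "PT-BR" & tab & "comunidades digitais"
set lineList to {}
repeat 3500 times
	set end of lineList to oneLine
end repeat
set savedTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set testData to lineList as text
set AppleScript's text item delimiters to savedTIDs

-- Start the clock, run the code under test, then read the elapsed time.
set startTime to current application's NSDate's |date|()
set myNSString to current application's NSString's stringWithString:testData
set myNSList to myNSString's componentsSeparatedByString:linefeed
set elapsed to -(startTime's timeIntervalSinceNow()) -- seconds, as a real
```

Running each version several times and comparing the typical values gives a fairer picture than a single run.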

> There are a few issues there. In particular, arrayWithObjects: doesn’t take a list in the strict sense, but rather a list in the sense of what you normally pass to a handler.

I’m sorry, but I don’t understand what you mean.


(Shane Stanley) #8

Consider this:

on doSomething(arg1, arg2, arg3)

Those arguments aren’t a list in the AppleScript sense. Similarly, the arrayWithObjects: method takes a nil-terminated, C-style “list” of separate arguments rather than a single list.


(Jean Christophe Helary) #9

OK, I see, but the thing is that the {} were added by SD (Script Debugger), not by me.


(Shane Stanley) #10

Right. To use it in AS you need to do something like arrayWithObjects_(item1, item2, missing value).
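Applied to the code in post #3, that suggestion might look like the sketch below, assuming the same field positions (4 and 6). Where the items are already in hand, building the array from an ordinary AppleScript list with `arrayWithArray:` (or simply passing the AppleScript list, as in post #4) is usually simpler, since AppleScript lists are bridged to NSArray. The variable names are illustrative.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Illustrative line with the same seven tab-separated fields as the sample data.
set lineString to "20180715~160806" & tab & "M" & tab & "0" & tab & "EN-US" & tab & "digital communities" & tab & "PT-BR" & tab & "comunidades digitais"
set lineNSString to current application's NSString's stringWithString:lineString
set myNSItem to lineNSString's componentsSeparatedByString:tab

-- arrayWithObjects: is variadic: pass each object as a separate argument
-- and terminate the argument list with missing value (nil).
set pair1 to current application's NSArray's arrayWithObjects_(myNSItem's objectAtIndex_(4), myNSItem's objectAtIndex_(6), missing value)

-- Alternative: build the array from an ordinary AppleScript list.
set pair2 to current application's NSArray's arrayWithArray:{(myNSItem's objectAtIndex:4), (myNSItem's objectAtIndex:6)}
-- (pair2 as list) → {"digital communities", "comunidades digitais"}
```

Either way the result is a flat two-item array, without the extra level of {}.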