FastScripts 3.1: Streamlined Regular Expression Support

Hey folks! I’m sharing a pre-release of FastScripts 3.1, which is under development, because I know this crowd is among the most advanced when it comes to scripting functionality, and also because I know @ShaneStanley in particular has a lot of experience supporting Regex with his wonderful RegExAndStuff lib.

For FastScripts 3.1 I decided to tackle a long-standing personal wish, which is that FastScripts should itself support an interface for performing at least the most common Regex tasks. If anybody cares to give it an early try, I’d appreciate any feedback you have, in case it helps to fill any obvious holes, or polish rough edges, before shipping the 3.1 release to the public.

You can download the beta here:

https://redsweater.com/fastscripts/FastScripts3.1b4.zip

The new functionality should hopefully be pretty self-explanatory to regex aficionados. Take a look at the scripting dictionary in the FastScripts Scripting Additions suite for “regex search”, “regex replace” and “regex split”.

I’ll probably add something to support regex string escaping, and eventually I’ll add options to support flags like case sensitivity, etc., but I wanted to keep things relatively simple at least for starters, until I get a sense for how (or if) people choose to use it.

Daniel

A small request: you’re using the same name for some commands as my lib, but with quite different parameters and parameter names. It’s not a technical problem if people use tell blocks, but I think it’s a recipe for user confusion.

This looks great. I use the FastScripts display message command all the time, so happy to see more functionality being added.

regex search: In my quick and crude test (matching every 7-letter word from a 2k word document, 10 times), this is running significantly slower than RegexAndStuffLib’s equivalent—0.1s as opposed to 0.04s. This goes up to nearly 0.2s vs 0.04s if I use capture groups. I like that it returns the offset for each result though. That’s a really cool feature that I’d use a lot. Does it need to return ‘capture groups’ when no groups are specified (i.e. when that list will be empty)?

regex split: Also running about half as fast, performance-wise

regex replace: Ditto

Agreed with @ShaneStanley re the terminology. I’m getting no errors using both FastScripts and RegexAndStuffLib commands in the same script but it’s at the very least potentially confusing. Maybe something like ‘split by regex,’ ‘search with regex’ instead? It’s more verbose but, hey, this is AppleScript we’re dealing with!

If the two sets of commands end up having slightly different features, they would complement one another nicely.

Thanks @ShaneStanley and @p1r2c1 for the feedback! I’ll look into the performance issues.

As for the naming discrepancy, I might need more convincing that there is an issue seeing as how the commands are, as noted, limited to “tell” blocks on FastScripts. But it’s definitely possible or even likely that my naming was inspired by RegExAndStuff so if I can think of a more distinct way of naming things that still feels nicely terse, I’ll switch it up.

Unless the scripter either includes a relevant FastScripts ‘use’ statement, or sets FastScripts as the default target for the script. Neither is a great idea, IMO, but I’m not entirely comfortable making life difficult for anyone who disagrees.

I definitely don’t want to make life difficult for anybody but I wonder if contorting to use a less direct name in FastScripts would be making life difficult for the probable vast majority of scripters who would choose one library or the other depending on circumstances.

I guess my concern is more that you, @ShaneStanley, will feel that your library’s naming conventions are being too closely mimicked. I’m less convinced that it will be a problem for typical scripters but if you are personally annoyed by the similarities I’m more inclined to change it.

Personally doesn’t really come into it – although if you matched the commands and supported the same parameters, I’d be delighted.

If you decide to stick with what you have, I’ll just change my terminology. I’m almost certain a mix will generate support requests I can live without.

Thanks, let me reflect on it a bit and think if there are any obvious other choices I could use to avoid the conflict. I ended up going with some different parameters in part because I wanted to keep things simpler in some ways. But I’m glad I shared this semi-publicly here to get the feedback before shipping the final release. It’ll definitely be easier to tweak now than later!

What if you added the letter “p” to each command? Like this:

“regexp search”, “regexp replace” and “regexp split”

Stan C,

It sure would make it unique, but I guess I’m a little fussy and like the scripting terms to be more easily pronounceable. Regex is just a nice abbreviation that rolls off the tongue, IMO!

I might actually be open to changing the names of the commands to remove “regex” and just allude to the goal without specifying that it’s regex. For example “search text”, “replace text” and “split text”. I suppose this might be nice too if there was ever any interest in supporting non-regex variations with the same interface.

I’m kind of a noob with Regex, but I got a very simple search to work…the only thing is I am having trouble figuring out how to reference the captured groups if you want to use those in your script later. Inserting “captured groups” into the statement just doesn’t seem to work - it won’t compile. Without ‘captured groups’ it compiles but errors. See below. I’m sure it is something simple, just eludes me at the moment. :man_shrugging:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

set myText to " how number extraction works and whether it can sneak a phone number out of a bunch of text like thisTHis is a test to see how number extraction works and whether it can sne text like thisTHis is a test to see how number extraction works and whether it can sneak a phone number out of a bunch of text like thisTHis is a test to see how number extraction works and whether it can sneak a phone number out of a bunch of text like thisTHis is a test to see how number extraction works and whether it can sneak a phone number out of a bunch 717-171-7272 of text like thisTHis is a test to see how number extraction works and whether it can sneak a phone number out of a bunch of text like thisTHis is a test to see how number extraction works and whether i"

tell application "FastScripts"
	set myres to regex search myText with pattern "(\\d{3}).+?(\\d{3})([-|\\.|\\s])(\\d{4})"
end tell

display dialog text of myres

display dialog (text of item 3 of item 1 of myres)
-- Can’t get item 3 of {captured groups:{{offset:537, text:"717-171-7272"}, {offset:537, text:"717"}, {text:"171", offset:541}, {text:"-", offset:544}, {text:"7272", offset:545}}, text:"717-171-7272", offset:537}.

Thanks for giving it a try @vinnie-bob - I think maybe part of what you’re running into is that referencing the custom record stored in “myres” after the tell block means that the terminology is no longer recognized. I got your script to work by wrapping the whole thing in ‘using terms from app “FastScripts”’ and changing this line:

display dialog (text of item 3 of captured groups of myres)

So it references the captured groups component of the results. But I think you could also work around it by accessing the specific component wanted while still inside the FastScripts tell block.
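
Put together, the fix described above might look roughly like the sketch below. It assumes the 3.1b4 command name and parameter (regex search ... with pattern) exactly as used in the question, and it swaps in a shortened sample string purely for readability. Judging by the record shown in the error message, item 3 of the captured groups should be the group whose text is 171, but treat that as an expectation rather than a guarantee.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

-- Shortened sample text (an assumption for brevity; the original string is much longer)
set myText to "sneak a phone number out of a bunch 717-171-7272 of text like this"

-- Keeps FastScripts terms such as "captured groups" recognized
-- for code that sits outside the tell block
using terms from application "FastScripts"
	tell application "FastScripts"
		set myres to regex search myText with pattern "(\\d{3}).+?(\\d{3})([-|\\.|\\s])(\\d{4})"
	end tell
	
	-- Index into the captured groups list, not into the result record itself
	display dialog (text of item 3 of captured groups of myres)
end using terms from
```

The other workaround mentioned above would be to copy the wanted value into a plain variable while still inside the tell block, then use only that variable afterwards, so no FastScripts terminology is needed outside the block.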

Thanks again @ShaneStanley and @p1r2c1 for taking a look at the original beta release. Taking your feedback to heart I’ve revised some things for the 3.1b6 update:

https://redsweater.com/fastscripts/FastScripts3.1b6.zip

In this release I’ve moved FastScripts’s commands to their own suite within FastScripts, which unfortunately means the existing test scripts you might have made will need to have their FastScripts command names rewritten.

As part of the move I have decided to go with the idea of naming the commands without any allusion to regex in the command name itself. It’ll now be “search text”, “replace text” and “split text” which should make it completely unambiguous with respect to RegExAndStuff. Let me know if you think there are any other naming conflicts this brings up!

As for the performance issues, I found a couple obvious fixes that should help. Let me know how the updated version performs in the same tests! Hopefully it should be noticeably better, if not quite as good as RegExAndStuff. I think because of the cross-process bridge and AppleEvent transformations I might have to work a little harder to match the performance of an imported script library.

I think dropping the “regex” is a good solution. One other very minor thing: the with pattern parameter. Given the way AppleScript compiles boolean parameters into the with/without pattern, it might be preferable to use a term without the with. Again, it’s not a functional issue, and pretty minor in the scheme of things. (I’ve done it myself on occasion.)
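
To illustrate the with/without point, here is a small sketch that uses Standard Additions rather than FastScripts, since choose file happens to show both kinds of parameter side by side. The variable name chosenFiles is just for illustration.

```applescript
use scripting additions

-- "with prompt" is a text parameter whose name happens to begin with "with".
-- "with multiple selections allowed" is a boolean parameter set to true;
-- typing "multiple selections allowed true" compiles to this same form.
set chosenFiles to choose file with prompt "Pick some files:" with multiple selections allowed
```

Reading that line, the two uses of "with" mean different things, which is the kind of ambiguity a parameter name without "with" would avoid.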

Ah, good point! I’ll take this into consideration. Maybe “using” would be a better choice if I want to maintain some of the same linguistic flow but not conflict with the with/without terms?

I’ve used using but AS uses it when the dictionary is missing. What about something like matching pattern?

That sounds good. Thanks!

Just tested the new version. Getting similar results as previously.

E.g. using the following pattern on a 46k word .md file, repeated 10 times: \\W(\\w{7})\\W
RegexAndStuffLib = 1s
FastScripts = 2.5s

Obviously, in many practical instances, any performance difference will be trivial.

(FWIW I’m using the CACurrentMediaTime() method to time the script, which I copied from @ShaneStanley a while back. It runs more or less the same whether from Script Debugger or FastScripts.)
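
For anyone wanting to run a similar comparison, a timing harness along these lines might look like the sketch below. This is an assumption about the general shape, not the exact script used for the numbers above: doWork and elapsedSeconds are hypothetical names, the body of doWork is a placeholder to be replaced with whichever regex call is being measured, and the QuartzCore line is included on the assumption that it does no harm even if Foundation alone turns out to be enough.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "QuartzCore" -- assumption: CACurrentMediaTime is declared here
use scripting additions

-- Hypothetical placeholder; swap in the regex command being measured
on doWork()
	return count of words of "The quick brown fox jumps over the lazy dog"
end doWork

-- Take a timestamp, repeat the work 10 times, then take the difference
set startTime to current application's CACurrentMediaTime()
repeat 10 times
	doWork()
end repeat
set elapsedSeconds to (current application's CACurrentMediaTime()) - startTime

return elapsedSeconds
```

Repeating the work several times, as in the tests described above, helps smooth out one-off costs such as the first Apple event sent to an application.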

I don’t know if it’d help but here are some of the use cases that I, personally, have for regex in AS:

I just wrote a library for writing markdown in CotEditor (which I am in love with). E.g. emphasis, strong emphasis, block quotes, headings, etc., both adding and removing. I was able to do this mostly with vanilla AS but some things became too complicated so I used RegexAndStuffLib. Things like counting how many # characters are at the start of a line. Lots of edge cases where this is tricky in vanilla AS but very easy with regex (e.g. if the line is blank).

I’m also working on some scripts that allow easily jumping from one point in a text to another (and back) without touching the mouse. The current search text feature of returning the offset of each result could be very useful for this purpose. I’ll experiment with it.

I then use regex for simpler things like extracting a UUID from a string. Either library would probably work equally well in such cases. Where I would probably prefer RegexAndStuffLib for performance reasons (as they currently stand) is when working with larger quantities of text. E.g. I have another script that turns every substring contained within «/» characters into a numbered (markdown) footnote.

If both libraries cover the basics and then have slightly different extra features, that’d be fine with me :slight_smile:

P.S. I think the search text naming is a good compromise.

After playing with this a bit, I have to say, it is a great idea. If you own FastScripts, it is always sitting there ‘running’ so it is ready to process your regex. I have BBEdit, which is great, but if you write AppleScripts for that to use regex, you have to wait for the app to launch, plus the AppleScripts involve a lot more escaping. Also I find the regex AppleScript terminology/grammar to be a little convoluted. This FastScripts scripting addition is a nice ‘added value’ proposition for power users who might be interested in FastScripts. One downside is that you can’t share those scripts with the general public; I guess you’d have to use Shane’s library if that’s what you want. But, thanks for adding this feature, Daniel!