I finally got around to organizing some regular expressions stuff into a script library. Giving it a terminology dictionary makes it more convenient because it allows for optional parameters. I added some other bits and bobs, including case-conversion and some encoding/decoding commands.
Thanks Shane. As always your Script Libs are greatly appreciated.
Is it possible in the regex search to return ALL Capture Groups?
From the Dictionary it appears you can return only one:
capture group (optional, integer): Which capture group to return. Default is capture group 0 (complete match). Ignored if a ‘replacement template’ parameter is provided.
I almost always want to return ALL CGs, and it would seem inefficient to have to do a separate search for each one.
Also, what is the difference between regex change and regex search with replace template? Why would I use one over the other?
It gets complicated in terms of how to return the results when every capture group is not returned in every match. That’s rare, I suspect, but it needs to be handled nonetheless. Let me think about it…
regex change returns the full string with the changes made; regex search with a replace template returns just the replacement strings, one per match.
set someText to "This is an IP address: 192.168.5.128"
set theResult to regex search someText search pattern "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" replace template "$1🔪$2🔪$3🔪$4"
set theNumbers to split string (item 1 of theResult) using delimiters {"🔪"}
It looks like a long way around, but it may actually be a bit quicker than returning all capture groups in the first place. Searches in ASObjC are relatively slow because the methods return ranges, which then have to be used to create the strings.
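To make the difference between the two commands concrete, here is a minimal sketch that runs both against the same string. It assumes regex change takes the same search pattern and replace template parameters as regex search (check the dictionary to confirm), and the template "$4.$3.$2.$1" is only an illustration.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use scripting additions
use script "RegexAndStuffLib"

set someText to "This is an IP address: 192.168.5.128"
set thePattern to "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)"

-- regex change: the whole string comes back, with each match rewritten
-- (assumption: same parameter names as regex search)
set changedText to regex change someText search pattern thePattern replace template "$4.$3.$2.$1"

-- regex search with a replace template: a list holding only the rewritten matches
set foundList to regex search someText search pattern thePattern replace template "$4.$3.$2.$1"

return {changedText, foundList}
```

Going by the descriptions above, changedText should be the full sentence with the address reversed, while foundList should be a one-item list containing only the reversed address.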
Thanks for the example.
However, I am puzzled about one thing: use of Script Lib references.
Why doesn’t this work when I use the “StrLib” reference to the Library?
use StrLib : script "RegexAndStuffLib"
### THIS FAILS -- won't compile
set theResult to StrLib's regex search someText search pattern "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" replace template "$1🔪$2🔪$3🔪$4"
But this does work, even though I have NOT issued a general “use” command:
use StrLib : script "RegexAndStuffLib"
### THIS WORKS
set theResult to regex search someText search pattern "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" replace template "$1🔪$2🔪$3🔪$4"
For testing, here’s my complete script:
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation" -- this may not be required
use framework "AppKit" -- this may not be required
use scripting additions
use StrLib : script "RegexAndStuffLib"
set someText to "This is an IP address: 192.168.5.128"
### THIS FAILS -- won't compile
### set theResult to StrLib's regex search someText search pattern "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" replace template "$1🔪$2🔪$3🔪$4"
### THIS COMPILES & Runs Fine
set theResult to regex search someText search pattern "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" replace template "$1🔪$2🔪$3🔪$4"
set theNumbers to split string (item 1 of theResult) using delimiters {"🔪"}
I have been using this form of “use” with my Script Library for several years:
use RefName : script "LibraryName"
set x to RefName's SomeHandler(p1, p2)
Shane, I have been using a RegEx handler that returns ALL Capture Groups, based on your scripts, for at least a year now, and have found it to be easy to use and very fast.
Here’s a comparison:
property ptyScriptName : "How to Process Multiple Capture Groups"
property ptyScriptVer : "1.0"
property ptyScriptDate : "2019-07-17"
property ptyScriptAuthor : "JMichaelTX"
(*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PURPOSE:
• How to Process Multiple Capture Groups
REF: The following were used in some way in the writing of this script.
1. 2019-07-17, ShaneStanley, Late Night Software Ltd.
RegexAndStuffLib Script Library
https://forum.latenightsw.com/t/regexandstufflib-script-library/2018/4?u=jmichaeltx
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*)
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation" -- this may not be required
use framework "AppKit" -- this may not be required
use scripting additions
use StrLib : script "RegexAndStuffLib"
set someText to "This is an IP address: 192.168.5.128"
set reFind to "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)"
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--» Using Shane's new RegexAndStuffLib Library
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
set theResult to regex search someText search pattern reFind replace template "$1🔪$2🔪$3🔪$4"
set theNumbers to split string (item 1 of theResult) using delimiters {"🔪"}
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--» Using My RegEx Handler to Return ALL Capture Groups
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
set reMatches to my regexFind(reFind, someText, true)
-->{{"192.168.5.128", "192", "168", "5", "128"}}
--- see comments in below handler for description of returned list
### I have found this to be very fast.
--~~~~~~~~~~~~~~~~~~~~~~~ END OF MAIN SCRIPT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
on regexFind(pFindRegEx, pSourceString, pGlobalBool) -- @RegEx @Find @Search @Strings @ASObjC @Shane
--–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
(* VER: 1.2 2019-05-18 -- Check for No Matches
PURPOSE: Find Match(s) & Return Match with All Capture Groups as List of Lists
METHOD: Uses ASObjC RegEx (which is based on ICU Regex)
PARAMETERS:
• pFindRegEx | text | RegEx pattern to search for
• pSourceString | text | Source String to be searched
• pGlobalBool | bool | Set true for Global search (all matches)
RETURNS: IF pGlobalBool: List of Lists, one list per match
ELSE: Single List of first match
Each Match List is a List of Full Match + One Item per Capture Group
{<Full Match>, <CG1>, <CG2>, <CG3>, ...}
IF CG not found, Item is returned as empty string
If NO matches, return empty list {}
AUTHOR: JMichaelTX
## REQUIRES: use framework "Foundation"
REF:
1. 2017-11-22, ShaneStanley, Does SD6 Find RegEx Support Case Change?
• Late Night Software Ltd.,
• http://forum.latenightsw.com//t/does-sd6-find-regex-support-case-change/816/8
2. ICU RegEx Users Guide
http://userguide.icu-project.org/strings/regexp
--–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
*)
local theFinds, theFind, matchFound, theResult, subResult, groupCount
set LF to linefeed
try
set matchFound to false
set pSourceString to current application's NSString's stringWithString:pSourceString
set {theRegex, theError} to current application's NSRegularExpression's regularExpressionWithPattern:pFindRegEx options:0 |error|:(reference)
if theRegex is missing value then error ("Invalid RegEx Pattern." & LF & theError's localizedDescription() as text)
if (pGlobalBool) then ### FIND ALL MATCHES ###
set theFinds to theRegex's matchesInString:pSourceString options:0 range:{0, pSourceString's |length|()}
-- matchesInString: returns an empty array (not missing value) when nothing matches
if (theFinds's |count|() > 0) then set matchFound to true
else ### FIND FIRST MATCH ###
set theFind to theRegex's firstMatchInString:pSourceString options:0 range:{0, pSourceString's |length|()}
if (theFind is not missing value) then
set theFinds to current application's NSMutableArray's array()
(theFinds's addObject:theFind)
set matchFound to true
else
set theFinds to {}
set theResult to {}
set matchFound to false
end if
end if
if matchFound then
set theResult to current application's NSMutableArray's array()
repeat with aFind in theFinds
set subResult to current application's NSMutableArray's array()
set groupCount to aFind's numberOfRanges()
repeat with i from 0 to (groupCount - 1)
set theRange to (aFind's rangeAtIndex:i)
if |length| of theRange = 0 then
--- Optional Capture Group was NOT Matched ---
(subResult's addObject:"")
else
--- Capture Group was Matched ---
(subResult's addObject:(pSourceString's substringWithRange:theRange))
end if
end repeat
(theResult's addObject:subResult)
end repeat -- theFinds
set matchList to theResult as list
else ### NO MATCH WAS FOUND ###
set matchList to {}
end if
on error errMsg number errNum
set errMsg to "ASObjC RegEx ERROR #" & errNum & LF & errMsg
set the clipboard to errMsg
display dialog errMsg & LF & ¬
"(error msg is on Clipboard)" with title (name of me) with icon stop
error errMsg
end try
return matchList
end regexFind
--~~~~~~~~~~~~~~~~~~~~ END of Handler ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
And that’s the way to work with a library that doesn’t have a terminology dictionary. But when one does, you no longer need the reference. AppleScript uses the terminology to work out the target itself.
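Side by side, the two styles look something like this. It is only a sketch: "LibraryName" and SomeHandler are the placeholder names from the post above rather than real items, so that part is left commented out.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use scripting additions
use script "RegexAndStuffLib" -- library WITH terminology: a plain use statement is enough

-- Library WITHOUT terminology: bind it to a name and call its handlers through that name.
-- "LibraryName" and SomeHandler are placeholders, so these lines stay commented out:
--   use RefName : script "LibraryName"
--   set x to RefName's SomeHandler(p1, p2)

set someText to "This is an IP address: 192.168.5.128"

-- The command is written bare; AppleScript uses the terminology to find the library
set theResult to regex search someText search pattern "\\d+\\.\\d+\\.\\d+\\.\\d+"
```

With no capture group specified, the default (capture group 0) applies, so theResult should be a list of the complete matches.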
OK, version 1.0.1 is now available. It introduces a separate command, regex search once, for searching for the first or last match only. Both it and the regex search command also allow multiple capture groups, or all capture groups, to be specified.
Many thanks Shane. This is really appreciated.
I hope it was not too much trouble.
It’s in the Dict, but for easy ref, here’s the syntax:
(pass empty list as parameter to Capture Groups)
set reMatchList to regex search sourceStr search pattern reFind capture groups {}
Returns a nice list of lists, one per match, with the first item as the full match and each remaining item as a capture group.
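For example, a minimal sketch using the IP address string from earlier in the thread. The variable names firstMatch, fullMatch and theOctets are only for illustration.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use scripting additions
use script "RegexAndStuffLib"

set sourceStr to "This is an IP address: 192.168.5.128"
set reFind to "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)"

-- an empty list for capture groups asks for ALL capture groups
set reMatchList to regex search sourceStr search pattern reFind capture groups {}

-- one sub-list per match: item 1 is the full match, items 2 onward are the capture groups
set firstMatch to item 1 of reMatchList
set fullMatch to item 1 of firstMatch
set theOctets to items 2 thru -1 of firstMatch
```

With a single match in the string, reMatchList should come back holding one sub-list, the same shape as the result of the regexFind handler earlier in the thread.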
I use RegEx and string handlers all the time, so I think this is going to be my favorite of ALL script libraries, and I really like that it has terminology so we don’t have to use the reference name when using.
Version 1.0.2 introduces performance improvements. In particular, the regex search command is faster — significantly so if you specify a replace template, or capture groups is either unused or a single integer.
Version 1.0.3 fixes bugs with changing the case of lists of strings in 1.0.2.
Version 1.0.4 fixes a bug in the regex search once command, where it returns an array rather than a list when multiple capture groups are specified.
Even if I’m late to the party … thank you SOOO much for sharing your work. It has saved my (work) life after the Apple-induced demise of the Satimage regex functionality.
Are there examples of scripts using the library? I can usually figure things out after I find an example. It points out where I’ve done something stupid and not followed the dictionary correctly.
I’m trying to replace my Satimage references with RegexAndStuffLib, but there are a few things I can’t figure out. The example below isn’t even a regex change, just an attempt to change one string to another. Specifically, I want a single change statement that changes more than one thing into something else.
I thought the regex batch was the answer but I get an error when I use it, so I’m guessing that I haven’t understood the dictionary correctly. Here’s my sample code to test this feature:
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use script "RegexAndStuffLib" version "1.0.7"
set StartText to "12345"
set ChangeList to {{"1", "6"}, {"2", "7"}, {"3", "8"}}
set SatimageEndText to change {"1", "2", "3"} into {"6", "7", "8"} in StartText
set EndText to regex batch StartText change pairs list of ChangeList
The error I’m getting is: Can’t get list of {{"1", "6"}, {"2", "7"}, {"3", "8"}}.
Also, I use the Satimage “sortlist” statement throughout a lot of my scripts. Is there a similar routine I can use to replace this function, especially with the “remove duplicates” option? Perhaps an ObjC call? (That might force me to actually learn to use ObjC for myself instead of just through the libraries.)
Thanks for all of your libraries. I’m slowly converting all my scripts over so I can move on past High Sierra without having to use Satimage as a separate application instead of an OSAX. But I’ve got a long way to go.
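The “Can’t get list of …” error suggests that “list of” in the dictionary entry describes the parameter's type (a list of lists) rather than words to type into the command. Here is a sketch of the call with those words dropped, assuming the parameter is named change pairs as in the attempt above:

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use script "RegexAndStuffLib" version "1.0.7"

set StartText to "12345"
-- assumption: each pair is {what to search for, what to replace it with}
set ChangeList to {{"1", "6"}, {"2", "7"}, {"3", "8"}}

-- pass the list directly after the parameter name, without "list of"
set EndText to regex batch StartText change pairs ChangeList
```

The intent is to turn "12345" into "67845" in one statement, like the Satimage change line. Since this is a regex command, the search strings are presumably treated as patterns, so characters such as "." or "(" would need escaping.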
use AppleScript version "2.4" -- macOS 10.10 or later
use framework "Foundation"
return my killDuplicatesAndSort({"b", "b", "c", "d", "e", "b", "c", "f", "a", "2", "g", "g"})
on killDuplicatesAndSort(theList)
set theList to theList as list
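-- sort first; compare: is a case-sensitive string comparison
set sortedArray to (current application's NSArray's arrayWithArray:theList)'s sortedArrayUsingSelector:"compare:"
-- an ordered set then drops the duplicates while keeping the sorted order
return (current application's NSOrderedSet's orderedSetWithArray:sortedArray)'s array() as list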
end killDuplicatesAndSort
Thanks, Shane. I knew I was doing something stupid. Like reading the dictionary too literally!!
Thanks, also, TMA. That looks too easy. None of the ObjC books I have even reference arrayWithArray or NSOrderedSet or orderedSetWithArray. Guess I’ve got a long way to go, yet.