RegexAndStuffLib Script Library

I finally got around to organizing some regular-expression stuff into a script library. Giving it a terminology dictionary makes it more convenient, because that allows for optional parameters. I added some other bits and bobs, including case-conversion and some encoding/decoding commands.

Feedback welcome, as usual.


Thanks Shane. As always your Script Libs are greatly appreciated. :+1:

Is it possible in the regex search to return ALL Capture Groups?

From the Dictionary it appears you can return only one:

capture group optional integer Which capture group to return. Default is capture group 0 (complete match). Ignored if a ‘replacement template’ parameter is provided.

I almost always want to return ALL CGs, and it would seem inefficient to have to do a separate search for each one.

Also, what is the difference between regex change and regex search with replace template? Why would I use one over the other?

Thanks.

It gets complicated in terms of how to return the results when every capture group is not returned in every match. That’s rare, I suspect, but it needs to be handled nonetheless. Let me think about it…
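To see what "not returned" looks like in practice: when an optional group doesn't take part in a match, Foundation reports that group's range with a location of `NSNotFound`. The sketch below uses a made-up pattern and sample text, and records `missing value` for a group that didn't participate. It assumes ASObjC bridges each `NSRange` to an AppleScript record with `location` and `length` fields, and that comparing against `current application's NSNotFound` behaves on your system.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions

-- Made-up sample: "(u)?" is optional, so group 1 only takes part in the second match
set theText to current application's NSString's stringWithString:"color colour"
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:"colo(u)?r" options:0 |error|:(missing value)
set theMatches to theRegex's matchesInString:theText options:0 range:{0, theText's |length|()}

set groupValues to {}
repeat with i from 0 to ((theMatches's |count|()) - 1)
	set aMatch to (theMatches's objectAtIndex:i)
	set groupRange to (aMatch's rangeAtIndex:1)
	if (location of groupRange) = (current application's NSNotFound) then
		-- group 1 did not participate in this match
		set end of groupValues to missing value
	else
		set end of groupValues to (theText's substringWithRange:groupRange) as text
	end if
end repeat
groupValues
```

A library command returning every group has to decide what to hand back in that first slot (empty text, `missing value`, or nothing at all), which is the design question being weighed here.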

`regex change` returns the full string with the changes made; `regex search` with a `replace template` returns just the replacement strings, one per match.
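Here is a rough ASObjC sketch of the same distinction using Foundation's `NSRegularExpression` directly rather than the library; the sample text and template are made up, and this isn't necessarily how the library works internally. `stringByReplacingMatchesInString:options:range:withTemplate:` gives the whole string with every match replaced (the `regex change` kind of result), while `replacementStringForResult:inString:offset:template:` expands the template once per match (the `regex search ... replace template` kind).

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Made-up sample text containing two IP addresses
set theText to current application's NSString's stringWithString:"IP 192.168.5.128 and 10.0.0.1"
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:"(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" options:0 |error|:(missing value)
set theTemplate to "$4.$3.$2.$1" -- reverse the four octets
set fullRange to {0, theText's |length|()}

-- 1. The whole string with every match replaced
set changedText to (theRegex's stringByReplacingMatchesInString:theText options:0 range:fullRange withTemplate:theTemplate) as text
--> "IP 128.5.168.192 and 1.0.0.10"

-- 2. Just the expanded template for each match
set theMatches to theRegex's matchesInString:theText options:0 range:fullRange
set replacementList to {}
repeat with i from 0 to ((theMatches's |count|()) - 1)
	set aMatch to (theMatches's objectAtIndex:i)
	-- |offset| and |template| are piped so they can't clash with AppleScript terms
	set end of replacementList to (theRegex's replacementStringForResult:aMatch inString:theText |offset|:0 |template|:theTemplate) as text
end repeat
--> {"128.5.168.192", "1.0.0.10"}
```

So the first form is what you want when editing text in place, and the second when you want the pieces out of the text.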


Here’s an example:

set someText to "This is an IP address: 192.168.5.128"
set theResult to regex search someText search pattern "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" replace template "$1🔪$2🔪$3🔪$4"
set theNumbers to split string (item 1 of theResult) using delimiters {"🔪"}

It looks like a long way around, but it may actually be a bit quicker than returning all capture groups in the first place. Searches in ASObjC are relatively slow because the methods return ranges, which then have to be used to create the strings.

Thanks for the example.
However, I am puzzled about one thing: use of Script Lib references.

Why doesn’t this work when I use the “StrLib” reference to the Library?

use StrLib : script "RegexAndStuffLib"

### THIS FAILS -- won't compile
set theResult to StrLib's regex search someText search pattern "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" replace template "$1🔪$2🔪$3🔪$4"

But this does work, even though I have NOT issued a general “use” command:

use StrLib : script "RegexAndStuffLib"

### THIS WORKS
set theResult to regex search someText search pattern "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" replace template "$1🔪$2🔪$3🔪$4"

For testing, here’s my complete script:

use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation" -- this may not be required
use framework "AppKit" -- this may not be required
use scripting additions

use StrLib : script "RegexAndStuffLib"


set someText to "This is an IP address: 192.168.5.128"

### THIS FAILS -- won't compile
###  set theResult to StrLib's regex search someText search pattern "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" replace template "$1🔪$2🔪$3🔪$4"

### THIS COMPILES & Runs Fine
set theResult to regex search someText search pattern "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" replace template "$1🔪$2🔪$3🔪$4"
set theNumbers to split string (item 1 of theResult) using delimiters {"🔪"}

I have been using the form of “use” like this with my Script Library for several years:
use RefName : script "LibraryName"
set x to RefName's SomeHandler(p1, p2)

What am I missing?

Shane, I have been using a RegEx handler that returns ALL Capture Groups, based on your scripts, for at least a year now, and have found it to be easy to use and very fast.

Here’s a comparison:

property ptyScriptName : "How to Process Multiple Capture Groups"
property ptyScriptVer : "1.0"
property ptyScriptDate : "2019-07-17"
property ptyScriptAuthor : "JMichaelTX"

(*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PURPOSE:
  • How to Process Multiple Capture Groups
  
REF:  The following were used in some way in the writing of this script.

  1.  2019-07-17, ShaneStanley, Late Night Software Ltd.
      RegexAndStuffLib Script Library
      https://forum.latenightsw.com/t/regexandstufflib-script-library/2018/4?u=jmichaeltx

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*)
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation" -- this may not be required
use framework "AppKit" -- this may not be required
use scripting additions

use StrLib : script "RegexAndStuffLib"

set someText to "This is an IP address: 192.168.5.128"
set reFind to "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)"

--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--»   Using Shane's new RegexAndStuffLib Library
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

set theResult to regex search someText search pattern reFind replace template "$1🔪$2🔪$3🔪$4"
set theNumbers to split string (item 1 of theResult) using delimiters {"🔪"}

--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--»   Using My RegEx Handler to Return ALL Capture Groups
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


set reMatches to my regexFind(reFind, someText, true)
-->{{"192.168.5.128", "192", "168", "5", "128"}}
--- see comments in below handler for description of returned list

### I have found this to be very fast.

--~~~~~~~~~~~~~~~~~~~~~~~ END OF MAIN SCRIPT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
on regexFind(pFindRegEx, pSourceString, pGlobalBool) -- @RegEx @Find @Search @Strings @ASObjC @Shane
  --–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
  (*  VER: 1.2    2019-05-18    -- Check for No Matches
    PURPOSE:  Find Match(s) & Return Match with All Capture Groups as List of Lists
    METHOD:    Uses ASObjC RegEx (which is based on ICU Regex)
    PARAMETERS:
      • pFindRegEx    | text |  RegEx pattern to search for
      • pSourceString | text |  Source String to be searched
      • pGlobalBool   | bool |  Set true for Global search (all matches)

    RETURNS: IF pGlobalBool:  List of Lists, one list per match
                ELSE:  Single List of first match
                Each Match List is a List of Full Match + One Item per Capture Group
                  {<Full Match>, <CG1>, <CG2>, <CG3>, ...}
                  IF CG not found, Item is returned as empty string
                If NO matches, return empty list {}

    AUTHOR:  JMichaelTX
    ## REQUIRES:  use framework "Foundation"
    REF:  
      1. 2017-11-22, ShaneStanley, Does SD6 Find RegEx Support Case Change?
           • Late Night Software Ltd., 
          • http://forum.latenightsw.com//t/does-sd6-find-regex-support-case-change/816/8
      2.  ICU RegEx Users Guide
          http://userguide.icu-project.org/strings/regexp
    --–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
  *)
  
  local theFinds, theFind, matchFound, theResult, subResult, groupCount, theRegex, theError, theRange, matchList, LF
  set LF to linefeed
  
  try
    set matchFound to false
    
    set pSourceString to current application's NSString's stringWithString:pSourceString
    set {theRegex, theError} to current application's NSRegularExpression's regularExpressionWithPattern:pFindRegEx options:0 |error|:(reference)
    if theRegex is missing value then error ("Invalid RegEx Pattern." & LF & (theError's localizedDescription() as text))
    
    if (pGlobalBool) then ### FIND ALL MATCHES ###
      set theFinds to theRegex's matchesInString:pSourceString options:0 range:{0, pSourceString's |length|()}
      -- matchesInString returns an empty array (not missing value) when nothing matches
      if (theFinds's |count|()) > 0 then set matchFound to true
      
    else ### FIND FIRST MATCH ###
      set theFind to theRegex's firstMatchInString:pSourceString options:0 range:{0, pSourceString's |length|()}
      if (theFind is not missing value) then
        set theFinds to current application's NSMutableArray's array()
        (theFinds's addObject:theFind)
        set matchFound to true
      else
        set theFinds to {}
        set theResult to {}
        set matchFound to false
      end if
    end if
    
    if matchFound then
      
      set theResult to current application's NSMutableArray's array()
      
      repeat with aFind in theFinds
        set subResult to current application's NSMutableArray's array()
        set groupCount to aFind's numberOfRanges()
        
        repeat with i from 0 to (groupCount - 1)
          
          set theRange to (aFind's rangeAtIndex:i)
          if |length| of theRange = 0 then
            --- Optional Capture Group was NOT Matched ---
            (subResult's addObject:"")
          else
            --- Capture Group was Matched ---
            (subResult's addObject:(pSourceString's substringWithRange:theRange))
          end if
        end repeat
        
        (theResult's addObject:subResult)
        
      end repeat -- theFinds
      
      set matchList to theResult as list
      
    else ### NO MATCH WAS FOUND ###
      
      set matchList to {}
      
    end if
    
  on error errMsg number errNum
    set errMsg to "ASObjC RegEx ERROR #" & errNum & LF & errMsg
    set the clipboard to errMsg
    display dialog errMsg & LF & ¬
      "(error msg is on Clipboard)" with title (name of me) with icon stop
    error errMsg
    
  end try
  
  return matchList
  
end regexFind
--~~~~~~~~~~~~~~~~~~~~ END of Handler ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

And that’s the way to work with a library that doesn’t have a terminology dictionary. But when one does, you no longer need the reference. AppleScript uses the terminology to work out the target itself.
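If you'd rather keep the `StrLib` reference visible, a possible alternative is sketched below. It assumes the `without importing` option of the `use` statement keeps the library's terminology out of the main script, so its commands are only understood inside a `tell` block that targets the library; treat it as something to try rather than a recommendation.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
-- Assumption: "without importing" keeps the library's terms out of the main script,
-- so "regex search" is only understood inside "tell StrLib"
use StrLib : script "RegexAndStuffLib" without importing

set someText to "This is an IP address: 192.168.5.128"
tell StrLib
	set theResult to regex search someText search pattern "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" replace template "$1🔪$2🔪$3🔪$4"
end tell
```

With the default (importing) form, as in the working script above, none of this is needed.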

Keep using it. As I said, I’m thinking about it. But as a data point, that takes more than twice as long as my (admittedly contrived) example.

It’s largely an issue of how to fit it in with the other parameters, and not ending up with a confusing dictionary or spaghetti code.

OK, version 1.0.1 is now available. It introduces a separate command, regex search once, for searching for the first or last match only. Both it and the regex search command also allow multiple capture groups, or all capture groups, to be specified.


Many thanks Shane. This is really appreciated. :+1:
I hope it was not too much trouble.

It’s in the Dict, but for easy ref, here’s the syntax:
(pass an empty list as the capture groups parameter to get all of them)

set reMatchList to regex search sourceStr search pattern reFind capture groups {}

Returns a nice list of lists, with first item as full match, and each remaining item as a Capture Group:
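As a sketch of consuming that list of lists, the following uses the syntax above and pulls the first capture group out of each match; the sample string is made up.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use script "RegexAndStuffLib"

set sourceStr to "Hosts: 192.168.5.128, 10.0.0.1" -- made-up sample
set reFind to "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)"
-- An empty list asks for all capture groups, so each match comes back as
-- {full match, group 1, group 2, group 3, group 4}
set reMatchList to regex search sourceStr search pattern reFind capture groups {}

set firstOctets to {}
repeat with aMatch in reMatchList
	set end of firstOctets to item 2 of aMatch -- item 1 is the full match
end repeat
--> {"192", "10"}
```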


I use RegEx and string handlers all the time, so I think this is going to be my favorite of ALL script libraries, and I really like that it has terminology so we don’t have to use the reference name when using.

Version 1.0.4 is now available.

Version 1.0.2 introduces performance improvements. In particular, the regex search command is faster, significantly so if you specify a replace template, or if capture groups is either omitted or a single integer.
Version 1.0.3 fixes bugs with changing the case of lists of strings in 1.0.2.
Version 1.0.4 fixes a bug in the regex search once command, where it returns an array rather than a list when multiple capture groups are specified.


@ShaneStanley, would you mind updating the URL to point to the Late Night Software Freeware page?

I was searching for regex on the forum and the original link is dead.


Done — thanks for noticing.


Even if I’m late to the party… thank you SOOO much for sharing your work. It has saved my (work) life after the Apple-induced demise of Satimage regex functionality.


Shane,

Are there examples of scripts using the library? I can usually figure things out after I find an example. It points out where I’ve done something stupid and not followed the dictionary correctly. :grin:

I’m trying to replace my Satimage references with RegexAndStuff, but there are a few things I can’t figure out. The example below isn’t even a regex change, just an attempt to change one string into another: specifically, a single change statement that changes more than one thing into something else.

I thought the regex batch was the answer but I get an error when I use it, so I’m guessing that I haven’t understood the dictionary correctly. Here’s my sample code to test this feature:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use script "RegexAndStuffLib" version "1.0.7"

set StartText to "12345"
set ChangeList to {{"1", "6"}, {"2", "7"}, {"3", "8"}}

set SatimageEndText to change {"1", "2", "3"} into {"6", "7", "8"} in StartText

set EndText to regex batch StartText change pairs list of ChangeList

The Error I’m getting is: Can’t get list of {{“1”, “6”}, {“2”, “7”}, {“3”, “8”}}.

Also, I use the Satimage “sortlist” statement throughout a lot of my scripts. Is there a similar routine I can use to replace this function? Especially with the “remove duplicates” option. An ObjC call (might force me to actually learn to use ObjC for myself instead of just through the libraries)?

Thanks for all of your libraries. I’m slowly converting all my scripts over so I can move on past High Sierra without having to use Satimage as a separate application instead of an OSAX. But I’ve got a long way to go.

Jim Brandt

That should be:

set EndText to regex batch StartText change pairs ChangeList
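When the things being changed are plain strings rather than patterns, a Foundation-only approach also works. This is a minimal sketch with a made-up handler name, `batchReplaceLiteral`, not part of the library; it applies each pair in turn with `stringByReplacingOccurrencesOfString:withString:`.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical handler: applies each {find, replace} pair in turn,
-- matching the find string literally rather than as a regex pattern
on batchReplaceLiteral(theText, changePairs)
	set theString to current application's NSString's stringWithString:theText
	repeat with aPair in changePairs
		set {findStr, replaceStr} to contents of aPair
		set theString to (theString's stringByReplacingOccurrencesOfString:findStr withString:replaceStr)
	end repeat
	return theString as text
end batchReplaceLiteral

set EndText to batchReplaceLiteral("12345", {{"1", "6"}, {"2", "7"}, {"3", "8"}})
--> "67845"
```

Because the pairs are applied one after another, a later pair can rewrite the output of an earlier one: {{"1", "2"}, {"2", "3"}} turns "1" into "3". Order the pairs accordingly.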

Do a search. You should find plenty of examples here, or over at macscripter.net.

Maybe this helps:

use AppleScript version "2.4" -- macOS 10.10 or later
use framework "Foundation"

return my killDuplicatesAndSort({"b", "b", "c", "d", "e", "b", "c", "f", "a", "2", "g", "g"})

on killDuplicatesAndSort(theList)
	set theList to theList as list
	set sortedArray to (current application's NSArray's arrayWithArray:theList)'s sortedArrayUsingSelector:"compare:"
	-- NSOrderedSet keeps the first occurrence of each item, so the sorted order survives
	return (current application's NSOrderedSet's orderedSetWithArray:sortedArray)'s array() as list
end killDuplicatesAndSort
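Two variations on the same idea, sketched with the same Foundation classes (the handler names are made up). If you only want duplicates removed, `NSOrderedSet` on its own keeps the first occurrence of each item in its original position. If you want a sort closer to the Finder's (case-insensitive, with embedded numbers compared by value), the `localizedStandardCompare:` selector can stand in for `compare:`.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"

-- Hypothetical handler: removes duplicates but keeps the original order
on removeDuplicates(theList)
	set orderedSet to current application's NSOrderedSet's orderedSetWithArray:theList
	return (orderedSet's array()) as list
end removeDuplicates

-- Hypothetical handler: Finder-style sort (case-insensitive, numbers by value)
on sortLikeFinder(theList)
	set theArray to current application's NSArray's arrayWithArray:theList
	return (theArray's sortedArrayUsingSelector:"localizedStandardCompare:") as list
end sortLikeFinder

set uniqueItems to removeDuplicates({"b", "b", "c", "a", "c"})
--> {"b", "c", "a"}
set finderSorted to sortLikeFinder({"file10", "File2", "file1"})
--> {"file1", "File2", "file10"}
```

Note that duplicate detection is case-sensitive, so "a" and "A" are kept as separate items.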

Thanks, Shane. I knew I was doing something stupid. Like reading the dictionary too literally!!

Thanks, also, TMA. That looks too easy. None of the ObjC books I have even reference arrayWithArray or NSOrderedSet or orderedSetWithArray. Guess I’ve got a long way to go, yet.

Helps me! It just replaced a longer, slower version in my clippings/library. Thanks!

When this sort of thing happens, remember you can just drag the command from the dictionary window into a script window to get a compile-able version.

I can think of at least one ASObjC book that mentions them all :wink: