RegexAndStuffLib Script Library

I finally got around to organizing some regular expressions stuff into a script library. Giving it a terminology dictionary makes it more convenient because it allows for optional parameters. I added some other bits and bobs, including case-conversion and some encoding/decoding commands.

https://www.macosxautomation.com/applescript/apps/Script_Libs.html

Feedback welcome, as usual.

Thanks Shane. As always your Script Libs are greatly appreciated. :+1:

Is it possible in the regex search to return ALL Capture Groups?

From the Dictionary it appears you can return only one:

capture group (optional integer): Which capture group to return. Default is capture group 0 (complete match). Ignored if a ‘replacement template’ parameter is provided.

I almost always want to return ALL CGs, and it would seem inefficient to have to do a separate search for each one.

Also, what is the difference between regex change and regex search with replace template? Why would I use one over the other?

Thanks.

It gets complicated in terms of how to return the results when not every capture group is returned in every match. That’s rare, I suspect, but it needs to be handled nonetheless. Let me think about it…
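To make the awkward case concrete, here is a minimal ASObjC sketch (not the library's code) in which an optional capture group takes part in one match but not the other. The sample text and pattern are hypothetical, and returning missing value for the absent group is only one possible convention. It relies on the same length-zero test as the regexFind handler later in this thread, so it cannot tell a group that matched an empty string from one that did not take part.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions

-- Hypothetical sample: the optional group (px) takes part in the first match only
set sourceText to current application's NSString's stringWithString:"10px 20"
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:"(\\d+)(px)?" options:0 |error|:(missing value)
set theMatches to theRegex's matchesInString:sourceText options:0 range:{0, sourceText's |length|()}

set groupTwoValues to {}
repeat with aMatch in theMatches
  set groupRange to (aMatch's rangeAtIndex:2)
  if |length| of groupRange = 0 then
    -- Group 2 captured nothing in this match; the caller needs some placeholder
    set end of groupTwoValues to missing value
  else
    set end of groupTwoValues to ((sourceText's substringWithRange:groupRange) as text)
  end if
end repeat

return groupTwoValues
--> {"px", missing value}
```

Whatever placeholder is chosen (missing value, an empty string, or skipping the item) changes the shape of the result, which is the design question raised above.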

regex change returns the full string with the changes made; regex search with a replace template returns just the replacement strings.
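A side-by-side sketch of the two commands may help. It assumes that regex change accepts the same search pattern and replace template parameters that regex search does (check the library's dictionary to confirm); the template that reverses the octets is just an illustration.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use scripting additions
use script "RegexAndStuffLib"

set someText to "This is an IP address: 192.168.5.128"
set thePattern to "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)"
set theTemplate to "$4.$3.$2.$1"

-- regex change: the whole string comes back, with each match rewritten by the template
-- (parameter names are assumed to mirror those of regex search)
set changedText to regex change someText search pattern thePattern replace template theTemplate

-- regex search with a replace template: a list holding only the rewritten matches
set replacementList to regex search someText search pattern thePattern replace template theTemplate

return {changedText, replacementList}
```

If the description above holds, the first value should be the full sentence with the address reversed, and the second a one-item list containing only the reversed address.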

Here’s an example:

use script "RegexAndStuffLib" -- makes the library's terminology available

set someText to "This is an IP address: 192.168.5.128"
set theResult to regex search someText search pattern "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" replace template "$1🔪$2🔪$3🔪$4"
set theNumbers to split string (item 1 of theResult) using delimiters {"🔪"}

It looks like a long way around, but it may actually be a bit quicker than returning all capture groups in the first place. Searches in ASObjC are relatively slow because the methods return ranges, which then have to be used to create the strings.

Thanks for the example.
However, I am puzzled about one thing: use of Script Lib references.

Why doesn’t this work when I use the “StrLib” reference to the Library?

use StrLib : script "RegexAndStuffLib"

### THIS FAILS -- won't compile
set theResult to StrLib's regex search someText search pattern "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" replace template "$1🔪$2🔪$3🔪$4"

But this does work, even though I have NOT issued a general “use” command:

use StrLib : script "RegexAndStuffLib"

### THIS WORKS
set theResult to regex search someText search pattern "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" replace template "$1🔪$2🔪$3🔪$4"

For testing, here’s my complete script:

use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation" -- this may not be required
use framework "AppKit" -- this may not be required
use scripting additions

use StrLib : script "RegexAndStuffLib"


set someText to "This is an IP address: 192.168.5.128"

### THIS FAILS -- won't compile
###  set theResult to StrLib's regex search someText search pattern "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" replace template "$1🔪$2🔪$3🔪$4"

### THIS COMPILES & Runs Fine
set theResult to regex search someText search pattern "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)" replace template "$1🔪$2🔪$3🔪$4"
set theNumbers to split string (item 1 of theResult) using delimiters {"🔪"}

I have been using this form of “use” with my Script Library for several years:
use RefName : script "LibraryName"
set x to RefName's SomeHandler(p1, p2)

What am I missing?

Shane, I have been using a RegEx handler that returns ALL Capture Groups, based on your scripts, for at least a year now, and have found it to be easy to use and very fast.

Here’s a comparison:

property ptyScriptName : "How to Process Multiple Capture Groups"
property ptyScriptVer : "1.0"
property ptyScriptDate : "2019-07-17"
property ptyScriptAuthor : "JMichaelTX"

(*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PURPOSE:
  • How to Process Multiple Capture Groups
  
REF:  The following were used in some way in the writing of this script.

  1.  2019-07-17, ShaneStanley, Late Night Software Ltd.
      RegexAndStuffLib Script Library
      https://forum.latenightsw.com/t/regexandstufflib-script-library/2018/4?u=jmichaeltx

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*)
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation" -- this may not be required
use framework "AppKit" -- this may not be required
use scripting additions

use StrLib : script "RegexAndStuffLib"

set someText to "This is an IP address: 192.168.5.128"
set reFind to "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)"

--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--»   Using Shane's new RegexAndStuffLib Library
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

set theResult to regex search someText search pattern reFind replace template "$1🔪$2🔪$3🔪$4"
set theNumbers to split string (item 1 of theResult) using delimiters {"🔪"}

--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--»   Using My RegEx Handler to Return ALL Capture Groups
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


set reMatches to my regexFind(reFind, someText, true)
-->{{"192.168.5.128", "192", "168", "5", "128"}}
--- see comments in below handler for description of returned list

### I have found this to be very fast.

--~~~~~~~~~~~~~~~~~~~~~~~ END OF MAIN SCRIPT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
on regexFind(pFindRegEx, pSourceString, pGlobalBool) -- @RegEx @Find @Search @Strings @ASObjC @Shane
  --–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
  (*  VER: 1.2    2019-05-18    -- Check for No Matches
    PURPOSE:  Find Match(s) & Return Match with All Capture Groups as List of Lists
    METHOD:    Uses ASObjC RegEx (which is based on ICU Regex)
    PARAMETERS:
      • pFindRegEx    | text |  RegEx pattern to search for
      • pSourceString | text |  Source String to be searched
      • pGlobalBool   | bool |  Set true for Global search (all matches)

    RETURNS: IF pGlobalBool:  List of Lists, one list per match
                ELSE:  Single List of first match
                Each Match List is a List of Full Match + One Item per Capture Group
                  {<Full Match>, <CG1>, <CG2>, <CG3>, ...}
                  IF CG not found, Item is returned as empty string
                If NO matches, return empty list {}

    AUTHOR:  JMichaelTX
    ## REQUIRES:  use framework "Foundation"
    REF:  
      1. 2017-11-22, ShaneStanley, Does SD6 Find RegEx Support Case Change?
          • Late Night Software Ltd.,
          • http://forum.latenightsw.com//t/does-sd6-find-regex-support-case-change/816/8
      2.  ICU RegEx Users Guide
          http://userguide.icu-project.org/strings/regexp
    --–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
  *)
  
  local theFinds, theFind, matchFound, theResult, subResult, groupCount
  set LF to linefeed
  
  try
    set matchFound to false
    
    set pSourceString to current application's NSString's stringWithString:pSourceString
    set {theRegex, theError} to current application's NSRegularExpression's regularExpressionWithPattern:pFindRegEx options:0 |error|:(reference)
    if theRegex is missing value then error ("Invalid RegEx Pattern." & LF & theError's localizedDescription() as text)
    
    if (pGlobalBool) then ### FIND ALL MATCHES ###
      set theFinds to theRegex's matchesInString:pSourceString options:0 range:{0, pSourceString's |length|()}
      if ((theFinds's |count|()) > 0) then set matchFound to true -- matchesInString: returns an empty array, not missing value, when nothing matches
      
    else ### FIND FIRST MATCH ###
      set theFind to theRegex's firstMatchInString:pSourceString options:0 range:{0, pSourceString's |length|()}
      if (theFind is not missing value) then
        set theFinds to current application's NSMutableArray's array()
        (theFinds's addObject:theFind)
        set matchFound to true
      else
        set theFinds to {}
        set theResult to {}
        set matchFound to false
      end if
    end if
    
    if matchFound then
      
      set theResult to current application's NSMutableArray's array()
      
      repeat with aFind in theFinds
        set subResult to current application's NSMutableArray's array()
        set groupCount to aFind's numberOfRanges()
        
        repeat with i from 0 to (groupCount - 1)
          
          set theRange to (aFind's rangeAtIndex:i)
          if |length| of theRange = 0 then
            --- Optional Capture Group was NOT Matched ---
            (subResult's addObject:"")
          else
            --- Capture Group was Matched ---
            (subResult's addObject:(pSourceString's substringWithRange:theRange))
          end if
        end repeat
        
        (theResult's addObject:subResult)
        
      end repeat -- theFinds
      
      set matchList to theResult as list
      
    else ### NO MATCH WAS FOUND ###
      
      set matchList to {}
      
    end if
    
  on error errMsg number errNum
    set errMsg to "ASObjC RegEx ERROR #" & errNum & LF & errMsg
    set the clipboard to errMsg
    display dialog errMsg & LF & ¬
      "(error msg is on Clipboard)" with title (name of me) with icon stop
    error errMsg
    
  end try
  
  return matchList
  
end regexFind
--~~~~~~~~~~~~~~~~~~~~ END of Handler ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

And that’s the way to work with a library that doesn’t have a terminology dictionary. But when one does, you no longer need the reference. AppleScript uses the terminology to work out the target itself.
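The two styles can be sketched side by side. The handler-only library here ("MyHandlersLib" and its someHandler) is hypothetical, so those lines are left commented out to keep the script compilable.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use scripting additions

-- Library WITH a terminology dictionary: a plain "use" statement, no reference name needed
use script "RegexAndStuffLib"

-- Library WITHOUT a dictionary: bind a name and call its handlers through that name.
-- "MyHandlersLib" and someHandler are hypothetical placeholders.
-- use MyLib : script "MyHandlersLib"
-- set x to MyLib's someHandler("a", "b")

set someText to "This is an IP address: 192.168.5.128"

-- The library's terminology lets AppleScript route this command by itself
set theMatches to regex search someText search pattern "\\d+"
```

Prefixing a terminology command with a reference name (as in StrLib's regex search) is what fails to compile in the earlier post; the dictionary terms are used bare.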

Keep using it. As I said, I’m thinking about it. But as a data point, that takes more than twice as long as my (admittedly contrived) example.

It’s largely an issue of how to fit it in with the other parameters, and not ending up with a confusing dictionary or spaghetti code.

OK, version 1.0.1 is now available. It introduces a separate command, regex search once, for searching for the first or last match only. Both it and the regex search command also allow multiple capture groups, or all capture groups, to be specified.
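As a rough sketch of how the new options might be used: the capture groups parameter name appears elsewhere in this thread, but passing a list such as {1, 4} and giving regex search once the same parameters as regex search are assumptions to check against the library's dictionary.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use scripting additions
use script "RegexAndStuffLib"

set someText to "This is an IP address: 192.168.5.128"
set reFind to "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)"

-- Every match, keeping only capture groups 1 and 4 (list form is an assumption)
set selectedGroups to regex search someText search pattern reFind capture groups {1, 4}

-- A single match only; parameters are assumed to mirror those of regex search
set oneMatch to regex search once someText search pattern reFind capture groups {1, 4}
```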

Many thanks Shane. This is really appreciated. :+1:
I hope it was not too much trouble.

It’s in the Dict, but for easy ref, here’s the syntax:
(pass empty list as parameter to Capture Groups)

set reMatchList to regex search sourceStr search pattern reFind capture groups {}

Returns a nice list of lists, with first item as full match, and each remaining item as a Capture Group.

I use RegEx and string handlers all the time, so I think this is going to be my favorite of ALL script libraries, and I really like that it has terminology so we don’t have to use the reference name when using it.

Version 1.0.4 is now available.

Version 1.0.2 introduces performance improvements. In particular, the regex search command is faster, significantly so if you specify a replace template, or if capture groups is either unused or a single integer.
Version 1.0.3 fixes bugs with changing the case of lists of strings in 1.0.2.
Version 1.0.4 fixes a bug in the regex search once command, where it returns an array rather than a list when multiple capture groups are specified.
