ASObjC RegEx Change Handler

asobjc

(Jim Underwood) #1

Continuing the discussion from Unable to "open text file" or "save workbook as" in Excel 2016:

@ShaneStanley, I have a very nice handler for ASObjC RegEx Find, but I’m having trouble with the RegEx Change. Can you help, please?

TIA.

Here’s what I have:

use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions

set sourceStr to "<table><tr><td>Cell text</td></tr><tr>"
set findStr to "<t(d|r)>"
set replaceStr to "\\n<t$1>"

set newStr to my regexChange(findStr, replaceStr, sourceStr)
### Incorrect Results
-->"<table>n<tr>n<td>Cell text</td></tr>n<tr>"

--- Using Satimage.osax ---
set replaceSIStr to "\\n<t\\1>"
set newSIStr to change findStr into replaceSIStr in sourceStr syntax "PERL" with regexp
### Correct Results
(*
<table>
<tr>
<td>Cell text</td></tr>
<tr>
*)

on regexChange(pFindStr, pReplaceStr, pSourceStr)
  
  set nsCurApp to current application
  set nsSourceStr to nsCurApp's NSString's stringWithString:pSourceStr
  
  set nsSourceStr to (nsSourceStr's stringByReplacingOccurrencesOfString:pFindStr withString:pReplaceStr options:(nsCurApp's NSRegularExpressionSearch) range:{0, nsSourceStr's |length|()})
  
  ### Also Tried This, but got Error ###
  
  --set {nsFindRE, theError} to current application's NSRegularExpression's regularExpressionWithPattern:pFindStr options:0 |error|:(reference)
  --  
  --set {nsReplaceRE, theError} to current application's NSRegularExpression's regularExpressionWithPattern:pReplaceStr options:0 |error|:(reference)
  --  
  --  
  --set nsSourceStr to (nsSourceStr's stringByReplacingOccurrencesOfString:nsFindRE withString:nsReplaceRE options:(nsCurApp's NSRegularExpressionSearch) range:{0, nsSourceStr's |length|()})
  
  return nsSourceStr as text
  
end regexChange


As noted below, @NigelGarvey provided the solution.
Please see my last post for the Final Script Handler



(Jim Underwood) #2

I did find one workaround, and it is not very desirable:

Change:
set replaceStr to "\\n<t$1>"

To:
set replaceStr to linefeed & "<t$1>"

Every other tool I use understands \n properly. I do a lot of RegEx development and testing at RegEx101.com, and some with BBEdit. Almost always I can copy the expression I develop there, and just paste where I need it.

BTW, SD7 is very helpful. When you paste a string using ⌘⇧V, it adds the necessary escape characters.

@ShaneStanley, I found your example, Script 19-1 in Everyday AppleScriptObjC, Third Edition. It has the same problem: “\\n” results in just “n”, not a real linefeed.

I hope there is another way.


(Nigel Garvey) #3

Hi Jim.

A literal linefeed is the best way when you’re using ICU regex. Backslashing a character in a replacement string actually means treat it as a literal.


(Jim Underwood) #4

Thanks for the reply, Nigel.

On that same page of ICU Regex, it shows:

As you can see, \n is clearly supported.

I just ran a test in Keyboard Maestro, which also uses ICU RegEx (KM), and it worked fine using \n. In fact, I didn’t even have to escape it.

Keyboard Maestro

BBEdit and RegEx101.com also support use of \n in the Replace string.

As I said, every other RegEx tool I have used supports \n just fine.

I thought I had a better workaround – just turn ON the escape of tabs and line breaks in SD7. Unfortunately, SD7 does not seem to support that:

Script Editor Preferences

image

@alldritt: Is this a bug, or design choice?


So, I hope there is some way to use \n (and the other whitespace metacharacters) in ASObjC RegEx Replace.


(Shane Stanley) #5

Under Find and Replace it also shows:


(Shane Stanley) #6

FWIW, I suspect that’s a slight simplification on Peter’s part. For example, according to the KM docs you can use \n for capture groups, which is something not supported in ICU regex. KM may well be pre-processing the input for closer compatibility with other flavors. (Script Debugger does something like this in its UI regex.)


(Jim Underwood) #7

I don’t see how that changes anything.

Use of \n or \r or \t is NOT escaping.
There is also use of \ for unicode:
\p{UNICODE PROPERTY NAME}

Regardless, are you saying there is no way to use \n to represent linefeed in a replace string for ASObjC RegEx Replace?

We have to use \\n to pass \n to the handler replace string.
Is there a way to replace the \n in the handler with the linefeed character, then use it in the RegEx Replace?


(Jim Underwood) #8

@ShaneStanley and @NigelGarvey:

What about this Objective-C Statement, which uses a
withTemplate:@"\n"

NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\n+" options:0 error:NULL];
NSString *newString = [regex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, [string length]) withTemplate:@"\n"];

Posted here:
2013-04-08, Attila H, Stack Overflow
NSString replace repeated newlines with single newline
https://stackoverflow.com/a/15876198/915019


(Shane Stanley) #9

But use of \\n or \\r or \\t is, because you’re passing a literal \, which means treat the following character as a literal.

The real issue here is that you can’t use \n to represent a linefeed in an AppleScript string in compiled code. You can enter \n, which is what should be required to avoid the string containing a literal \, but as soon as you compile, AppleScript will replace \n with a literal linefeed character.

The setting in Script Editor changes the rules. If you’d like to see that supported in Script Debugger, please enter a feature request. (I’m not entirely sure it’s compatible with raw syntax — it actually relies on an undocumented private API.)

Objective-C lets you use \r and so on in code, and there’s no pass-to-compiler-and-get-back-compiled-code stage to deal with. You can use \n in AppleScript exactly the same — it’s just going to turn into a literal linefeed when you compile.
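
The compile-time behavior described above can be checked with a minimal sketch in plain AppleScript. The variable names are only for illustration. It assumes the first string is typed as backslash-n; after compiling, the editor normally shows it as a real line break inside the quotes, unless Script Editor's escape preference is switched on.

```applescript
-- Typed as "\n" before compiling; the compiled string holds one linefeed character.
set compiledEscape to "\n"

-- Typed as "\\n"; the compiled string holds a backslash followed by the letter n.
set literalBackslashN to "\\n"

-- Compare the first string with linefeed, then count and identify the characters in the second.
return {compiledEscape is linefeed, count of literalBackslashN, id of literalBackslashN}
--> {true, 2, {92, 110}}
```

So a handler that is handed \\n receives two ordinary characters, a backslash and an n, not a linefeed.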


(Nigel Garvey) #10

Yes. (Sorry. I don’t know how to “quote” images.) Those are all “matching” metacharacters for use in regex search strings. They “match” existing characters in the source text. The only metacharacters ICU regex has for substitution text are those shown in Shane’s post (#5).

What’s your objection to a literal line feed? If it’s because you particularly need to have “\\n” in the replacement string passed to the handler, you could have the handler itself doctor it:

on regexChange(pFindStr, pReplaceStr, pSourceStr)
	
	set nsCurApp to current application
	
	-- Replace any backslash-n sequences in the replace string with actual linefeeds.
	set nsReplaceStr to (nsCurApp's NSString's stringWithString:pReplaceStr)'s stringByReplacingOccurrencesOfString:"\\n" withString:(linefeed)
	
	set nsSourceStr to nsCurApp's NSString's stringWithString:pSourceStr
	
	set nsSourceStr to (nsSourceStr's stringByReplacingOccurrencesOfString:pFindStr withString:nsReplaceStr options:(nsCurApp's NSRegularExpressionSearch) range:{0, nsSourceStr's |length|()})
	
	return nsSourceStr as text
	
end regexChange

(Jim Underwood) #11

Yes it is, and here’s why:

  1. Code clarity.
    • If I use a \n or \r I can’t tell which was actually used after the code is compiled. Likewise with \t – can’t tell whether it’s a tab or space.
  2. Ease of Use of RegEx from other Sources
    • I can directly use a RegEx like
      \n"Some Text in Quotes"\tMore Text\r
      by pasting with ⌘⇧V, and SD7 will do the escaping for me:
      "\\n\"Some Text in Quotes\"\\tMore Text\\r"

Thanks, this works great.

I need to apply this to \\r and \\t as well. Is there a better way than just duplicating this same statement:

  set nsReplaceStr to (nsCurApp's NSString's stringWithString:pReplaceStr)'s stringByReplacingOccurrencesOfString:"\\n" withString:(linefeed)
  set nsReplaceStr to (nsCurApp's NSString's stringWithString:nsReplaceStr)'s stringByReplacingOccurrencesOfString:"\\r" withString:(return)
  set nsReplaceStr to (nsCurApp's NSString's stringWithString:nsReplaceStr)'s stringByReplacingOccurrencesOfString:"\\t" withString:(tab)


(Nigel Garvey) #12

I can’t think of a way to combine them, offhand. But I’d only use the stringWithString: method once:

set nsReplaceStr to nsCurApp's NSString's stringWithString:pReplaceStr
set nsReplaceStr to nsReplaceStr's stringByReplacingOccurrencesOfString:"\\n" withString:(linefeed)
set nsReplaceStr to nsReplaceStr's stringByReplacingOccurrencesOfString:"\\r" withString:(return)
set nsReplaceStr to nsReplaceStr's stringByReplacingOccurrencesOfString:"\\t" withString:(tab)

Or:

set nsReplaceStr to nsCurApp's NSMutableString's stringWithString:pReplaceStr
nsReplaceStr's replaceOccurrencesOfString:"\\n" withString:(linefeed) options:(0) range:{0, nsReplaceStr's |length|()}
nsReplaceStr's replaceOccurrencesOfString:"\\r" withString:(return) options:(0) range:{0, nsReplaceStr's |length|()}
nsReplaceStr's replaceOccurrencesOfString:"\\t" withString:(tab) options:(0) range:{0, nsReplaceStr's |length|()}
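
One possible way to avoid repeating the statement is to loop over a list of pairs. This is only a sketch: the handler name expandWhitespaceEscapes and the escapePairs list are hypothetical, not something posted in this thread, and it relies on the same NSMutableString method used just above.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions

-- Example call: the template holds backslash-t and backslash-n as typed characters.
set fixedTemplate to my expandWhitespaceEscapes("\\t<t$1>\\n")

-- Hypothetical helper (an assumption for illustration):
-- turns \\n, \\r and \\t in a replace string into the real characters.
on expandWhitespaceEscapes(pReplaceStr)
	set nsReplaceStr to current application's NSMutableString's stringWithString:pReplaceStr
	-- Each pair is {escape sequence as typed, character it stands for}.
	set escapePairs to {{"\\n", linefeed}, {"\\r", return}, {"\\t", tab}}
	repeat with aPair in escapePairs
		set {escSeq, realChar} to contents of aPair
		-- Literal (non-regex) replacement over the whole current string.
		(nsReplaceStr's replaceOccurrencesOfString:escSeq withString:realChar options:0 range:{0, nsReplaceStr's |length|()})
	end repeat
	return nsReplaceStr as text
end expandWhitespaceEscapes
```

Adding another escape then only means adding a pair to the list. One caveat with this simple approach (and with the repeated statements): a template that is meant to contain a literal backslash followed by n would also be rewritten.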

(Shane Stanley) #13

Nigel has posted what is probably a better solution for you, but I should point out that View --> Show Invisibles solves this problem for linefeed, return and tab.


(Jim Underwood) #14

Shane, I appreciate the suggestion, and maybe it solves the problem for you, but it doesn’t even come close for me:

image

I can’t begin to read the “invisible” markers, and that does not support copy/paste of a RegEx pattern between tools.

Nice try, but no cigar. :wink:

No doubt about it. :smile:


(Jim Underwood) #15

So, after a great discussion, Nigel provided the actual solution in his above post.
Then, to handle the other metacharacters, he improved on my “dup” approach with this:

My thanks to Shane for his many excuses, er, explanations of why the combo of AppleScript and Obj-C RegEx works like it does. :smile: Sorry, Shane – just a bit of ribbing. It still seems to me that if I pass the ASObjC RegEx statement the string \n (derived from \\n), it should work just like the actual Objective-C statement that supports \n. I guess it is the AppleScript compiler that is getting in the way. But in the end, it was really a simple fix in our handler.

So, finally I’m combining all and cleaning up into a handler worthy of a script library (at least for me), and posting my final script here:

ASObjC RegEx Change Handler

property ptyScriptName : "ASObjC RegEx Change Handler"
property ptyScriptVer : "2.0"
property ptyScriptDate : "2018-07-06"
property ptyScriptAuthor : "JMichaelTX" -- with help from NigelGarvey

(*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PURPOSE:
  • Test ASObjC RegEx Change Handler
  
RETURNS:  Source String with All Found Changes

REQUIRED:
  1.  macOS 10.11.6+
  2.  EXTERNAL OSAX Additions/LIBRARIES/FUNCTIONS
        • Satimage.osax (for comparison only)
          
TAGS:  @CAT.RegEx @CAT.Change @Lang.AS @Lang.ASObjC @type.Handler @type.KB @Auth.JMichaelTX

REF:  The following were used in some way in the writing of this script.

  1.  2018-07-06, NigelGarvey, Late Night Software Ltd.
      ASObjC RegEx Change Handler
      http://forum.latenightsw.com/t/asobjc-regex-change-handler/1395/10
      
  2.  ICU RegEx Users Guide
      http://userguide.icu-project.org/strings/regexp

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*)
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions

set sourceStr to "<table><tr><td>Cell text</td></tr><tr>"
set findStr to "<t(d|r)>"

(*  --- Replace String ---
• May contain these RegEx MetaChars:  
      \\n, \\r, \\t for linefeed, return, and tab
      $<CG>, where <CG> is Capture Group number
*)
set replaceStr to "\\n<t$1>"

--- Using Below ASObjC Handler ---
set newStr to my regexChange(findStr, replaceStr, sourceStr)

--- Using Satimage.osax ---
-- Same as above, except Capture Group metachar uses a "\\" instead of a "$"
set replaceSIStr to "\\n<t\\1>"
set newSIStr to change findStr into replaceSIStr in sourceStr syntax "PERL" with regexp

return newStr

### Correct Results Returned by Both ###
(*
<table>
<tr>
<td>Cell text</td></tr>
<tr>
*)

--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
on regexChange(pFindStr, pReplaceStr, pSourceStr) -- @RegEX @Change @Replace @Strings @ASObjC
  --–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
  (*  VER: 2.0    2018-07-06
    PURPOSE:  Change ALL Occurrences of pFindStr into pReplaceStr in pSourceStr
    METHOD:    Uses ASObjC RegEx (which is based on ICU Regex)
    PARAMETERS:
      • pFindStr    | text |  RegEx Pattern to Find
      • pReplaceStr | text |  Replace string which
          • May contain these RegEx MetaChars:  
                \\n, \\r, \\t for linefeed, return, and tab
                $<CG>, where <CG> is Capture Group number
      • pSourceStr   | text |  Source String to be searched

    RETURNS:  Revised String with all changes that were found.

    AUTHOR:  JMichaelTX -- with help from NigelGarvey
    ## REQUIRES:  use framework "Foundation"
    REF:  
      1.  2018-07-06, NigelGarvey, Late Night Software Ltd.
          ASObjC RegEx Change Handler
          http://forum.latenightsw.com/t/asobjc-regex-change-handler/1395/10
      2.  ICU RegEx Users Guide
          http://userguide.icu-project.org/strings/regexp
  --–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
  *)
  set nsCurApp to current application
  
  -- Replace any \\n, \\r, or \\t sequences in the replace string with the actual characters --
  
  set nsReplaceStr to nsCurApp's NSString's stringWithString:pReplaceStr
  set nsReplaceStr to nsReplaceStr's stringByReplacingOccurrencesOfString:"\\n" withString:(linefeed)
  set nsReplaceStr to nsReplaceStr's stringByReplacingOccurrencesOfString:"\\r" withString:(return)
  set nsReplaceStr to nsReplaceStr's stringByReplacingOccurrencesOfString:"\\t" withString:(tab)
  
  set nsSourceStr to nsCurApp's NSString's stringWithString:pSourceStr
  
  set nsSourceStr to (nsSourceStr's stringByReplacingOccurrencesOfString:pFindStr withString:nsReplaceStr options:(nsCurApp's NSRegularExpressionSearch) range:{0, nsSourceStr's |length|()})
  
  return nsSourceStr as text
  
end regexChange
--~~~~~~~~~~~~~~~~~~~~ END of Handler ~~~~~~~~~~~~~~~~~~~~~~~~~~~~


(Shane Stanley) #16

Quite. It was merely addressing the point I quoted, which also crops up in other contexts.


(Shane Stanley) #17

Let me try to at least explain it one more time. You’re passing \\n in AppleScript, not \n. The equivalent in Objective-C would also be to pass \\n, not \n. The rules for escaping linefeeds are identical in both languages. Your Objective-C example shows \n, and the AppleScript equivalent to that would also use \n, until the compiler grabs hold of it.


(Jim Underwood) #18

Well, as long as you’re beating a dead horse, I’ll join in with you.

Most likely my logic is flawed, but when I pass \\n to the Obj-C engine, I’m telling AppleScript to pass the actual characters “\n”, just like when we pass a quote mark to a shell script using \". When Obj-C gets the string, I expect it to act on the characters \n just like it does in the Obj-C statement:

NSString *newString = [regex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, [string length]) withTemplate:@"\n"];

Obj-C then recognizes that \n is a shortcut for linefeed, and makes the substitution internally.

I don’t know anything about Obj-C, so I’m probably all wet here. But that’s my logic.

I think the whole issue here stems from the way that the AppleScript compiler handles \n. Why should it handle it any differently from the way it handles linefeed?

If I want to see a literal LF displayed, then I’ll press RETURN (with line endings set to LF in preferences).

IAC, I’m happy to quit beating this dead horse, since @NigelGarvey has provided a great solution, which, BTW, is totally consistent with how Satimage.osax RegEx works (I pass it a \\n which it receives as a linefeed).


(Shane Stanley) #19

There is no separate Obj-C engine doing any converting of strings — it’s not like the shell. Here are two lines of Objective-C code:

NSString *newString = [regex stringByReplacingMatchesInString:aString options:0 range:NSMakeRange(0, [aString length]) withTemplate:@"\n"];
NSString *newString2 = [regex stringByReplacingMatchesInString: aString options:0 range:NSMakeRange(0, [aString length]) withTemplate:@"\\n"];

The same thing in ASObjC is this (uncompiled):

set newString to (regex's stringByReplacingMatchesInString:aString options:0 range:{0, aString's |length|()} withTemplate:"\n")
set newString2 to (regex's stringByReplacingMatchesInString:aString options:0 range:{0, aString's |length|()} withTemplate:"\\n")

Whichever language you use, the strings need to be exactly the same. (Objective-C supports other escaping that AS doesn’t, but let’s not complicate the issue.)

The issues you’re facing arise from two things:

  • ICU regex has a different rule on the use of \ in replacement templates than some other flavors of regex. It wants just \n, not \\n, for a linefeed. In ICU regex syntax, \\n in a replacement template means a literal \ followed by the letter n, which is not what you want.

  • AppleScript insists on expanding \n to its literal equivalent when compiling.

Pre-processing your template as @NigelGarvey suggested neatly deals with both. Doing so effectively changes the rules for template strings — which from your point of view is probably ideal — but you have just created yet another flavor of regex. :smile:
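
For anyone who wants to compare the two template forms side by side, here is a minimal sketch using NSRegularExpression directly with the sample data from the first post. The variable names are illustrative assumptions. The first template is built with a real linefeed; the second holds a backslash followed by n.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions

set sourceStr to "<table><tr><td>Cell text</td></tr><tr>"
set nsSource to current application's NSString's stringWithString:sourceStr

-- Build the regex once and stop with a readable message if the pattern is invalid.
set {regex, theError} to current application's NSRegularExpression's regularExpressionWithPattern:"<t(d|r)>" options:0 |error|:(reference)
if regex is missing value then error (theError's localizedDescription() as text)

-- Template with a real linefeed character in front of the capture-group reference.
set withLinefeed to (regex's stringByReplacingMatchesInString:nsSource options:0 range:{0, nsSource's |length|()} withTemplate:(linefeed & "<t$1>")) as text

-- Template holding backslash + n, which ICU treats as an escaped literal n.
set withBackslashN to (regex's stringByReplacingMatchesInString:nsSource options:0 range:{0, nsSource's |length|()} withTemplate:"\\n<t$1>") as text

return {withLinefeed, withBackslashN}
```

Going by the results reported in the first post, the second item should show a plain n where the line break was wanted, while the first should break the lines as intended.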


(Jim Underwood) #20

Shane, I think you nailed it.
I have accepted the way ASObjC RegEx works, but it is not what I expected nor what I want.
But, as you say:

Which is fine with me. My RegEx Change handler now works as I expect, and as I want. It is consistent with all of the other RegEx tools that I use, particularly Satimage.osax.

WRT flavors, I’ve always liked home-made ice cream. :wink:

Thank you for your exhaustive explanation of how ASObjC works, and for your patience.