@ShaneStanley, I have a very nice handler for ASObjC RegEx Find, but I’m having trouble with the RegEx Change. Can you help, please?
TIA.
Here’s what I have:
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions
set sourceStr to "<table><tr><td>Cell text</td></tr><tr>"
set findStr to "<t(d|r)>"
set replaceStr to "\\n<t$1>"
set newStr to my regexChange(findStr, replaceStr, sourceStr)
### Incorrect Results
-->"<table>n<tr>n<td>Cell text</td></tr>n<tr>"
--- Using Satimage.osax ---
set replaceSIStr to "\\n<t\\1>"
set newSIStr to change findStr into replaceSIStr in sourceStr syntax "PERL" with regexp
### Correct Results
(*
<table>
<tr>
<td>Cell text</td></tr>
<tr>
*)
on regexChange(pFindStr, pReplaceStr, pSourceStr)
set nsCurApp to current application
set nsSourceStr to nsCurApp's NSString's stringWithString:pSourceStr
set nsSourceStr to (nsSourceStr's stringByReplacingOccurrencesOfString:pFindStr withString:pReplaceStr options:(nsCurApp's NSRegularExpressionSearch) range:{0, nsSourceStr's |length|()})
### Also Tried This, but got Error ###
--set {nsFindRE, theError} to current application's NSRegularExpression's regularExpressionWithPattern:pFindStr options:0 |error|:(reference)
--
--set {nsReplaceRE, theError} to current application's NSRegularExpression's regularExpressionWithPattern:pReplaceStr options:0 |error|:(reference)
--
--
--set nsSourceStr to (nsSourceStr's stringByReplacingOccurrencesOfString:nsFindRE withString:nsReplaceRE options:(nsCurApp's NSRegularExpressionSearch) range:{0, nsSourceStr's |length|()})
return nsSourceStr as text
end regexChange
I did find one workaround, and it is not very desirable:
Change: set replaceStr to "\\n<t$1>"
To: set replaceStr to linefeed & "<t$1>"
Every other tool I use understands \n properly. I do a lot of RegEx development and testing at RegEx101.com, and some with BBEdit. Almost always I can copy the expression I develop there, and just paste where I need it.
BTW, SD7 is very helpful. When you paste a string using ⌘⇧V, it adds the necessary escape characters.
FWIW, I suspect that’s a slight simplification on Peter’s part. For example, according to the KM docs you can use \n for capture groups, which is something not supported in ICU regex. KM may well be pre-processing the input for closer compatibility with other flavors. (Script Debugger does something like this in its UI regex.)
Use of \n or \r or \t is NOT escaping.
There is also use of \ for unicode: \p{UNICODE PROPERTY NAME}
Regardless, are you saying there is no way to use \n to represent linefeed in a replace string for ASObjC RegEx Replace?
We have to use \\n to pass \n to the handler replace string.
Is there a way to replace the \n in the handler with the linefeed character, then use it in the RegEx Replace?
But use of \\n or \\r or \\t is, because you’re passing a literal \, which means treat the following character as a literal.
The real issue here is that you can’t use \n to represent a linefeed in an AppleScript string in compiled code. You can enter \n, which is what should be required to avoid the string containing a literal \, but as soon as you compile, AppleScript will replace \n with a literal linefeed character.
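To make the difference concrete, here is a small sketch (not from the thread) that inspects what the two literals actually hold after compiling. The variable names are illustrative only. Note that once compiled, the editor may display the second literal as a real line break inside the quotes.

```applescript
-- Two string literals that look alike in source but hold different characters.
set escapedStr to "\\n" -- stays as two characters: backslash, n
set compiledStr to "\n" -- after compiling, holds one linefeed character

set escapedIDs to id of escapedStr -- code points of each character
set compiledID to id of compiledStr

{count of escapedStr, escapedIDs, count of compiledStr, compiledID, compiledStr is linefeed}
--> {2, {92, 110}, 1, 10, true}
```

So a handler that receives "\\n" gets a backslash and an n, while one that receives "\n" gets a single linefeed.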
The setting in Script Editor changes the rules. If you’d like to see that supported in Script Debugger, please enter a feature request. (I’m not entirely sure it’s compatible with raw syntax — it actually relies on an undocumented private API.)
Objective-C lets you use \r and so on in code, and there’s no pass-to-compiler-and-get-back-compiled-code stage to deal with. You can use \n in AppleScript exactly the same — it’s just going to turn into a literal linefeed when you compile.
Yes. (Sorry. I don’t know how to “quote” images.) Those are all “matching” metacharacters for use in regex search strings. They “match” existing characters in the source text. The only metacharacters ICU regex has for substitution text are those shown in Shane’s post (#4).
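As a rough illustration of the substitution-side metacharacters, assuming they are $n for a capture group and \ to make the next character literal (as the NSRegularExpression documentation describes), this sketch inserts a literal dollar sign in front of a captured number. The sample text and variable names are made up for the example.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions

set nsSource to current application's NSString's stringWithString:"Price: 5"
set regexOpt to current application's NSRegularExpressionSearch

-- The AppleScript literal "\\$$1" holds the characters \$$1:
--   \$ is an escaped, literal dollar sign; $1 is capture group 1.
set newStr to (nsSource's stringByReplacingOccurrencesOfString:"(\\d+)" withString:"\\$$1" options:regexOpt range:{0, nsSource's |length|()}) as text
--> "Price: $5"
```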
What’s your objection to a literal line feed? If it’s because you particularly need to have “\\n” in the replacement string passed to the handler, you could have the handler itself doctor it:
on regexChange(pFindStr, pReplaceStr, pSourceStr)
set nsCurApp to current application
-- Replace any backslash-n sequences in the replace string with actual linefeeds.
set nsReplaceStr to (nsCurApp's NSString's stringWithString:pReplaceStr)'s stringByReplacingOccurrencesOfString:"\\n" withString:(linefeed)
set nsSourceStr to nsCurApp's NSString's stringWithString:pSourceStr
set nsSourceStr to (nsSourceStr's stringByReplacingOccurrencesOfString:pFindStr withString:nsReplaceStr options:(nsCurApp's NSRegularExpressionSearch) range:{0, nsSourceStr's |length|()})
return nsSourceStr as text
end regexChange
If I use a \n or \r I can’t tell which was actually used after the code is compiled. Likewise with \t – can’t tell whether it’s a tab or space.
Ease of Use of RegEx from other Sources
I can directly use a RegEx like \n"Some Text in Quotes"\tMore Text\r
by pasting with ⌘⇧V, and SD7 will do the escaping for me: "\\n\"Some Text in Quotes\"\\tMore Text\\r"
Thanks, this works great.
I need to apply this to \\r and \\t as well. Is there a better way than just duplicating this same statement:
set nsReplaceStr to (nsCurApp's NSString's stringWithString:pReplaceStr)'s stringByReplacingOccurrencesOfString:"\\n" withString:(linefeed)
set nsReplaceStr to (nsCurApp's NSString's stringWithString:nsReplaceStr)'s stringByReplacingOccurrencesOfString:"\\r" withString:(return)
set nsReplaceStr to (nsCurApp's NSString's stringWithString:nsReplaceStr)'s stringByReplacingOccurrencesOfString:"\\t" withString:(tab)
I can’t think of a way to combine them, offhand. But I’d only use the stringWithString: method once:
set nsReplaceStr to nsCurApp's NSString's stringWithString:pReplaceStr
set nsReplaceStr to nsReplaceStr's stringByReplacingOccurrencesOfString:"\\n" withString:(linefeed)
set nsReplaceStr to nsReplaceStr's stringByReplacingOccurrencesOfString:"\\r" withString:(return)
set nsReplaceStr to nsReplaceStr's stringByReplacingOccurrencesOfString:"\\t" withString:(tab)
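If the list of escapes is likely to grow, one possible way to avoid repeating the statement is to loop over pairs. This is only a sketch: `expandEscapes` and `escapePairs` are hypothetical names, not something from the thread, and the behavior is the same as the three statements above.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions

-- Hypothetical helper: expand \n, \r and \t escapes in a replace string.
on expandEscapes(pReplaceStr)
	set nsStr to current application's NSString's stringWithString:pReplaceStr
	-- Each pair is {two-character escape as typed, the real character it stands for}.
	set escapePairs to {{"\\n", linefeed}, {"\\r", return}, {"\\t", tab}}
	repeat with aPair in escapePairs
		set {escapeSeq, realChar} to contents of aPair
		set nsStr to (nsStr's stringByReplacingOccurrencesOfString:escapeSeq withString:realChar)
	end repeat
	return nsStr as text
end expandEscapes

-- Usage: the result is true if the escapes were expanded as expected.
expandEscapes("\\n<t$1>\\t") is (linefeed & "<t$1>" & tab)
```

One caveat with either version: plain sequential replacement cannot tell an escaped backslash followed by n apart from a \n escape, so a replace string that needs a literal backslash before n, r or t would be altered as well.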
Nigel has posted what is probably a better solution for you, but I should point out that View --> Show Invisibles solves this problem for linefeed, return and tab.
So, after a great discussion, Nigel provided the actual solution in his above post.
Then, to handle the other metacharacters, he improved on my “dup” approach with this:
My thanks to Shane for his many ~~excuses~~ explanations of why the combo of AppleScript and Obj-C RegEx works like it does. Sorry, Shane – just a bit of ribbing. It still seems to me that if I pass the ASObjC RegEx statement the string \n (derived from \\n), it should work just like the actual Objective-C statement that supports \n. I guess it is the AppleScript compiler that is getting in the way. But in the end, it was really a simple fix in our handler.
So, finally I’m combining all and cleaning up into a handler worthy of a script library (at least for me), and posting my final script here:
ASObjC RegEx Change Handler
property ptyScriptName : "ASObjC RegEx Change Handler"
property ptyScriptVer : "2.0"
property ptyScriptDate : "2018-07-06"
property ptyScriptAuthor : "JMichaelTX" -- with help from NigelGarvey
(*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PURPOSE:
• Test ASObjC RegEx Change Handler
RETURNS: Source String with All Found Changes
REQUIRED:
1. macOS 10.11.6+
2. EXTERNAL OSAX Additions/LIBRARIES/FUNCTIONS
• Satimage.osax (for comparison only)
TAGS: @CAT.RegEx @CAT.Change @Lang.AS @Lang.ASObjC @type.Handler @type.KB @Auth.JMichaelTX
REF: The following were used in some way in the writing of this script.
1. 2018-07-06, NigelGarvey, Late Night Software Ltd.
ASObjC RegEx Change Handler
http://forum.latenightsw.com/t/asobjc-regex-change-handler/1395/10
2. ICU RegEx Users Guide
http://userguide.icu-project.org/strings/regexp
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*)
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions
set sourceStr to "<table><tr><td>Cell text</td></tr><tr>"
set findStr to "<t(d|r)>"
(* --- Replace String ---
• May contain these RegEx MetaChars:
\\n, \\r, \\t for linefeed, return, and tab
$<CG>, where <CG> is Capture Group number
*)
set replaceStr to "\\n<t$1>"
--- Using Below ASObjC Handler ---
set newStr to my regexChange(findStr, replaceStr, sourceStr)
--- Using Satimage.osax ---
-- Same as above, except Capture Group metachar uses a "\\" instead of a "$"
set replaceSIStr to "\\n<t\\1>"
set newSIStr to change findStr into replaceSIStr in sourceStr syntax "PERL" with regexp
return newStr
### Correct Results Returned by Both ###
(*
<table>
<tr>
<td>Cell text</td></tr>
<tr>
*)
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
on regexChange(pFindStr, pReplaceStr, pSourceStr) -- @RegEX @Change @Replace @Strings @ASObjC
--–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
(* VER: 2.0 2018-07-06
PURPOSE: Change ALL Occurrences of pFindStr into pReplaceStr in pSourceStr
METHOD: Uses ASObjC RegEx (which is based on ICU Regex)
PARAMETERS:
• pFindStr | text | RegEx Pattern to Find
• pReplaceStr | text | Replace string which
• May contain these RegEx MetaChars:
\\n, \\r, \\t for linefeed, return, and tab
$<CG>, where <CG> is Capture Group number
• pSourceStr | text | Source String to be searched
RETURNS: Revised String with all changes that were found.
AUTHOR: JMichaelTX -- with help from NigelGarvey
## REQUIRES: use framework "Foundation"
REF:
1. 2018-07-06, NigelGarvey, Late Night Software Ltd.
ASObjC RegEx Change Handler
http://forum.latenightsw.com/t/asobjc-regex-change-handler/1395/10
2. ICU RegEx Users Guide
http://userguide.icu-project.org/strings/regexp
--–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
*)
set nsCurApp to current application
-- Replace any \\n, \\r, or \\t sequences in the replace string with the actual characters --
set nsReplaceStr to nsCurApp's NSString's stringWithString:pReplaceStr
set nsReplaceStr to nsReplaceStr's stringByReplacingOccurrencesOfString:"\\n" withString:(linefeed)
set nsReplaceStr to nsReplaceStr's stringByReplacingOccurrencesOfString:"\\r" withString:(return)
set nsReplaceStr to nsReplaceStr's stringByReplacingOccurrencesOfString:"\\t" withString:(tab)
set nsSourceStr to nsCurApp's NSString's stringWithString:pSourceStr
set nsSourceStr to (nsSourceStr's stringByReplacingOccurrencesOfString:pFindStr withString:nsReplaceStr options:(nsCurApp's NSRegularExpressionSearch) range:{0, nsSourceStr's |length|()})
return nsSourceStr as text
end regexChange
--~~~~~~~~~~~~~~~~~~~~ END of Handler ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Let me try to at least explain it one more time. You’re passing \\n in AppleScript, not \n. The equivalent in Objective-C would also be to pass \\n, not \n. The rules for escaping linefeeds are identical in both languages. Your Objective-C example shows \n, and the AppleScript equivalent to that would also use \n, until the compiler grabs hold of it.
Well, as long as you’re beating a dead horse, I’ll join in with you.
Most likely my logic is flawed, but when I pass \\n to the Obj-C engine, I’m telling AppleScript to pass the actual characters “\n”, just like when we pass a quote mark to a shell script using \". When Obj-C gets the string, I expect it to act on the characters \n just like it does in the Obj-C statement:
Obj-C then recognizes that \n is a shortcut for linefeed, and makes the substitution internally.
I don’t know anything about Obj-C, so I’m probably all wet here. But that’s my logic.
I think the whole issue here stems from the way the AppleScript compiler handles \n. Why should it handle it any differently from the way it handles linefeed ???
If I want to see a literal LF displayed, then I’ll press RETURN (with line endings set to LF in preferences).
IAC, I’m happy to quit beating this dead horse, since @NigelGarvey has provided a great solution, which, BTW, is totally consistent with how Satimage.osax RegEx works (I pass it a \\n which it receives as a linefeed).
set newString to (regex's stringByReplacingMatchesInString:aString options:0 range:{0, aString's |length|()} withTemplate:"\n")
set newString2 to (regex's stringByReplacingMatchesInString:aString options:0 range:{0, aString's |length|()} withTemplate:"\\n")
Whichever language you use, the strings need to be exactly the same. (Objective-C supports other escaping that AS doesn’t, but let’s not complicate the issue.)
The issues you’re facing arise from two things:
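ICU regex has a different rule on the use of \ in replacement templates than some other flavors of regex. For a linefeed it wants an actual linefeed character in the template (what "\n" compiles to in AppleScript), not the two characters \ and n (what "\\n" produces). In an ICU replacement template, a backslash just makes the following character literal, so \ followed by n comes out as a plain letter n, which is what your first result showed and is not what you want.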
AppleScript insists on expanding \n to its literal equivalent when compiling.
Pre-processing your template as @NigelGarvey suggested neatly deals with both. Doing so effectively changes the rules for template strings — which from your point of view is probably ideal — but you have just created yet another flavor of regex.
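The two points can be seen side by side in a short sketch. It reuses the find pattern from the original question on a shortened source string, and the variable names are illustrative only. The first call passes the two characters \ and n in the template; the second passes a real linefeed.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions

set nsSource to current application's NSString's stringWithString:"<tr><td>"
set regexOpt to current application's NSRegularExpressionSearch
set fullRange to {0, nsSource's |length|()}

-- Template holds backslash + n: the backslash only makes the "n" literal.
set withEscape to (nsSource's stringByReplacingOccurrencesOfString:"<t(d|r)>" withString:"\\n<t$1>" options:regexOpt range:fullRange) as text
--> "n<tr>n<td>" (the same symptom as the incorrect result in the first post)

-- Template holds a real linefeed character, so each tag starts on a new line.
set withLinefeed to (nsSource's stringByReplacingOccurrencesOfString:"<t(d|r)>" withString:(linefeed & "<t$1>") options:regexOpt range:fullRange) as text

{withEscape, withLinefeed}
```

Pre-processing the template, as in the handler above, simply converts the first form into the second before the regex engine sees it.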
Shane, I think you nailed it.
I have accepted the way ASObjC RegEx works, but it is not what I expected nor what I want.
But, as you say:
Which is fine with me. My RegEx Change handler now works as I expect, and as I want. It is consistent with all of the other RegEx tools that I use, particularly Satimage.osax.