Here’s a snippet of code demonstrating how to use NSRegularExpression to extract phone numbers from a string.
--
-- Created by: Mark Alldritt
-- Created on: 2018-01-05
--
-- Copyright (c) 2018 Late Night Software Ltd.
-- All Rights Reserved
--
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use framework "Foundation"
-- classes, constants, and enums used
property NSRegularExpressionCaseInsensitive : a reference to 1
property NSRegularExpression : a reference to current application's NSRegularExpression
property NSNotFound : a reference to 9.22337203685477E+18 + 5807 -- see http://latenightsw.com/high-sierra-applescriptobjc-bugs/
-- Let's look for US phone numbers of the form 000-0000, (000) 000-0000, 000-000-0000, (000)-000-0000
set usPhoneNumberPattern to "\\(?(\\d{3})?\\)?\\s*-?\\s*(\\d{3})\\s*-?\\s*(\\d{4})"
set theSample to "333-1234, 250-888-8888, (123) 350-1234, (456)-350-1234"
set theRegEx to NSRegularExpression's regularExpressionWithPattern:usPhoneNumberPattern options:NSRegularExpressionCaseInsensitive |error|:(missing value)
set theMatches to theRegEx's matchesInString:theSample options:0 range:{0, theSample's length}
set thePhoneNumbers to {}
repeat with aMatch in theMatches
-- Get the matched range of text
set wholeRange to (aMatch's rangeAtIndex:0) as record
set thePhoneNumber to text ((wholeRange's location) + 1) thru ((wholeRange's location) + (wholeRange's |length|)) of theSample
-- Get the groups of the regular expression match
set numRanges to aMatch's numberOfRanges as integer
set parts to {"000", "000", "0000"}
repeat with rangeIndex from 1 to numRanges - 1
set partRange to (aMatch's rangeAtIndex:rangeIndex) as record
if partRange's location is not NSNotFound then ¬
set item rangeIndex of parts to text ((partRange's location) + 1) thru ((partRange's location) + (partRange's |length|)) of theSample
end repeat
-- Collect the results
set end of thePhoneNumbers to {|phoneNumber|:thePhoneNumber, parts:parts}
end repeat
thePhoneNumbers
--> {
-- {phoneNumber:"333-1234", parts:{"000", "333", "1234"}},
-- {phoneNumber:"250-888-8888", parts:{"250", "888", "8888"}},
-- {phoneNumber:"(123) 350-1234", parts:{"123", "350", "1234"}},
-- {phoneNumber:"(456)-350-1234", parts:{"456", "350", "1234"}}
-- }
Your script’s assuming there’s a one-to-one equivalence between characters in an AS string and locations in an NSString. There probably is when the AS string only contains US phone numbers, but there may not be otherwise. Ideally, the text matching should all be done in ASObjC and the results coerced to AS text as obtained.
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use framework "Foundation"
-- classes, constants, and enums used
property NSRegularExpressionCaseInsensitive : a reference to 1
property NSRegularExpression : a reference to current application's NSRegularExpression
-- property NSNotFound : a reference to 9.22337203685477E+18 + 5807 -- Not needed if the |length|'s checked for 0 instead.
-- Let's look for US phone numbers of the form (000) 000-0000, 000-000-0000, (000)-000-0000
set usPhoneNumberPattern to "\\(?(\\d{3})?\\)?\\s*-?\\s*(\\d{3})\\s*-?\\s*(\\d{4})"
set theSample to "333-1234, 250-888-8888, (123) 350-1234, (456)-350-1234"
set theNSStringSample to current application's NSString's stringWithString:theSample
set theRegEx to NSRegularExpression's regularExpressionWithPattern:usPhoneNumberPattern options:NSRegularExpressionCaseInsensitive |error|:(missing value)
set theMatches to theRegEx's matchesInString:theNSStringSample options:0 range:{0, theNSStringSample's |length|()}
set thePhoneNumbers to {}
repeat with aMatch in theMatches
-- Get the matched range of text
set wholeRange to (aMatch's rangeAtIndex:0) as record
set thePhoneNumber to (theNSStringSample's substringWithRange:wholeRange) as text
-- Get the groups of the regular expression match
set numRanges to aMatch's numberOfRanges as integer
set parts to {"000", "000", "0000"}
repeat with rangeIndex from 1 to numRanges - 1
set partRange to (aMatch's rangeAtIndex:rangeIndex) as record
if partRange's |length| > 0 then ¬
set item rangeIndex of parts to (theNSStringSample's substringWithRange:partRange) as text
end repeat
-- Collect the results
set end of thePhoneNumbers to {|phoneNumber|:thePhoneNumber, parts:parts}
end repeat
thePhoneNumbers
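To make the character-versus-location point concrete, here's a minimal sketch. The sample string is my own invention (not from Mark's script): it puts an emoji ahead of a number. AppleScript's length counts characters, while NSString's length counts UTF-16 code units, so I'd expect the two figures to differ here, and an NSRange location used directly as an AppleScript text offset would then land in the wrong place.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use framework "Foundation"

-- Hypothetical sample: an emoji (outside the Basic Multilingual Plane) before a phone number
set theSample to "😀 250-888-8888"
set theNSStringSample to current application's NSString's stringWithString:theSample

-- Length as AppleScript sees it (characters)
set asLength to length of theSample
-- Length as NSString sees it (UTF-16 code units)
set nsLength to (theNSStringSample's |length|()) as integer

-- Where the number starts, in NSString terms (zero-based location)
set digitRange to (theNSStringSample's rangeOfString:"250") as record

{asLength:asLength, nsLength:nsLength, nsLocation:digitRange's location}
```

If the two lengths come back different on your system, that's the mismatch in action, and it's why the script above extracts the text with substringWithRange: rather than with AppleScript text offsets.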
Mark, thanks for sharing. It’s always great to see another example of using RegEx with ASObjC.
As @NigelGarvey pointed out, this can be a complicated issue. Over the years it has been much discussed by the RegEx community. Here is one example from StackOverflow.com:
If you just want the phone numbers without the individual parts, it’s also possible to use NSDataDetector. It even finds my non-US ones! But I don’t know how clever it is universally.
set theSample to "333-1234, 250-888-8888, (123) 350-1234, (456)-350-1234"
set theNSStringSample to current application's NSString's stringWithString:theSample
set theDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypePhoneNumber) |error|:(missing value)
set theMatches to theDetector's matchesInString:theNSStringSample options:0 range:{0, theNSStringSample's |length|()}
Thanks Nigel. That seems to return an NS object. How do we get a standard AS list?
Also, I don’t think these are valid phone numbers:
333-1234 – missing area code
(456)-350-1234 – a dash should not follow a closing parenthesis. It should be nothing or a space.
NSDataDetector is a subclass of NSRegularExpression. Their matchesInString:options:range: methods both return an array of NSTextCheckingResult (or an empty array when there are no matches). So the two lines containing ‘theDetector’ in post #4 could simply replace the two containing ‘theRegEx’ in post #2. In practice, of course, the line defining the regex pattern then becomes redundant, as does the code for extracting the ‘parts’ of the numbers, since no ranges are returned for them.
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use framework "Foundation"
-- classes, constants, and enums used
property NSRegularExpressionCaseInsensitive : a reference to 1
property NSRegularExpression : a reference to current application's NSRegularExpression
set theSample to "333-1234, 250-888-8888, (123) 350-1234, (456)-350-1234"
set theNSStringSample to current application's NSString's stringWithString:theSample
set theDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypePhoneNumber) |error|:(missing value)
set theMatches to theDetector's matchesInString:theNSStringSample options:0 range:{0, theNSStringSample's |length|()}
set thePhoneNumbers to {}
repeat with aMatch in theMatches
-- Get the matched range of text
set wholeRange to (aMatch's rangeAtIndex:0) as record -- or aMatch's range() as record
set thePhoneNumber to (theNSStringSample's substringWithRange:wholeRange) as text
-- Collect the results
set end of thePhoneNumbers to {|phoneNumber|:thePhoneNumber}
end repeat
thePhoneNumbers
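If all that's wanted is a plain AppleScript list of the numbers, a shorter route may be possible. This is only a sketch, on the assumption that key-value coding works here as I expect: NSTextCheckingResult has a phoneNumber property for phone-number results, so asking the array of matches for that key should return an array of strings, which can then be coerced to a list in one go.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use framework "Foundation"

set theSample to "333-1234, 250-888-8888, (123) 350-1234, (456)-350-1234"
set theNSStringSample to current application's NSString's stringWithString:theSample
set theDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypePhoneNumber) |error|:(missing value)
set theMatches to theDetector's matchesInString:theNSStringSample options:0 range:{0, theNSStringSample's |length|()}

-- Ask every NSTextCheckingResult for its phoneNumber, then coerce the resulting NSArray to an AS list of text
set thePhoneNumbers to (theMatches's valueForKey:"phoneNumber") as list
```

This skips the repeat loop entirely. The list holds plain text items rather than records, and which of the sample numbers the detector actually recognises is up to NSDataDetector.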
I’m afraid I can accept no responsibility for that.
Nor here. An area code isn’t required between land-line numbers on the same area exchange, but can be used without confusing the system.
My understanding of the purpose of Mark’s script is that it’s a “how to” demonstrating the use of NSRegularExpression — say, to extract US/Canadian-format phone numbers from a text and return them along with breakdowns of their parts. It’s not intended to do anything else or to be used directly for anything other than educational purposes.
Mark, thanks again for sharing an example of how to use RegEx with ASObjC.
May I suggest that in future you make the purpose of the script clear in the opening description? From what you posted, it looked to me like a tool for getting phone numbers via RegEx, and that colored all of my subsequent thinking and responses.
In fact, I’d suggest that you even edit your OP here to make it clear, so that others, or even me months later, who come across this script will understand its purpose.