Data detectors and AppleScript

Back in the bad old days, we had a scripting addition that gave us access to Apple’s Data Detectors.

Is there anything like that now?

Basically, I need to extract email addresses, in order, from blocks of text, without picking up uses of @ that aren’t part of an email address.

Any suggestions?

I’m not sure what you need, because I don’t understand this part:

This is based on Shane’s book:

use framework "Foundation"
use framework "AppKit"
use scripting additions

set theString to "This is a string with aa@bb.com some garbage text apple.com"

set anNSString to current application's NSString's stringWithString:theString
set theNSDataDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)
set theURLsNSArray to theNSDataDetector's matchesInString:theString options:0 range:{location:0, |length|:anNSString's |length|()}
set theLinks to (theURLsNSArray's valueForKeyPath:"URL.absoluteString")
set thePred to (current application's NSPredicate's predicateWithFormat:("self BEGINSWITH 'mailto:'"))
return (theLinks's filteredArrayUsingPredicate:thePred) as list

Note that this leaves the “mailto:” prefix in place. If I were writing it today, I’d probably use something like this:

use framework "Foundation"
use framework "AppKit"
use scripting additions

set theString to "This is a string with aa@bb.com some mailto:bb@cc.dd apple.com"

set anNSString to current application's NSString's stringWithString:theString
set theNSDataDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)
set theURLsNSArray to theNSDataDetector's matchesInString:theString options:0 range:{0, anNSString's |length|()}
set thePred to (current application's NSPredicate's predicateWithFormat:("URL.scheme == 'mailto'"))
return ((theURLsNSArray's filteredArrayUsingPredicate:thePred)'s valueForKeyPath:"URL.resourceSpecifier") as list

But I’m probably picking nits.


Thanks, guys, these are exactly what I needed. (In some cases I’d want the “mailto:” prefix and in others not.)
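When a script needs one form sometimes and the other form at other times, the second script above could be wrapped in a handler with a flag. A minimal sketch, assuming a hypothetical handler named extractEmails whose keepMailto parameter chooses between URL.absoluteString (keeps “mailto:”) and URL.resourceSpecifier (drops it); the sample string is made up for illustration:

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical handler: returns detected email addresses in order of appearance.
-- keepMailto = true keeps the "mailto:" prefix; false strips it.
on extractEmails(theText, keepMailto)
	set anNSString to current application's NSString's stringWithString:theText
	set theDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)
	set theMatches to theDetector's matchesInString:anNSString options:0 range:{location:0, |length|:anNSString's |length|()}
	-- Keep only link matches whose URL uses the mailto scheme (drops web links)
	set thePred to current application's NSPredicate's predicateWithFormat:"URL.scheme == 'mailto'"
	set mailMatches to theMatches's filteredArrayUsingPredicate:thePred
	if keepMailto then
		return (mailMatches's valueForKeyPath:"URL.absoluteString") as list
	else
		return (mailMatches's valueForKeyPath:"URL.resourceSpecifier") as list
	end if
end extractEmails

extractEmails("Write to aa@example.com or mailto:bb@example.org, not @someone", false)
```

Because the data detector only reports @ when it is part of a recognizable address, a stray “@someone” should not appear in either list.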

Hey Folks,

I’m looking for a list of ALL data-detector types.

I’ve found these so far:

NSTextCheckingTypeAddress
NSTextCheckingTypeCorrection
NSTextCheckingTypeDash
NSTextCheckingTypeDate
NSTextCheckingTypeGrammar
NSTextCheckingTypeLink
NSTextCheckingTypeOrthography
NSTextCheckingTypePhoneNumber
NSTextCheckingTypeQuote
NSTextCheckingTypeRegularExpression
NSTextCheckingTypeReplacement
NSTextCheckingTypeSpelling
NSTextCheckingTypeTransitInformation

Have I missed any?

Is there one for currency?

TIA.

-Chris

Data detectors only support a subset of text checking types. From the docs:

Currently, the supported data detectors checkingTypes are: NSTextCheckingTypeDate, NSTextCheckingTypeAddress, NSTextCheckingTypeLink, NSTextCheckingTypePhoneNumber, and NSTextCheckingTypeTransitInformation.

The NSTextCheckingResult class is a sort of generic result holder used by a variety of classes, including NSDataDetector, NSRegularExpression, and NSSpellChecker.
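Since only those five types are supported, there is no currency detector. The supported types are bit flags, though, so several can be combined in a single detector and told apart afterward through each result’s resultType. A sketch, assuming the made-up sample string below; combining with + works here because each constant is a distinct bit:

```applescript
use framework "Foundation"
use scripting additions

set theString to "Call 555-123-4567 before 5 March 2024 or see https://www.apple.com"
set anNSString to current application's NSString's stringWithString:theString
-- Each checking type is a separate bit, so adding them is the same as OR-ing them
set theTypes to (current application's NSTextCheckingTypeDate) + (current application's NSTextCheckingTypePhoneNumber) + (current application's NSTextCheckingTypeLink)
set theDetector to current application's NSDataDetector's dataDetectorWithTypes:theTypes |error|:(missing value)
set theMatches to theDetector's matchesInString:anNSString options:0 range:{location:0, |length|:anNSString's |length|()}

set theFound to {}
repeat with i from 0 to ((theMatches's |count|()) - 1)
	set aMatch to (theMatches's objectAtIndex:i)
	-- resultType tells us which detector type produced this match
	set theType to aMatch's resultType()
	-- Pull the matched text out of the original string using the result's range
	set matchedText to (anNSString's substringWithRange:(aMatch's range())) as text
	if theType = (current application's NSTextCheckingTypeDate) then
		set end of theFound to {"date", matchedText}
	else if theType = (current application's NSTextCheckingTypePhoneNumber) then
		set end of theFound to {"phone", matchedText}
	else if theType = (current application's NSTextCheckingTypeLink) then
		set end of theFound to {"link", matchedText}
	end if
end repeat
theFound
```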


Hey Shane,

Would you please show how to adapt your script in Post #3 to extract postal addresses, and perhaps the components of those addresses?

I’ve tried to figure it out until I’m cross-eyed…

-Chris

use framework "Foundation"
use scripting additions

set theString to "he lives at 54 Rex Street, New York, 11111 during summer"

set anNSString to current application's NSString's stringWithString:theString
set theNSDataDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeAddress) |error|:(missing value)
set resultsArray to theNSDataDetector's matchesInString:theString options:0 range:{location:0, |length|:anNSString's |length|()}
set theAddresses to (resultsArray's valueForKey:"addressComponents") as list

It works with this string too:

set theString to "he lives at 54 Rex Street, New York, NY, 11111 during summer"
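To get at individual components rather than whole dictionaries, each match’s addressComponents dictionary can be read with Foundation’s NSTextCheckingStreetKey, NSTextCheckingCityKey, NSTextCheckingStateKey and NSTextCheckingZIPKey constants. A sketch, assuming a hypothetical helper named componentText that returns "" when a key is absent (detectors often omit some components):

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical helper: returns one address component as text, or "" if the key is missing
on componentText(theComponents, theKey)
	set theValue to theComponents's objectForKey:theKey
	if theValue is missing value then return ""
	return theValue as text
end componentText

set theString to "he lives at 54 Rex Street, New York, NY, 11111 during summer"
set anNSString to current application's NSString's stringWithString:theString
set theNSDataDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeAddress) |error|:(missing value)
set resultsArray to theNSDataDetector's matchesInString:anNSString options:0 range:{location:0, |length|:anNSString's |length|()}

set theAddresses to {}
repeat with i from 0 to ((resultsArray's |count|()) - 1)
	set aMatch to (resultsArray's objectAtIndex:i)
	-- addressComponents is an NSDictionary keyed by the NSTextChecking...Key constants
	set theComponents to aMatch's addressComponents()
	set theStreet to componentText(theComponents, current application's NSTextCheckingStreetKey)
	set theCity to componentText(theComponents, current application's NSTextCheckingCityKey)
	set theState to componentText(theComponents, current application's NSTextCheckingStateKey)
	set theZIP to componentText(theComponents, current application's NSTextCheckingZIPKey)
	-- Pipes keep the record labels from clashing with any AppleScript terminology
	set end of theAddresses to {|street|:theStreet, |city|:theCity, |state|:theState, |zip|:theZIP}
end repeat
theAddresses
```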

Hey Shane,

Many thanks!

I knew it had to be easier than I was trying to make it…

-Chris

Hey Shane,

I notice this doesn’t get detected:

set theString to "Elkhart, IN 46516"

Won’t data detectors pick up US state abbreviations?

Is the only workaround to find/replace state abbreviations with their full names?

Do we have an ASObjC replace construct for replacing an ordered list of items with an ordered list of items?

I could do this with the Satimage.osax, for instance:

set dataStr to "OK TX IL"
set newStr to change {"\\bIL\\b", "\\bOK\\b", "\\bTX\\b"} into {"Illinois", "Oklahoma", "Texas"} in dataStr with regexp

If not, I have a good replace-string handler and can build something.

TIA.

-Chris

I don’t think data detectors work consistently…

use framework "Foundation"
use scripting additions
set addressStrings to {}
set the end of addressStrings to "he lives at 54 Rex Street, New York, NY, 11111 during summer"
set the end of addressStrings to "he lives at 54 Rex Street, Elkhart, IN 46516 during summer"
set the end of addressStrings to "he lives at 54 Rex Street, Elkhart, IN. 46516 during summer"
set the end of addressStrings to "he lives at 54 Rex Street, Elkhart, IND 46516 during summer"
set the end of addressStrings to "he lives at 54 Rex Street, Elkhart, Indiana 46516 during summer"
set the end of addressStrings to "he lives Elkhart, Indiana 46516 during summer"
set the end of addressStrings to "he lives Elkhart, IN. 46516 during summer"
set the end of addressStrings to "New York, N.Y., 11111 during summer"

-- Create the detector once and reuse it for every string
set theNSDataDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeAddress) |error|:(missing value)
set foundAddresses to {}
repeat with theString in addressStrings
   -- theString is a reference to a list item, so take its contents before bridging
   set anNSString to (current application's NSString's stringWithString:(contents of theString))
   set resultsArray to (theNSDataDetector's matchesInString:anNSString options:0 range:{location:0, |length|:anNSString's |length|()})
   set the end of foundAddresses to (resultsArray's valueForKey:"addressComponents") as list
end repeat
return foundAddresses

There are so many address formats that trying to match them all really is an act of supreme optimism.

You’ll need to use a repeat loop, or a better match pattern (using |).
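The repeat-loop approach could look something like this ASObjC stand-in for the Satimage `change … into … with regexp` call. It is a sketch using NSString’s stringByReplacingOccurrencesOfString:withString:options:range: with NSRegularExpressionSearch; the handler name replacePatterns is hypothetical:

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical handler: applies each regex pattern with its matching replacement, in list order
on replacePatterns(theText, thePatterns, theReplacements)
	set theNSString to current application's NSString's stringWithString:theText
	repeat with i from 1 to count of thePatterns
		-- NSRegularExpressionSearch treats the search string as a regex, so \b word boundaries work
		set theNSString to (theNSString's stringByReplacingOccurrencesOfString:(item i of thePatterns) withString:(item i of theReplacements) options:(current application's NSRegularExpressionSearch) range:{location:0, |length|:theNSString's |length|()})
	end repeat
	return theNSString as text
end replacePatterns

set dataStr to "OK TX IL"
set newStr to replacePatterns(dataStr, {"\\bIL\\b", "\\bOK\\b", "\\bTX\\b"}, {"Illinois", "Oklahoma", "Texas"})
-- newStr is "Oklahoma Texas Illinois"
```

Because each pass runs on the output of the previous one, the \b boundaries matter: they stop a short pattern such as “IN” from matching inside a word that an earlier replacement inserted. The range is recomputed on every pass since the string’s length changes.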
