Extract numbers from String

My goal, in the following script, is to extract an array of the three numbers contained in a string.

use framework "Foundation"
use scripting additions
set stringContainingfNumbers to "39 apples found. Showing 1 - 20 sorted by name"
set anNSStringContainingfNumbers to current application's NSString's stringWithString:stringContainingNumbers
set anNSArrayContainingNumbers to ((anNSStringContainingNumbers's componentsSeparatedByString:" ")'s valueForKey:"intValue")

The script partially succeeded, but yielded a series of unwanted zeros:
=> (NSArray) {39, 0, 0, 0, 1, 0, 20, 0, 0, 0}

I tried to filter the zeros out of the array with a predicate, using the following lines, but it raised an error:

set thePred to {current application's NSPredicate's predicateWithFormat:"intValue BETWEEN {1, 9}"}
set result to (anNSArrayContainingNumbers's filteredArrayUsingPredicate:thePred)

=>…unrecognized selector sent to instance …

What might be a correct method to extract numbers from a string?

One approach might be to lex the string, grouping runs of numeric characters together:

-------------- LIST OF NUMBERS FOUND IN STRING -------------

-- numbersInString :: String -> [Number]
on numbersInString(s)
    
    script numericOrOther
        on |λ|(a, b)
            isNumeric(a) = isNumeric(b)
        end |λ|
    end script
    
    script numbersOnly
        on |λ|(cs)
            set w to concat(cs)
            if isNaN(w) then
                {}
            else
                {w as number}
            end if
        end |λ|
    end script
    
    concatMap(numbersOnly, ¬
        groupBy(numericOrOther, characters of s))
end numbersInString


---------------------------- TEST --------------------------
on run
    set sample to "39 apples found. Showing 1 - 20 sorted by name"
    
    numbersInString(sample)
    
    --> {39, 1, 20}
end run



-------------------------- GENERIC -------------------------
-- https://github.com/RobTrew/prelude-jxa


-- concat :: [[a]] -> [a]
-- concat :: [String] -> String
on concat(xs)
    set lng to length of xs
    if 0 < lng and string is class of (item 1 of xs) then
        set acc to ""
    else
        set acc to {}
    end if
    repeat with i from 1 to lng
        set acc to acc & item i of xs
    end repeat
    acc
end concat


-- concatMap :: (a -> [b]) -> [a] -> [b]
on concatMap(f, xs)
    set lng to length of xs
    set acc to {}
    tell mReturn(f)
        repeat with i from 1 to lng
            set acc to acc & (|λ|(item i of xs, i, xs))
        end repeat
    end tell
    return acc
end concatMap


-- foldl :: (a -> b -> a) -> a -> [b] -> a
on foldl(f, startValue, xs)
    tell mReturn(f)
        set v to startValue
        set lng to length of xs
        repeat with i from 1 to lng
            set v to |λ|(v, item i of xs, i, xs)
        end repeat
        return v
    end tell
end foldl


-- Typical usage: groupBy(on(eq, f), xs)
-- groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
on groupBy(f, xs)
    set mf to mReturn(f)
    
    script enGroup
        on |λ|(a, x)
            if length of (active of a) > 0 then
                set h to item 1 of active of a
            else
                set h to missing value
            end if
            
            if h is not missing value and mf's |λ|(h, x) then
                {active:(active of a) & {x}, sofar:sofar of a}
            else
                {active:{x}, sofar:(sofar of a) & {active of a}}
            end if
        end |λ|
    end script
    
    if length of xs > 0 then
        set dct to foldl(enGroup, {active:{item 1 of xs}, sofar:{}}, rest of xs)
        if length of (active of dct) > 0 then
            sofar of dct & {active of dct}
        else
            sofar of dct
        end if
    else
        {}
    end if
end groupBy


-- isNumeric :: Char -> Bool
on isNumeric(c)
    set n to (id of c)
    (48 ≤ n and 57 ≥ n) or ("-." contains c)
end isNumeric


-- isNaN :: String -> Bool
on isNaN(s)
    try
        s as number
        false
    on error
        true
    end try
end isNaN


-- mReturn :: First-class m => (a -> b) -> m (a -> b)
on mReturn(f)
    -- 2nd class handler function lifted into 1st class script wrapper. 
    if script is class of f then
        f
    else
        script
            property |λ| : f
        end script
    end if
end mReturn

You could use a regular expression search, or you could use a string scanner:

use framework "Foundation"
use scripting additions
set stringContainingNumbers to "39 apples found. Showing 1 - 20 sorted by name"
set theScanner to current application's NSScanner's scannerWithString:stringContainingNumbers
set digits to current application's NSCharacterSet's decimalDigitCharacterSet()
set theNums to {}
repeat
	theScanner's scanUpToCharactersFromSet:digits intoString:(missing value)
	set {theResult, theValue} to theScanner's scanInteger:(reference)
	if theResult is false then exit repeat -- no more
	set end of theNums to theValue as integer
end repeat
return theNums
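For the regular expression search, a minimal sketch might look like the following. It assumes only unsigned whole numbers are wanted: the pattern `[0-9]+` matches runs of ASCII digits, so minus signs, decimal points and grouping commas are ignored. The handler name `integersInString` is hypothetical, chosen just for illustration.

```applescript
use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

-- Hypothetical handler: returns every run of ASCII digits as an integer.
-- Assumption: unsigned whole numbers only (no signs, decimals or "1,234").
on integersInString(theString)
	set theNSString to current application's NSString's stringWithString:theString
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:"[0-9]+" options:0 |error|:(missing value)
	set theMatches to theRegex's matchesInString:theNSString options:0 range:{location:0, |length|:theNSString's |length|()}
	set theNums to {}
	repeat with i from 0 to (theMatches's |count|()) - 1
		-- rangeAtIndex:0 is the range of the whole match
		set matchRange to ((theMatches's objectAtIndex:i)'s rangeAtIndex:0)
		set end of theNums to ((theNSString's substringWithRange:matchRange)'s integerValue()) as integer
	end repeat
	return theNums
end integersInString

integersInString("39 apples found. Showing 1 - 20 sorted by name")
```

On the sample string this should give the same {39, 1, 20} as the scanner; the regex version just makes the "what counts as a number" rule explicit in one pattern.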

This is quite similar to scriptingmd’s own code:

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

set stringContainingNumbers to "39 apples found. Showing 1 - 20 sorted by name"
set NSStringContainingNumbers to current application's NSString's stringWithString:(stringContainingNumbers)
set nonDigits to current application's NSCharacterSet's decimalDigitCharacterSet()'s invertedSet()
set numsEtc to (NSStringContainingNumbers's componentsSeparatedByCharactersInSet:(nonDigits))'s mutableCopy()
tell numsEtc to removeObject:("")
set theNums to (numsEtc's valueForKey:("integerValue")) as list

I’ve used something like this:

set stringContainingNumbers to "39 apples found. ¬
Showing 1 - 20 sorted by name. ¬
Big numbers like 1,234. ¬
Also decimals like 1.234"

set wordsFromString to words of stringContainingNumbers
set numbersInString to {}
repeat with thisWord in wordsFromString
   try
      set the end of numbersInString to thisWord as number
   end try
end repeat
return numbersInString

-->{39, 1, 20, 1234, 1.234}


I appreciate all of the responses describing the various methods of extracting numbers from a string:
• Lexing the string and grouping runs of numeric characters together
• Scanning the string for runs of decimal digits
• Splitting the string on the inverted set of decimal digits (that is, on every non-digit character)
• Collecting those of the string's words that can be coerced to a number

When I reviewed my original post to see why its predicate method failed, I found three errors:

1• An extra character “f” between the words “Containing” and “Numbers” in some, but not all, of the variable names, which I have removed
2• Curly braces {} instead of parentheses () around the expression that sets thePred, which made thePred a one-item AppleScript list rather than an NSPredicate; I have replaced them with parentheses
3• Too narrow a range in the predicate's BETWEEN comparison, which I have widened from {1, 9} to {1, 100}

With those corrections, the following predicate script now finds the numbers in the string.

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

set stringContainingNumbers to "39 apples found. Showing 1 - 20 sorted by name"
set anNSStringContainingNumbers to current application's NSString's stringWithString:stringContainingNumbers
set anNSArrayContainingNumbers to ((anNSStringContainingNumbers's componentsSeparatedByString:" ")'s valueForKey:"intValue")
set thePred to (current application's NSPredicate's predicateWithFormat:"intValue BETWEEN {1, 100}")
set result to (anNSArrayContainingNumbers's filteredArrayUsingPredicate:thePred) as list

In the interest of understanding this process, I welcome any further insights into, or critiques of, my script.
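One possible refinement, offered as a sketch rather than a definitive fix: because intValue turns every non-numeric word into 0, filtering on the converted values can't tell a genuine 0 from a word like "apples", and the BETWEEN {1, 100} bounds also drop anything above 100. Filtering the words first with a MATCHES predicate (an ICU regular expression that must match the whole word) and converting only the survivors avoids both. It assumes the numbers are separated from neighbouring words by spaces, and the handler name `wholeNumbersFromWords` is hypothetical.

```applescript
use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

-- Hypothetical handler. Assumption: numbers are space-separated words of ASCII digits.
on wholeNumbersFromWords(theString)
	set theNSString to current application's NSString's stringWithString:theString
	set theWords to theNSString's componentsSeparatedByString:" "
	-- MATCHES must match the whole word, so "found." and "-" are rejected
	set thePred to current application's NSPredicate's predicateWithFormat:"SELF MATCHES '[0-9]+'"
	set numericWords to theWords's filteredArrayUsingPredicate:thePred
	-- convert only the words that passed the filter, so a genuine 0 survives
	return (numericWords's valueForKey:"integerValue") as list
end wholeNumbersFromWords

wholeNumbersFromWords("39 apples found. Showing 1 - 20 sorted by name")
```

With "0 of 250 items selected" this should return {0, 250}, both of which the BETWEEN {1, 100} version would drop. The trade-off is that a number with punctuation stuck to it, such as "20.", is skipped entirely.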

It only finds the integer values in a string; any decimal part is simply truncated by intValue, not rounded when appropriate. Also, if a number is written as 1,234, it doesn't work. But since you're limiting the values to 100, that seems fine for your purposes.

Extracting numbers from words in the string is the only method to reliably get the correct result from this input:

set stringContainingNumbers to "39 apples found. 
Showing 1 - 20 sorted by name. 
Big numbers like 1234. 
Or Written this way 1,234
Also decimals like 1.234
Or 9.876"

-->{39, 1, 20, 1234, 1234, 1.234, 9.876}

Prove me wrong!
:wink:

I think you’re solving a more general problem — there was no indication that the OP was also after reals. And as a general solution, it has a fairly serious failing: it doesn’t handle negative values. It will also fail in locales that use different decimal and grouping separators.

Right, none of the solutions handle negative numbers.

As for the decimal and grouping separators, is that something AppleScript gets from the user's localization, or is it the same everywhere?

(For my purposes neither of those issues would come up. I’m extracting numbers from movie and tv show titles.)

The first one (lexing the string into runs of numeric characters) does.

Try it, for example, with

set sample to "-3.9 apples found. Showing -1 - -20 sorted by name"

Yes. Honestly, I think this is a case where one size can't fit all (and trying risks being needlessly complicated). But I suspect a solution based on regular expressions would be the most efficient.
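A rough sketch of such a regular-expression approach, assuming numbers are written with an optional leading minus sign and an optional "." decimal part. The handler name `signedNumbersInString` is hypothetical. It uses NSString's doubleValue for decimals because that method reads "." as the decimal point regardless of the user's locale.

```applescript
use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

-- Hypothetical handler. Assumption: optional "-", digits, optional "." plus digits.
-- Grouping commas are NOT handled: "1,234" yields 1 and 234.
on signedNumbersInString(theString)
	set theNSString to current application's NSString's stringWithString:theString
	set thePattern to "-?[0-9]+([.][0-9]+)?"
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	set theMatches to theRegex's matchesInString:theNSString options:0 range:{location:0, |length|:theNSString's |length|()}
	set theNums to {}
	repeat with i from 0 to (theMatches's |count|()) - 1
		set matchString to (theNSString's substringWithRange:((theMatches's objectAtIndex:i)'s rangeAtIndex:0))
		if (matchString's containsString:".") as boolean then
			-- doubleValue is not localized: "." is always the decimal point
			set end of theNums to (matchString's doubleValue()) as real
		else
			set end of theNums to (matchString's integerValue()) as integer
		end if
	end repeat
	return theNums
end signedNumbersInString

signedNumbersInString("-3.9 apples found. Showing -1 - -20 sorted by name")
```

A minus sign is attached whenever it directly precedes a digit, so a range written without spaces, such as "1-20", comes back as 1 and -20. Which of these trade-offs is acceptable depends on the input.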