I agree with that, but I have found that using JavaScript injection in the Browser (Safari or Chrome) is easier and more effective.
There are a variety of JavaScript tools to find/select the HTML of interest.
There are the traditional getElementsBy… methods, document.querySelector (which I prefer), XPath, and more.
A huge advantage is that you can use Chrome's “Inspect” tool on the text/object of interest on the web page, and then use the JavaScript Console to develop and test the JavaScript statements you need. The Chrome JavaScript Console provides great support, including very intelligent, easy-to-use autocompletion and debugging.
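Once a statement works in the Console, it can be run from AppleScript with Safari's do JavaScript command. The following is only a rough sketch: the meta[name="price"] selector is a placeholder for whatever selector you develop in the Console, and it assumes that “Allow JavaScript from Apple Events” is enabled in Safari's Develop menu.

```applescript
-- Placeholder selector: replace it with the one you tested in the Console.
-- The JavaScript returns the meta tag's content, or an empty string if the tag is absent.
set theJS to "var m = document.querySelector('meta[name=\"price\"]'); m ? m.content : '';"

tell application "Safari"
	-- Requires "Allow JavaScript from Apple Events" (Develop menu)
	set theResult to do JavaScript theJS in document 1
end tell
return theResult
```

The value of the last JavaScript expression comes back as the result of do JavaScript, so the script can end with the expression you want returned.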
Heh - several years = 2011! Time flies…
I didn’t really ever do anything with this, except play around with one Ruby script for grins, but I think what I am suggesting in my comment on the Macscripter article is very similar to what Jim is suggesting. I also noticed that the link I included in that comment is way dead…now that seems to be a jewelry blog or something! Fortunately, I did save the tutorial from the link in a PDF showing how to use Nokogiri to web scrape, if you want to look at it: Web Scraping with Nokogiri
Here’s a script that Shane and Jim posted here a while back that I’ve been using for downloading web pages to scrape. This version gets text from MarketWatch that you can parse to get the closing value of a specific stock or index.
set pageURLStr to "https://www.marketwatch.com/investing/stock/aapl"
(*
PURPOSE: Get Web Page HTML using ASObjC
(as an alternate to curl)
REF: Script posted by @ShaneStanley to ASUL, 2017-03-31
https://lists.apple.com/archives/applescript-users/2017/Mar/msg00421.html
*)
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions
--- SET URL OF WEB PAGE ---
set pageURLnsStr to current application's NSString's stringWithString:pageURLStr
set nsPageURL to current application's NSURL's URLWithString:pageURLnsStr
--- GET WEB PAGE HTML ---
set {nsPageHTML, theError} to current application's NSData's dataWithContentsOfURL:nsPageURL options:0 |error|:(reference)
if nsPageHTML = missing value then error (theError's localizedDescription() as text)
-- convert to XML
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithData:nsPageHTML options:(current application's NSXMLDocumentTidyHTML) |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)
--- SEARCHING & EXTRACTING INFO FROM WEB PAGE ---
-- • As an alternate to JavaScript in the Browser, use ASObjC XML methods
-- • For an example, see https://lists.apple.com/archives/applescript-users/2017/Mar/msg00421.html
---------------------------------------------------
-- CONVERT TO NORMAL TEXT
-- • There are several options to choose from
-- SEE: Writing XML From NSXML Objects
-- https://developer.apple.com/library/content/documentation/Cocoa/Conceptual/NSXML_Concepts/Articles/WritingXML.html
--------------------------------------------------
-- 1. SIMPLE
set htmlSimpleStr to theXMLDoc's XMLString() as text
-- 2. TIDY
set htmlTidyStr to (theXMLDoc's XMLStringWithOptions:(current application's NSXMLDocumentTidyHTML)) as text
-- (I don't see any difference between #1 and #2)
-- 3. PRETTY PRINT (produces a very readable XML/HTML output)
set htmlPPStr to (theXMLDoc's XMLStringWithOptions:(current application's NSXMLNodePrettyPrint)) as text
Defining an XPath expression can be challenging, but in this case it is straightforward.
All of the HTML meta tags are in the <head>…</head> block. So with the web page open in Chrome, press ⌘⌥I to open the Chrome Dev Tools showing the HTML elements of the page. Expand the <head> block to view its elements.
You can either visually scan for the meta tag of interest, or use Find (⌘F), in this case for “price”. It is the second occurrence found.
While “name” and “content” are the most common key/value pairs, sometimes other attributes are used. So you may need to adjust accordingly.
While there is very little on the web about using XPath with ASObjC, there is a lot about using it with JavaScript. So you may need to search for JavaScript examples, then translate them to ASObjC, if that’s your preferred tool.
The nodesForXPath:error: method simply returns an array of matching NSXMLNodes. In this case firstObject() gets the first (and only in this case) node in the array, and stringValue() is how you extract the string from an attribute node.
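Because firstObject() returns missing value when nothing matches, a small guard can make failures easier to diagnose. Here is a minimal sketch, not taken from the scripts in this thread: the handler name metaContent and the sample HTML string are made up for illustration, and it assumes the value sits in a content attribute as described above.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical sample HTML standing in for a downloaded page
set sampleHTML to "<html><head><title>Sample</title><meta name=\"price\" content=\"123.45\"></head><body><p>Hi</p></body></html>"

set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:sampleHTML options:(current application's NSXMLDocumentTidyHTML) |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)

return my metaContent(theXMLDoc, "price")

-- Hypothetical helper: returns the content attribute of the first <meta> tag
-- with the given name, or raises an error if no such tag is found
on metaContent(theXMLDoc, metaName)
	set theXPath to "//meta[@name='" & metaName & "']/attribute::content"
	set {theNodes, theError} to theXMLDoc's nodesForXPath:theXPath |error|:(reference)
	-- missing value here means the XPath itself could not be evaluated
	if theNodes = missing value then error (theError's localizedDescription() as text)
	-- an empty array means the XPath was valid but matched nothing
	if (theNodes's |count|()) = 0 then error ("No meta tag named " & metaName)
	return (theNodes's firstObject()'s stringValue()) as text
end metaContent
```

The helper builds the XPath by simple concatenation, so it is only suitable for meta names that contain no quote characters.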
Anybody know if MarketWatch has an API where I can just ask for closing change values? I’m currently just typing the thing into Numbers. Any other site that has a stock API? Do I really have to screen-scrape for this? Maybe I should open up the macOS Stock widget and see what it does. My funds update at 6:15 PM, so I just go to Dashboard and see the 3 closing changes and switch to Numbers and type them in. Surely if someone writes a Stock app they don’t screen-scrape, do they?
The version that Shane wrote doesn’t screen scrape. It reads the HTML served on the site and parses that to extract your data.
If they had an API, it would presumably just read the same XML and do something similar.
If you have a list of closing prices you need, paste the stock names and I can see if they work.
Here is the most simplified version. The HTML never displays in a browser.
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions
--- SET URL OF WEB PAGE ---
set pageURLStr to "https://www.marketwatch.com/investing/stock/aapl"
set pageURLnsStr to current application's NSString's stringWithString:pageURLStr
set nsPageURL to current application's NSURL's URLWithString:pageURLnsStr
--- GET WEB PAGE HTML ---
set {nsPageHTML, theError} to current application's NSData's dataWithContentsOfURL:nsPageURL options:0 |error|:(reference)
if nsPageHTML = missing value then error (theError's localizedDescription() as text)
-- convert to XML
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithData:nsPageHTML options:(current application's NSXMLDocumentTidyHTML) |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)
-- extract the price
set thePrice to (theXMLDoc's nodesForXPath:"//meta[@name='price']/attribute::content" |error|:(missing value))'s firstObject()'s stringValue() as text
Thanks so much for clarifying. That works just fine, Ed. I got the closing price. Do you know what the name is for the change from previous close? I might as well just get that because then the subtraction is already done.
The funds are QREARX, TIQRX, and TLIRX. For today’s changes you should get -0.05, -0.02, and 0.01. For prices you should get 400.69, 20.35, and 11.70.
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions
set urlRoot to "https://www.marketwatch.com/investing/stock/"
set companyIDs to {"QREARX", "TIQRX", "TLIRX"}
set allPrices to {}
repeat with thisCompany in companyIDs
--- SET URL OF WEB PAGE ---
set pageURLStr to urlRoot & thisCompany as text
set the end of allPrices to GetChangeInPrice(pageURLStr)
end repeat
set AppleScript's text item delimiters to {return & return}
return allPrices as text
on GetChangeInPrice(pageURLStr)
set pageURLnsStr to current application's NSString's stringWithString:pageURLStr
set nsPageURL to current application's NSURL's URLWithString:pageURLnsStr
--- GET WEB PAGE HTML ---
set {nsPageHTML, theError} to current application's NSData's dataWithContentsOfURL:nsPageURL options:0 |error|:(reference)
if nsPageHTML = missing value then error (theError's localizedDescription() as text)
-- convert to XML
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithData:nsPageHTML options:(current application's NSXMLDocumentTidyHTML) |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)
-- extract the information
-- meta names of interest: "name", "price", "priceChange", "priceChangePercent"
set thePrice to (theXMLDoc's nodesForXPath:"//meta[@name='price']/attribute::content" |error|:(missing value))'s firstObject()'s stringValue() as text
set theName to (theXMLDoc's nodesForXPath:"//meta[@name='name']/attribute::content" |error|:(missing value))'s firstObject()'s stringValue() as text
set thePriceChange to (theXMLDoc's nodesForXPath:"//meta[@name='priceChange']/attribute::content" |error|:(missing value))'s firstObject()'s stringValue() as text
set thePriceChangePercent to (theXMLDoc's nodesForXPath:"//meta[@name='priceChangePercent']/attribute::content" |error|:(missing value))'s firstObject()'s stringValue() as text
set theinfo to {theName, thePrice, thePriceChange, thePriceChangePercent}
set AppleScript's text item delimiters to {tab}
return theinfo as text
end GetChangeInPrice
Thanks so much. I’ll incorporate it, now that I understand it. My confusion arose because with the Yahoo! API it seemed that it returned an array with one web call. I did have to massage it a bit. But now I have this new source of quotes. Thanks again.
How can I do this from the source property of a Safari document? If I try to read the HTML directly, the web site generates the sign-on text, even when I’m signed on. But I can see the info I need in the page’s source in Safari.
So how do I convert the text from the source that contains XML to an XMLDoc, where I can use nodesForXPath?
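use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions
-- get the HTML source of the front Safari document as text
tell application id "com.apple.Safari" -- Safari
set theSource to source of document 1
end tell
-- parse the text (rather than NSData) into an XML document
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:theSource options:(current application's NSXMLDocumentTidyHTML) |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)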
I tried your code on a mid-size web page (codemunki.com/LufkinTools/index.html) and found that the returned XMLDoc is cut short and has this message appended to the end:
<<description truncated at 2000 characters out of 10635>>
Is the XMLDoc actually being truncated or is the result clipped only for display purposes? If the truncation is actually happening, can it be disabled?
It’s for display (and performance) purposes. You can change the amount of truncation using the expert preference key PrefDescriptionOtherLimit, which defaults to 2000. I’ve updated the entry here to cover it:
A bug in earlier Script Debugger 6 versions meant the setting did nothing, but it should work as described in both the latest release of 6 as well as 7.
Script Debugger uses various methods to display Cocoa objects. In many cases what you see is the result of calling the debugDescription method on the object. Depending on the class, the result of this can vary from very terse to quite detailed. In some common cases we clean it up quite a bit, to make it appear more AppleScript-like. In the case of NSXMLDocument, it is the full text of XML.
If you want to see the full description raw, you can also call the method yourself. For example:
log (theXMLDoc's |description|() as text)
For occasional use, this is probably preferable to setting the default to a large number. But some descriptions are very long.