You can use AppleScriptObjC to convert HTML into an attributed string (an NSAttributedString). The result is similar to what you would get if you copied the rendered text from Safari. Here is a simple example:
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit" -- needed for initWithHTML:
use scripting additions
-- classes, constants, and enums used
property NSAttributedString : a reference to current application's NSAttributedString
property NSUTF8StringEncoding : a reference to 4
property NSString : a reference to current application's NSString
set theHTML to "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">
<html>
<body>
<p>Some <b>HTML</b> text</p>
</body>
</html>"
set theHTML to NSString's stringWithString:theHTML
set theData to theHTML's dataUsingEncoding:NSUTF8StringEncoding
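set theATS to NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)
if theATS is missing value then error "Could not interpret HTML" -- initWithHTML: returns missing value on failure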
If you only want the plain text, with all the HTML tags stripped out, add the following line to the end of the script:
set theText to theATS's |string|() as text
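The steps above can also be wrapped in a reusable handler. This is only a minimal sketch: the handler name plainTextFromHTML is made up for illustration, and it assumes the HTML is passed in as ordinary AppleScript text that contains only ASCII characters or declares its own charset.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit" -- needed for initWithHTML:
use scripting additions

-- Hypothetical helper: converts HTML source text to plain text
on plainTextFromHTML(theHTML)
	-- Convert the AppleScript text to UTF-8 encoded NSData
	set theString to current application's NSString's stringWithString:theHTML
	set theData to theString's dataUsingEncoding:(current application's NSUTF8StringEncoding)
	-- Let AppKit parse the HTML into an attributed string
	set theATS to current application's NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)
	if theATS is missing value then error "Could not interpret HTML"
	-- Discard the styling and return only the characters
	return theATS's |string|() as text
end plainTextFromHTML

plainTextFromHTML("<p>Some <b>HTML</b> text</p>")
```

Calling the handler with the sample paragraph should give back the sentence without its tags, possibly followed by a trailing line break that comes from the paragraph element.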
If the HTML is in a file, you would use this:
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit" -- needed for initWithHTML:
use scripting additions
-- classes, constants, and enums used
property |NSURL| : a reference to current application's |NSURL|
property NSData : a reference to current application's NSData
property NSAttributedString : a reference to current application's NSAttributedString
-- Ask for a local HTML file and build a file URL from its POSIX path
set pageURL to |NSURL|'s fileURLWithPath:(POSIX path of (choose file with prompt "Select an HTML file:"))
set {theData, theError} to NSData's dataWithContentsOfURL:pageURL options:0 |error|:(reference)
if theData = missing value then error (theError's localizedDescription() as text)
set theATS to NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)
if theATS is missing value then error "Could not read HTML file"
To build the attributed string directly from a web URL, you could use this:
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit" -- needed for initWithHTML:
use scripting additions
-- classes, constants, and enums used
property |NSURL| : a reference to current application's |NSURL|
property NSData : a reference to current application's NSData
property NSAttributedString : a reference to current application's NSAttributedString
set pageURL to |NSURL|'s URLWithString:"https://latenightsw.com/sd7/edit/"
set {theData, theError} to NSData's dataWithContentsOfURL:pageURL options:0 |error|:(reference)
if theData = missing value then error (theError's localizedDescription() as text)
set theATS to NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)
if theATS is missing value then error "Could not interpret HTML"