Converting HTML to attributed strings

You can use AppleScriptObjC to convert HTML into attributed strings. The result is similar to what you would get if you copied from Safari. Here is a simple example:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit" -- needed for initWithHTML:
use scripting additions

-- classes, constants, and enums used
property NSAttributedString : a reference to current application's NSAttributedString
property NSUTF8StringEncoding : a reference to 4
property NSString : a reference to current application's NSString

set theHTML to "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">
<html>
<body>
<p>Some <b>HTML</b> text</p>
</body>
</html>"

set theHTML to NSString's stringWithString:theHTML
set theData to theHTML's dataUsingEncoding:NSUTF8StringEncoding
set theATS to NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)

If you want to strip out all the HTML tags, you could add the following line to the code:

set theText to theATS's |string|() as text

If the HTML is in a file, you would use this:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit" -- needed for initWithHTML:
use scripting additions

-- classes, constants, and enums used
property |NSURL| : a reference to current application's |NSURL|
property NSData : a reference to current application's NSData
property NSAttributedString : a reference to current application's NSAttributedString

-- pick the HTML file; any POSIX path to an HTML file would do here
set thePath to POSIX path of (choose file with prompt "Choose an HTML file:")
set fileURL to |NSURL|'s fileURLWithPath:thePath
set {theData, theError} to NSData's dataWithContentsOfURL:fileURL options:0 |error|:(reference)
if theData = missing value then error (theError's localizedDescription() as text)
set theATS to NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)
if theATS is missing value then error "Could not read HTML file"

And to build it directly from a URL, you could use this:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit" -- needed for initWithHTML:
use scripting additions

-- classes, constants, and enums used
property |NSURL| : a reference to current application's |NSURL|
property NSData : a reference to current application's NSData
property NSAttributedString : a reference to current application's NSAttributedString

set pageURL to current application's |NSURL|'s URLWithString:"https://latenightsw.com/sd7/edit/"
set {theData, theError} to current application's NSData's dataWithContentsOfURL:pageURL options:0 |error|:(reference)
if theData = missing value then error (theError's localizedDescription() as text)
set theATS to NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)
if theATS is missing value then error "Could not interpret HTML"

Thanks for sharing this, Mark.

Could you please remind us ASObjC neophytes (like me) of the things we can then do with attributed strings? Links to examples would be a big plus.

There are really only a couple of practical things you can do with them: save them to a file, generally as RTF, and put them on the clipboard. There are examples of code to do both of these in several posts around here.
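
As a rough sketch of both, assuming you just want something to experiment with: the script below builds a small attributed string from HTML, writes it to an RTF file, and then puts it on the clipboard. The file name "Sample.rtf" and the Desktop location are placeholders I made up for illustration, so change them to suit.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit" -- needed for initWithHTML:, RTFFromRange: and NSPasteboard
use scripting additions

-- build a small attributed string to work with
set theHTML to current application's NSString's stringWithString:"<p>Some <b>HTML</b> text</p>"
set theData to theHTML's dataUsingEncoding:(current application's NSUTF8StringEncoding)
set theATS to current application's NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)
if theATS is missing value then error "Could not interpret HTML"

-- 1. save it as RTF (the path is a placeholder, not from the original post)
set rtfPath to (POSIX path of (path to desktop)) & "Sample.rtf"
set docAttributes to current application's NSDictionary's dictionaryWithObject:(current application's NSRTFTextDocumentType) forKey:(current application's NSDocumentTypeDocumentAttribute)
set rtfData to theATS's RTFFromRange:{location:0, |length|:(theATS's |length|())} documentAttributes:docAttributes
if rtfData is missing value then error "Could not convert to RTF"
set didWrite to rtfData's writeToFile:rtfPath atomically:true
if not (didWrite as boolean) then error "Could not write " & rtfPath

-- 2. put the styled text on the clipboard, replacing whatever was there
set thePasteboard to current application's NSPasteboard's generalPasteboard()
thePasteboard's clearContents()
thePasteboard's writeObjects:{theATS}
```

After running it, pasting into TextEdit or Mail should give you the styled text, and the RTF file should open with the bold intact.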

Thanks Mark!

I really needed to do this the other day and wanted to use AppleScriptObjC instead of textutil in the shell.

Very handy indeed.