Converting HTML to attributed strings

foundation
how-to
appkit
asobjc

(Mark Alldritt) #1

You can use AppleScriptObjC to convert HTML into attributed strings. The result is similar to what you would get if you copied from Safari. Here is a simple example:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit" -- needed for initWithHTML:
use scripting additions

-- classes, constants, and enums used
property NSAttributedString : a reference to current application's NSAttributedString
property NSUTF8StringEncoding : a reference to 4
property NSString : a reference to current application's NSString

set theHTML to "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">
<html>
<body>
<p>Some <b>HTML</b> text</p>
</body>
</html>"

set theHTML to NSString's stringWithString:theHTML
set theData to theHTML's dataUsingEncoding:NSUTF8StringEncoding
set theATS to NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)

If you want to strip out all the HTML tags, you could add the following line to the code:

set theText to theATS's |string|() as text
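If you do this often, the steps above could be wrapped in a handler. This is only a minimal sketch: the handler name plainTextFromHTML is made up for illustration and is not part of any framework, and it assumes the HTML is passed in as ordinary AppleScript text.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit" -- needed for initWithHTML:
use scripting additions

-- plainTextFromHTML is a hypothetical helper name used for illustration
on plainTextFromHTML(theHTML)
	-- convert the AppleScript text to UTF-8 encoded data
	set theString to current application's NSString's stringWithString:theHTML
	set theData to theString's dataUsingEncoding:(current application's NSUTF8StringEncoding)
	-- let AppKit interpret the HTML
	set theATS to current application's NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)
	if theATS is missing value then error "Could not interpret HTML"
	-- the string of an attributed string is its text without any styling
	return theATS's |string|() as text
end plainTextFromHTML

set theText to plainTextFromHTML("<p>Some <b>HTML</b> text</p>")
```

The result may include trailing line breaks that come from block elements such as paragraphs, so you may want to trim it before using it.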

If the HTML is in a file, you would use this:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit" -- needed for initWithHTML:
use scripting additions

-- classes, constants, and enums used
property |NSURL| : a reference to current application's |NSURL|
property NSData : a reference to current application's NSData
property NSAttributedString : a reference to current application's NSAttributedString

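-- build a file URL from a file chosen by the user (the choose file prompt is just one way to get a path)
set pageURL to current application's |NSURL|'s fileURLWithPath:(POSIX path of (choose file of type {"public.html"}))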
set {theData, theError} to current application's NSData's dataWithContentsOfURL:pageURL options:0 |error|:(reference)
if theData = missing value then error (theError's localizedDescription() as text)
set theATS to NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)
if theATS is missing value then error "Could not read HTML file"

And to build it directly from a URL, you could use this:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit" -- needed for initWithHTML:
use scripting additions

-- classes, constants, and enums used
property |NSURL| : a reference to current application's |NSURL|
property NSData : a reference to current application's NSData
property NSAttributedString : a reference to current application's NSAttributedString

set pageURL to current application's |NSURL|'s URLWithString:"https://latenightsw.com/sd7/edit/"
set {theData, theError} to current application's NSData's dataWithContentsOfURL:pageURL options:0 |error|:(reference)
if theData = missing value then error (theError's localizedDescription() as text)
set theATS to NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)
if theATS is missing value then error "Could not interpret HTML"

(Jim Underwood) #2

Thanks for sharing this, Mark. :+1:

Could you please remind us ASObjC neophytes (like me) of the things we can then do with attributed strings? Links to examples would be a big plus.


(Shane Stanley) #3

There are really only a couple of practical things you can do with them: save them to file, generally as RTF, and put them on the clipboard. There are examples of code to do both of these in several posts around here.
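As a rough sketch of those two uses, something like the following should work. It is not taken from any particular post: the sample HTML, the variable names, and the file name Sample.rtf on the desktop are all assumptions made for illustration.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit" -- needed for initWithHTML:, RTFFromRange: and NSPasteboard
use scripting additions

-- build an attributed string from a small HTML sample
set theHTML to current application's NSString's stringWithString:"<p>Some <b>HTML</b> text</p>"
set theData to theHTML's dataUsingEncoding:(current application's NSUTF8StringEncoding)
set theATS to current application's NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)
if theATS is missing value then error "Could not interpret HTML"

-- convert the whole attributed string to RTF data
set theLength to theATS's |length|()
set docAttributes to current application's NSDictionary's dictionaryWithObject:(current application's NSRTFTextDocumentType) forKey:(current application's NSDocumentTypeDocumentAttribute)
set rtfData to theATS's RTFFromRange:{location:0, |length|:theLength} documentAttributes:docAttributes
if rtfData is missing value then error "Could not create RTF data"

-- save the RTF data; the file name and location are assumptions for illustration
set rtfPath to (POSIX path of (path to desktop)) & "Sample.rtf"
set didWrite to rtfData's writeToFile:rtfPath atomically:true
if not (didWrite as boolean) then error "Could not write RTF file"

-- put the attributed string on the clipboard, replacing what was there
set thePasteboard to current application's NSPasteboard's generalPasteboard()
thePasteboard's clearContents()
thePasteboard's writeObjects:{theATS}
```

Note that this replaces the current clipboard contents and overwrites any existing Sample.rtf on the desktop, so adjust the path before running it.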