Converting attributed strings to HTML

foundation
how-to
asobjc

(Mark Alldritt) #1

You can convert an attributed string to HTML. The result is similar to what you get when you save a file to .html format in TextEdit.

In this example the attributed string is built from RTF data on the clipboard and then converted to HTML data. The key to producing HTML is the documentAttributes: parameter – you have to specify NSHTMLTextDocumentType as the document type used when creating the data.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

-- classes, constants, and enums used
property NSUTF8StringEncoding : a reference to 4
property NSPasteboard : a reference to current application's NSPasteboard
property NSPasteboardTypeRTF : a reference to current application's NSPasteboardTypeRTF
property NSAttributedString : a reference to current application's NSAttributedString
property NSString : a reference to current application's NSString
property NSHTMLTextDocumentType : a reference to current application's NSHTMLTextDocumentType

set pb to NSPasteboard's generalPasteboard() -- get pasteboard
set theData to pb's dataForType:(NSPasteboardTypeRTF) -- get RTF data off pasteboard
if theData = missing value then error "No RTF data found on clipboard"
-- make into attributed string
set theAttString to NSAttributedString's alloc()'s initWithRTF:theData documentAttributes:(missing value)

set {htmlData, theError} to theAttString's dataFromRange:{0, theAttString's |length|()} documentAttributes:{DocumentType:NSHTMLTextDocumentType} |error|:(reference)
if htmlData = missing value then error theError's localizedDescription() as text
set theString to (NSString's alloc()'s initWithData:htmlData encoding:NSUTF8StringEncoding) as text

You could also save the data directly to a file, using one of the writeToFile: methods.
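As a minimal sketch of saving to a file, the script below uses NSData's writeToFile:atomically: method. The sample text and the output path (a file named sample.html in the temporary items folder) are assumptions made up for illustration, so that the script runs without anything on the clipboard.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

-- hypothetical sample text and output path, for illustration only
set theAttString to current application's NSAttributedString's alloc()'s initWithString:"Hello, world"
set thePath to (POSIX path of (path to temporary items)) & "sample.html"

-- convert the attributed string to HTML data, as in the script above
set {htmlData, theError} to theAttString's dataFromRange:{0, theAttString's |length|()} documentAttributes:{DocumentType:current application's NSHTMLTextDocumentType} |error|:(reference)
if htmlData = missing value then error theError's localizedDescription() as text

-- writeToFile:atomically: returns true on success, false on failure
set didWrite to htmlData's writeToFile:thePath atomically:true
if not (didWrite as boolean) then error "Could not write HTML to " & thePath
```

If you want a reason for a failure, NSData's writeToFile:options:error: variant reports an NSError instead of just a boolean.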

Just as when saving from TextEdit, this can produce some very verbose HTML. You can reduce the complexity by specifying a list of elements to be skipped. Again, the documentAttributes: parameter is where you specify this. Here is an example that strips out many of the standard elements:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

-- classes, constants, and enums used
property NSDocumentTypeDocumentAttribute : a reference to current application's NSDocumentTypeDocumentAttribute
property NSExcludedElementsDocumentAttribute : a reference to current application's NSExcludedElementsDocumentAttribute
property NSUTF8StringEncoding : a reference to 4
property NSPasteboard : a reference to current application's NSPasteboard
property NSPasteboardTypeRTF : a reference to current application's NSPasteboardTypeRTF
property NSAttributedString : a reference to current application's NSAttributedString
property NSString : a reference to current application's NSString
property NSHTMLTextDocumentType : a reference to current application's NSHTMLTextDocumentType

set pb to NSPasteboard's generalPasteboard() -- get pasteboard
set theData to pb's dataForType:(NSPasteboardTypeRTF) -- get RTF data off pasteboard
if theData = missing value then error "No RTF data found on clipboard"
-- make into attributed string
set theAttString to NSAttributedString's alloc()'s initWithRTF:theData documentAttributes:(missing value)

set elementsToSkip to {"doctype", "html", "body", "xml", "style", "p", "font", "head", "span"} -- amend to suit
set theDict to current application's NSDictionary's dictionaryWithObjects:{current application's NSHTMLTextDocumentType, elementsToSkip} forKeys:{NSDocumentTypeDocumentAttribute, NSExcludedElementsDocumentAttribute}
set {htmlData, theError} to theAttString's dataFromRange:{0, theAttString's |length|()} documentAttributes:theDict |error|:(reference)
if htmlData = missing value then error theError's localizedDescription() as text
set theString to (NSString's alloc()'s initWithData:htmlData encoding:NSUTF8StringEncoding) as text

(Andreas Kiel) #2

Thanks for this very helpful script.
Was looking for something like this for a long time.

Is there a way to do it the other way round?

Here is what I want to do:
I’ve a list (or database) which contains subtitles.
Each item contains time, subtitle, …
Each subtitle could be transformed to SRT subtitle format, which looks like the example below:

<font color="#FFF800">We’ll kick their ass.</font>
<font color="#00F400"><i>You see?</i></font>

If that can be “pre-converted” to proper HTML (which shouldn’t be a problem) and then to RTF, the value could be passed to a text view. There the user can modify it, and the value can be converted back from there using your script.


(Shane Stanley) #3

See here:


(Andreas Kiel) #4

Thanks Shane,

But to be honest, your reply, at least at first glance, doesn’t help me to understand.


(Shane Stanley) #5

You need to turn your HTML string into data, and then convert that. Like this:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

-- classes, constants, and enums used
property NSUTF8StringEncoding : a reference to 4
property NSString : a reference to current application's NSString
property NSAttributedString : a reference to current application's NSAttributedString

set theText to "<font color=\"#FFF800\">We'll kick their ass.</font>
<font color=\"#00F400\"><i>You see?</i></font>"
set theText to NSString's stringWithString:theText
set theData to theText's dataUsingEncoding:NSUTF8StringEncoding
if theData = missing value then error "Could not encode the text as UTF-8" -- dataUsingEncoding: returns no NSError
set theATS to NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)
if theATS is missing value then error "Could not interpret HTML"
return theATS
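If you need actual RTF data rather than the attributed string, a rough sketch of the extra step follows. It assumes the RTFFromRange:documentAttributes: method with NSRTFTextDocumentType as the document type, uses one of the sample subtitle lines as input, and reads the RTF back in only as a sanity check. For a text view you may not need RTF at all, since the attributed string can usually be handed to the view's text storage directly with setAttributedString:.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

-- sample input: one subtitle line from the question
set theText to current application's NSString's stringWithString:"<font color=\"#00F400\"><i>You see?</i></font>"
set theData to theText's dataUsingEncoding:(current application's NSUTF8StringEncoding)
set theATS to current application's NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)
if theATS is missing value then error "Could not interpret HTML"

-- convert the attributed string to RTF data
set rtfData to theATS's RTFFromRange:{0, theATS's |length|()} documentAttributes:{DocumentType:current application's NSRTFTextDocumentType}
if rtfData is missing value then error "Could not create RTF data"

-- sanity check: read the RTF back in and extract its plain text
set roundTrip to current application's NSAttributedString's alloc()'s initWithRTF:rtfData documentAttributes:(missing value)
if roundTrip is missing value then error "Could not read the RTF data back"
set plainText to roundTrip's |string|() as text
```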

(Andreas Kiel) #6

Thanks!
That works perfectly.