HTML Styled text in a variable

This should be pretty simple. I have text with some very basic HTML tags.

<p><b>The Tonight Show Starring Jimmy Fallon</b> Milo Ventimiglia; journalist Guy Raz; comic Carmen Lynch. (N) 11:34 p.m. KNBC</p>

I want to convert that to styled text that I can paste into our CMS app with styles intact. (It used to accept the raw HTML, but doesn’t anymore, and they’re not planning to fix it.)

In Script Debugger (SD), the text displays perfectly in the Best window, and if I copy it from there and paste it, that works.

I can also save the text in an HTML document, open the document in Safari, copy from there and paste, and that works too (although when I do the same from Chrome, it loses all of the bold and headline formatting).

Any suggestions?

Where do you have the HTML? In a variable, or on the clipboard?

There are at least two methods to convert HTML code to Rich Text and put it on the Clipboard:

  1. Shell script
  2. ASObjC

Here is my shell script handler I have been using for years:

on pasteHTMLasRTFtoClipboard(pstrHTML)
  
  -- REWRITTEN AS RTF AND COPIED TO THE CLIPBOARD
  set lstrCMD to "echo " & quoted form of pstrHTML & " | textutil -format html -convert rtf -stdin -stdout | pbcopy -Prefer rtf"
  do shell script lstrCMD
  
end pasteHTMLasRTFtoClipboard
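A possible variant of the handler above, offered as a sketch rather than as the original: it swaps echo for printf (echo under /bin/sh may interpret backslash sequences in the HTML), prepends a meta charset tag so the HTML importer should read the text as UTF-8 (which ought to help with accented characters), and drops -Prefer rtf, since pbcopy recognizes RTF input by its header on its own. The handler name copyHTMLasRTFtoClipboard is made up for this example.

```applescript
-- Sketch of a variant of pasteHTMLasRTFtoClipboard (hypothetical name)
on copyHTMLasRTFtoClipboard(pstrHTML)
	-- tell the HTML importer the text is UTF-8
	set lstrHTML to "<meta charset=\"utf-8\">" & pstrHTML
	-- printf '%s' passes the HTML through unchanged; textutil converts it to RTF;
	-- pbcopy stores RTF input as rich text because it starts with the RTF header
	set lstrCMD to "printf '%s' " & quoted form of lstrHTML & " | textutil -format html -convert rtf -stdin -stdout | pbcopy"
	do shell script lstrCMD
end copyHTMLasRTFtoClipboard

-- example call
my copyHTMLasRTFtoClipboard("Click <b>hére</b>")
```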

Here is a script that @ShaneStanley wrote in 2018 that puts both Rich Text and plain text on the clipboard:

property ptyScriptName : "Convert HTML to RTF and Set Clipboard"
property ptyScriptVer : "1.0"
property ptyScriptDate : "2018-05-28"
property ptyScriptAuthor : "ShaneStanley"

(*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PURPOSE:
  • How Do I Set Clipboard (Pasteboard) to Both Rich Text (RTF) and Plain Text?
  
REF:  The following were used in some way in the writing of this script.

  1.  2018-05-28, ShaneStanley, Late Night Software Ltd.
      How Do I Set Clipboard (Pasteboard) to Both Rich Text (RTF) and Plain Text?
      http://forum.latenightsw.com/t/how-do-i-set-clipboard-pasteboard-to-both-rich-text-rtf-and-plain-text/1189/5

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*)
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

-- classes, constants, and enums used
property NSUTF8StringEncoding : a reference to 4
property NSString : a reference to current application's NSString
property NSRTFTextDocumentType : a reference to current application's NSRTFTextDocumentType
property NSPasteboardTypeRTF : a reference to current application's NSPasteboardTypeRTF
property NSPasteboardTypeString : a reference to current application's NSPasteboardTypeString
property NSDictionary : a reference to current application's NSDictionary
property NSAttributedString : a reference to current application's NSAttributedString
property |NSURL| : a reference to current application's |NSURL|

set someHTML to "Click <b>here</b>"
-- convert to data
set htmlString to NSString's stringWithString:someHTML
set htmlData to htmlString's dataUsingEncoding:NSUTF8StringEncoding
-- make attributed string
set attString to NSAttributedString's alloc()'s initWithHTML:htmlData documentAttributes:(missing value)
-- need it in RTF data form for clipboard
set rtfData to attString's RTFFromRange:{0, attString's |length|()} documentAttributes:{DocumentType:NSRTFTextDocumentType}
set pb to current application's NSPasteboard's generalPasteboard() -- get pasteboard
pb's clearContents()
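-- set both types for the first object on the clipboard: RTF, plus plain text for clients that want it
pb's setData:rtfData forType:NSPasteboardTypeRTF
pb's setString:(attString's |string|()) forType:NSPasteboardTypeString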

FWIW, it turns out what I posted then can be simplified a bit:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

-- classes, constants, and enums used
property NSUTF8StringEncoding : a reference to 4
property NSString : a reference to current application's NSString
property NSAttributedString : a reference to current application's NSAttributedString

set someHTML to "Click <b>here</b>"
-- convert to data
set htmlString to NSString's stringWithString:someHTML
set htmlData to htmlString's dataUsingEncoding:NSUTF8StringEncoding
-- make attributed string
set attString to NSAttributedString's alloc()'s initWithHTML:htmlData documentAttributes:(missing value)
set pb to current application's NSPasteboard's generalPasteboard() -- get pasteboard
pb's clearContents()
pb's writeObjects:{attString}

And if the HTML is on the clipboard rather than in a variable, it’s simpler again:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

-- classes, constants, and enums used
property NSPasteboardTypeString : a reference to current application's NSPasteboardTypeString
property NSAttributedString : a reference to current application's NSAttributedString

-- put some HTML on for testing
set the clipboard to "Click <b>here</b>"

set pb to current application's NSPasteboard's generalPasteboard() -- get pasteboard
set htmlData to pb's dataForType:NSPasteboardTypeString -- get data off pasteboard
-- make attributed string
set attString to NSAttributedString's alloc()'s initWithHTML:htmlData documentAttributes:(missing value)
pb's clearContents()
pb's writeObjects:{attString}

Thanks, guys, that worked like a charm!

There is one issue with the script. It doesn’t handle diacriticals well.

é comes out as two garbled characters (such as Ã©).

Any suggestions? (I’m using the last version Shane posted)

They need to be encoded in your HTML. You may be able to get away with something this simple:

use script "RegexAndStuffLib" version "1.0.6"

-- put some HTML on for testing
set the clipboard to (hex encode "Click <b>hére</b>")
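Another approach, sketched here as an alternative to hex encoding and not as Shane's method: tell NSAttributedString's HTML importer what encoding the data is in, using the reading options NSDocumentTypeDocumentOption and NSCharacterEncodingDocumentOption with initWithData:options:documentAttributes:error:. The test string is the same as above; everything else follows the simplified script.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

set someHTML to "Click <b>hére</b>"
-- convert to UTF-8 data (4 = NSUTF8StringEncoding)
set htmlData to (current application's NSString's stringWithString:someHTML)'s dataUsingEncoding:4
-- import options: the data is HTML, and it is encoded as UTF-8 (4)
set importOptions to current application's NSDictionary's dictionaryWithObjects:{current application's NSHTMLTextDocumentType, 4} forKeys:{current application's NSDocumentTypeDocumentOption, current application's NSCharacterEncodingDocumentOption}
-- make attributed string; |error|:(reference) makes the call return {result, error}
set {attString, theError} to current application's NSAttributedString's alloc()'s initWithData:htmlData options:importOptions documentAttributes:(missing value) |error|:(reference)
if attString is missing value then error (theError's localizedDescription() as text)
-- put it on the clipboard as before
set pb to current application's NSPasteboard's generalPasteboard()
pb's clearContents()
pb's writeObjects:{attString}
```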

I’m trying these for another purpose, and I’m wondering if there’s a way to get the plain text into a BBEdit (or other text editor) document, or the styled text into a Word document, without using the clipboard.

The issue is that Netflix has changed its pages so that you can’t select the text. The text is still there, and you can pull it out via the page source or the Web Inspector.

I’ve adapted the above script to get the source from the Safari tab and put it on the clipboard as styled text, then put the contents of the clipboard into a BBEdit document.

The script below works, but I’d prefer not to use the clipboard.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions
use script "RegexAndStuffLib" version "1.0.6"

property NSUTF8StringEncoding : a reference to 4
property NSString : a reference to current application's NSString
property NSAttributedString : a reference to current application's NSAttributedString

tell application "Safari"
   tell window 1
      tell tab 1
         --set URL to "https://www.netflix.com/70155589" --"https://www.netflix.com/title/70155589"
         set someHTML to source
      end tell
   end tell
end tell
--my ConvertHTML(myHTML)

--on ConvertHTML(someHTML) -- convert to data
set the someHTML to (hex encode someHTML)
set htmlString to NSString's stringWithString:someHTML
set htmlData to htmlString's dataUsingEncoding:NSUTF8StringEncoding
set attString to NSAttributedString's alloc()'s initWithHTML:htmlData documentAttributes:(missing value)

set pb to current application's NSPasteboard's generalPasteboard() -- get pasteboard
pb's clearContents()
pb's writeObjects:{attString}
--end ConvertHTML


tell application "BBEdit"
   make new window at beginning
   set text of window 1 to the clipboard as text
end tell

To get the plain text, use:

set theText to attString's |string|() as text
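Put together, a clipboard-free version might look like the sketch below. It assumes the HTML has already been fetched (a short test string stands in for the Safari source), reuses the BBEdit commands from the script above for the plain text, and, for the styled text, writes RTF data to a file and opens it in Word. The file name converted.rtf and the use of the temporary items folder are arbitrary choices for illustration.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

-- stand-in for the HTML taken from the Safari tab's source
set someHTML to "Click <b>here</b>"
set htmlData to (current application's NSString's stringWithString:someHTML)'s dataUsingEncoding:4 -- 4 = NSUTF8StringEncoding
set attString to current application's NSAttributedString's alloc()'s initWithHTML:htmlData documentAttributes:(missing value)

-- 1. plain text straight into a new BBEdit window
set theText to attString's |string|() as text
tell application "BBEdit"
	make new window at beginning
	set text of window 1 to theText
end tell

-- 2. styled text: write RTF data to a file (hypothetical name) and open it in Word
set rtfData to attString's RTFFromRange:{0, attString's |length|()} documentAttributes:(current application's NSDictionary's dictionary())
set rtfPath to (POSIX path of (path to temporary items)) & "converted.rtf"
set didWrite to (rtfData's writeToFile:rtfPath atomically:true) as boolean
if not didWrite then error "Could not write " & rtfPath
do shell script "open -a 'Microsoft Word' " & quoted form of rtfPath
```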

@ShaneStanley, if I wanted to put BOTH the rich text and the plain text on the Clipboard, how would I modify this statement:

pb's writeObjects:{attString}

Thanks.

Just leave it as it is — the clipboard will then offer whichever the app wanting to paste requests.

That’s the principle of the clipboard. You put the richest expression on it, and it does the conversion when a client wants a less-rich version.
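A small check of that behaviour, as a sketch: write a single attributed string with writeObjects:, then list the types the pasteboard now offers and ask it for plain text. The test HTML is arbitrary.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

set htmlData to (current application's NSString's stringWithString:"Click <b>here</b>")'s dataUsingEncoding:4 -- 4 = NSUTF8StringEncoding
set attString to current application's NSAttributedString's alloc()'s initWithHTML:htmlData documentAttributes:(missing value)
set pb to current application's NSPasteboard's generalPasteboard()
pb's clearContents()
pb's writeObjects:{attString} -- only the rich object is written
-- the types now on offer
set theTypes to (pb's types()) as list
-- a plain-text client still gets plain text
set plainText to (pb's stringForType:(current application's NSPasteboardTypeString)) as text
```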

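Anyone who has problems with diacriticals or Cyrillic characters should change the encoding from 4 to 10 in the line property NSUTF8StringEncoding : a reference to 4. (10 is NSUnicodeStringEncoding, i.e. UTF-16; the byte-order mark that encoding adds presumably lets the HTML importer recognize the text correctly.)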

I want to use this script to make a link from the selected text and a link address I have, so that the selected text becomes the name of the link. The problem is that when I paste the generated RTF link, it doesn’t match the formatting of the surrounding text (font face, font size, etc.).

Is it possible to somehow apply the format of the selected text (which was copied to the pasteboard to be the name of the link) to the resulting link?

Something like this should get you started:

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"
use framework "AppKit" -- for pasteboard

-- get the clipboard
set pb to current application's NSPasteboard's generalPasteboard()
-- get any attributed strings off clipboard
set allATS to (pb's readObjectsForClasses:{current application's NSAttributedString} options:(missing value))
if allATS's |count|() = 0 then error "No rich text found on the clipboard"
-- make an editable copy of the first item of the array so we can modify it
set richText to allATS's firstObject()'s mutableCopy()
-- build the link URL
set linkString to richText's |string|()'s stringByTrimmingCharactersInSet:(current application's NSCharacterSet's whitespaceAndNewlineCharacterSet())
set linkURL to current application's NSURL's URLWithString:linkString
if linkURL is missing value then error "The clipboard text is not a valid URL"
-- make dictionary of attributes: the NSURL, blue text, single underline
set attsDict to current application's NSDictionary's dictionaryWithObjects:{linkURL, current application's NSNumber's numberWithUnsignedInteger:(current application's NSSingleUnderlineStyle), (current application's NSColor's blueColor())} forKeys:{current application's NSLinkAttributeName, current application's NSUnderlineStyleAttributeName, current application's NSForegroundColorAttributeName}
-- add link and replace clipboard
richText's addAttributes:attsDict range:{0, richText's |length|()}
pb's clearContents()
pb's writeObjects:{richText}
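Since the original question was about a link address you already have, rather than one built from the selected text, a variation might look like this sketch. theAddress is a hypothetical placeholder, and only the link attribute is added, so the copied font face and size are left untouched (whether the target app then shows it blue and underlined is up to that app).

```applescript
use AppleScript version "2.4"
use scripting additions
use framework "Foundation"
use framework "AppKit" -- for pasteboard

-- hypothetical: the link address you already have
set theAddress to "https://www.example.com"
set linkURL to current application's NSURL's URLWithString:theAddress
if linkURL is missing value then error "Not a valid URL: " & theAddress

-- the copied selection, as rich text
set pb to current application's NSPasteboard's generalPasteboard()
set allATS to (pb's readObjectsForClasses:{current application's NSAttributedString} options:(missing value))
if allATS's |count|() = 0 then error "No rich text found on the clipboard"
set richText to allATS's firstObject()'s mutableCopy()
-- add only the link, keeping the existing formatting
richText's addAttribute:(current application's NSLinkAttributeName) value:linkURL range:{0, richText's |length|()}
pb's clearContents()
pb's writeObjects:{richText}
```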

Thanks a lot @ShaneStanley, script works great!
Already combined it with Keyboard Maestro, and paste links right behind the selected text everywhere, keeping the formatting of the context! ))

The only problem is with MS Word: it does copy RTF, but somewhere along the way, while the attributed string is being read from the pasteboard, whatever font was used gets replaced by Helvetica. A strange bug on Microsoft’s side…
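One way to narrow this down, sketched here without any claim about what Word actually does: after copying from Word, list the pasteboard types and compare the font of the first character as read by readObjectsForClasses:options: with the font read strictly from the RTF flavor via initWithRTF:documentAttributes:. If the two differ, the generic reader is probably picking a different flavor than RTF. The handler name firstFontName is made up for this example.

```applescript
use AppleScript version "2.4"
use scripting additions
use framework "Foundation"
use framework "AppKit"

set pb to current application's NSPasteboard's generalPasteboard()
-- 1. which flavors did the source app put on the pasteboard?
set theTypes to (pb's types()) as list

-- 2. first-character font, as read by readObjectsForClasses:
set allATS to (pb's readObjectsForClasses:{current application's NSAttributedString} options:(missing value))
if allATS's |count|() = 0 then error "No rich text found on the clipboard"
set fontViaClasses to my firstFontName(allATS's firstObject())

-- 3. first-character font, read only from the RTF flavor
set rtfData to pb's dataForType:(current application's NSPasteboardTypeRTF)
if rtfData is missing value then
	set fontViaRTF to "(no RTF on the pasteboard)"
else
	set rtfATS to current application's NSAttributedString's alloc()'s initWithRTF:rtfData documentAttributes:(missing value)
	set fontViaRTF to my firstFontName(rtfATS)
end if

set theReport to {theTypes, fontViaClasses, fontViaRTF}

-- name of the font applied to the first character, if any (hypothetical helper)
on firstFontName(anATS)
	if anATS is missing value or anATS's |length|() = 0 then return "(empty)"
	set theFont to anATS's attribute:(current application's NSFontAttributeName) atIndex:0 effectiveRange:(missing value)
	if theFont is missing value then return "(no font attribute)"
	return theFont's fontName() as text
end firstFontName
```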

Is there any ASObjC magic that would let me copy the RTF formatting at the cursor position to the pasteboard, with or without the text itself?