Converting an NSAttributedString into an HTML string

How to convert an attributed string copied from a text editor to HTML?
I just need the bold, italic and break tags.
Here’s an example of what I’m looking for:

from

Les traitements suivants s’appliqueront aux
exercices ouverts à compter du 1er janvier 2018,

to

Les <i>traitements</i> suivants s’appliqueront aux<br>
exercices ouverts à compter du <b>1er janvier 2018</b>,

This will work with styled text on the clipboard:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

set pb to current application's NSPasteboard's generalPasteboard() -- get pasteboard
set theData to pb's dataForType:(current application's NSPasteboardTypeRTF) -- get rtf data off pasteboard
if theData = missing value then error "No rtf data found on clipboard"
-- make into attributed string
set theAttString to current application's NSAttributedString's alloc()'s initWithRTF:theData documentAttributes:(missing value)
set elementsToSkip to {"doctype", "html", "body", "xml", "style", "p", "font", "head", "span"} -- amend to taste
set theDict to current application's NSDictionary's dictionaryWithObjects:{current application's NSHTMLTextDocumentType, elementsToSkip} forKeys:{current application's NSDocumentTypeDocumentAttribute, current application's NSExcludedElementsDocumentAttribute}
set {htmlData, theError} to theAttString's dataFromRange:{0, theAttString's |length|()} documentAttributes:theDict |error|:(reference)
if htmlData = missing value then error theError's localizedDescription() as text
set theString to current application's NSString's alloc()'s initWithData:htmlData encoding:(current application's NSUTF8StringEncoding)

The elementsToSkip list lets you strip out a lot of the noise.

Perfect, as usual!
Thank you, Shane.
:slight_smile:

Thank you for this – very helpful to realise that we don’t have to go through textutil and the shell, and can prune out any HTML noise upstream.

A wrapping here FWIW, for reuse and composability:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

--  () -> [HTML tags to exclude] ->  Either Left(Error message) or Right(HTML)

-- htmlFromRTFClipExcept :: [String] -> Either String String
on htmlFromRTFClipExcept(exceptTags)
    set ca to current application
    set pb to ca's NSPasteboard's generalPasteboard()
    
    -- Either (Right) RTF data or (Left) message string
    if (pb's pasteboardItems()'s firstObject()'s types()'s containsObject:("public.rtf")) then
        set lrRTF to |Right|(ca's NSAttributedString's alloc()'s ¬
            initWithRTF:(pb's dataForType:("public.rtf")) ¬
                documentAttributes:(missing value))
    else
        set lrRTF to |Left|("No RTF text in clipboard")
    end if
    
    script htmlEither
        on |λ|(x)
            set {htmlData, err} to x's ¬
                dataFromRange:{location:0, |length|:x's |length|()} ¬
                    documentAttributes:¬
                    {DocumentType:"NSHTML", ExcludedElements:exceptTags} ¬
                        |error|:(reference)
            if err is missing value then
                |Right|((ca's NSString's alloc()'s ¬
                    initWithData:htmlData encoding:(ca's NSUTF8StringEncoding)) as text)
            else
                |Left|(err's localizedDescription() as text)
            end if
        end |λ|
    end script
    
    -- Either (Right) HTML string or (Left) message string
    bindEither(lrRTF, htmlEither)
end htmlFromRTFClipExcept

-- TEST --------------------------------------------------------------------------------
on run
    set lrHTML to ¬
        htmlFromRTFClipExcept({"doctype", "html", "body", ¬
            "xml", "style", "p", "font", "head", "span"})
    
    if isRight(lrHTML) then
        |Right| of lrHTML -- HTML string
    else
        |Left| of lrHTML -- Error message
    end if
end run


-- GENERIC FUNCTIONS ------------------------------------------------------------------

-- Left :: a -> Either a b
on |Left|(x)
    {type:"Either", |Left|:x, |Right|:missing value}
end |Left|

-- Right :: b -> Either a b
on |Right|(x)
    {type:"Either", |Left|:missing value, |Right|:x}
end |Right|

-- bindEither (>>=) :: Either a -> (a -> Either b) -> Either b
on bindEither(m, mf)
    if isRight(m) then
        mReturn(mf)'s |λ|(|Right| of m)
    else
        m
    end if
end bindEither

-- isLeft :: Either a b -> Bool
on isLeft(x)
    set dct to current application's ¬
        NSDictionary's dictionaryWithDictionary:x
    (dct's objectForKey:"type") as text = "Either" and ¬
        (dct's objectForKey:"Right") as list = {missing value}
end isLeft

-- isRight :: Either a b -> Bool
on isRight(x)
    set dct to current application's ¬
        NSDictionary's dictionaryWithDictionary:x
    (dct's objectForKey:"type") as text = "Either" and ¬
        (dct's objectForKey:"Left") as list = {missing value}
end isRight

-- Lift 2nd class handler function into 1st class script wrapper
-- mReturn :: First-class m => (a -> b) -> m (a -> b)
on mReturn(f)
    if class of f is script then
        f
    else
        script
            property |λ| : f
        end script
    end if
end mReturn

Where a method returns an error by indirection, you cannot rely on the presence of an error object as an indication that the method has not succeeded. You must test the method’s direct result — in this case for missing value — and then deal with the error only if the direct result indicates one has been thrown.
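A minimal sketch of that check, reusing the calls from the first script. The handler name htmlFromAttributedString and the sample string are made up for illustration; the point is only that htmlData is tested for missing value before the error object is touched.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

-- htmlFromAttributedString is a hypothetical helper name, not part of the original scripts
on htmlFromAttributedString(attString, exceptTags)
    set ca to current application
    set theDict to ca's NSDictionary's dictionaryWithObjects:{ca's NSHTMLTextDocumentType, exceptTags} ¬
        forKeys:{ca's NSDocumentTypeDocumentAttribute, ca's NSExcludedElementsDocumentAttribute}
    set {htmlData, theError} to attString's ¬
        dataFromRange:{0, attString's |length|()} ¬
            documentAttributes:theDict |error|:(reference)
    -- Test the direct result first; consult the error object only if it is missing
    if htmlData = missing value then error (theError's localizedDescription() as text)
    return (ca's NSString's alloc()'s initWithData:htmlData encoding:(ca's NSUTF8StringEncoding)) as text
end htmlFromAttributedString

-- Sample input for illustration only
set sample to current application's NSAttributedString's alloc()'s initWithString:"Hello"
set theHTML to htmlFromAttributedString(sample, {"doctype", "html", "body", "xml", "style", "p", "font", "head", "span"})
```

The same ordering would apply inside the |λ| handler above: check htmlData, and only build the Left message from err when htmlData is missing value.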

I also suggest FP evangalism might be better conducted in a thread of its own.

Presumably you mean evangelism ?

Not an intention – we all build things in different ways, and optimise for different contexts.

Thanks for the point about error trapping.

Thanks, Shane. This is very useful. :+1:

on |Left|(x)
    {type:"Either", |Left|:x, |Right|:missing value}
end |Left|

Well, that’s the first time in quite a while that I’ve read an AppleScript that I couldn’t make head nor tail of.

What’s the advantage of such cryptic code?

Probably none, if it looks cryptic :slight_smile:

More generally, option types allow construction with a kind of Lego brick which contains two channels:

  • a value channel for results which can be passed on to enclosing calls
  • a glitch channel which can just indicate whether everything has successfully returned a value so far, or hold a message detailing the point at which a value of the type required couldn’t be obtained.

In other words, just a slightly more composable alternative to the parallel channels of:

  • executing code, vs
  • run-time error

and the main advantage, in contexts where this happens to have any value, is to reduce the incidence of run-time errors.
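As a rough illustration of the two channels, here is a small sketch that chains two steps. The reciprocal and doubled scripts are hypothetical examples, and bindEither is simplified here to accept only script objects with a |λ| handler (no mReturn wrapping).

```applescript
-- Either constructors, as in the longer script above
on |Left|(x)
    {type:"Either", |Left|:x, |Right|:missing value}
end |Left|

on |Right|(x)
    {type:"Either", |Left|:missing value, |Right|:x}
end |Right|

-- Simplified bind: run the next step only if no glitch has been recorded so far
on bindEither(m, mf)
    if |Left| of m is missing value then
        mf's |λ|(|Right| of m)
    else
        m
    end if
end bindEither

-- Hypothetical step: reports a message instead of raising a run-time error
script reciprocal
    on |λ|(x)
        if x = 0 then
            |Left|("Cannot take the reciprocal of zero")
        else
            |Right|(1 / x)
        end if
    end |λ|
end script

-- Hypothetical step that always succeeds
script doubled
    on |λ|(x)
        |Right|(x * 2)
    end |λ|
end script

set good to bindEither(bindEither(|Right|(4), reciprocal), doubled)
set bad to bindEither(bindEither(|Right|(0), reciprocal), doubled)
```

Here good carries 0.5 in its value channel, while bad carries the message from reciprocal, and the doubled step is simply skipped for it.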

I understand optionals; I use Swift a lot (I’ve somewhat grudgingly let it more or less replace Obj-C for all new projects). I still don’t understand the syntax you’re using here in AppleScript though; what’s with all the pipes and greek symbols?

Pff … I did type out the full name ‘lambda’ for anonymous functions for a while,
but after a while I found it just slightly long and noisy. I personally prefer a keyboard shortcut for |λ|.

Mileage varies :slight_smile:

Fair enough. I appreciate this is written in a way that is best for you, but presumably you didn’t post it here for your own sake but for that of others.

Code like this:

mReturn(mf)'s |λ|(|Right| of m)

is about as helpful as me telling someone to type an octothorpe at the beginning of a shell script. Sure, I could spend time explaining why that’s better than calling it a hash or a pound sign, but really we could all have used our time better if I’d just given instructions in the first place that would be readily understandable to the majority of people I’m intending to help.

Very happy to explain if you like – my thought was just to offer something pasteable and reusable as a function – but perhaps we should respect Shane’s feeling that his thread risks being evandalised ?

A private message ? A different thread ?

I’d be more than willing to learn how you came up with that kind of code, just because I’m interested in things that I don’t understand. Maybe you could start a new thread explaining how this FP thing works within AS and why it might be worth adopting in certain (or any) circumstances.

I’ll start another thread on the core pattern, which is just the use of map, filter and reduce/fold.

( Anything else can unfold from those three, I think )

Sorry to add to this thread… I suppose FP stands for Functional Programming ? So are map, filter and reduce/fold just concepts of FP or actual functions that we have access to somehow in AS ?

FP/Functional Programming was the term which Shane invoked, perhaps in jest, and which is certainly thrown around a lot, but it’s not a term I personally take all that seriously.

map, filter and fold are just useful functions which you can define (or get ready-baked off the shelf) in most languages now, and which just make life a bit easier.

(Pre-packaged and pre-tested loops essentially).
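For a rough idea of what such definitions could look like in AppleScript, here is a minimal sketch. It assumes each function argument is passed as a script object with a |λ| handler; the handler and script names (map, filter, foldl, square, isEven, sumOf) are illustrative and not necessarily those used in the other thread.

```applescript
-- map :: apply f to every item, collecting the results
on map(f, xs)
    set ys to {}
    repeat with x in xs
        set end of ys to f's |λ|(contents of x)
    end repeat
    return ys
end map

-- filter :: keep only the items for which p returns true
on filter(p, xs)
    set ys to {}
    repeat with x in xs
        if p's |λ|(contents of x) then set end of ys to contents of x
    end repeat
    return ys
end filter

-- foldl :: combine all items into one value, working from the left
on foldl(f, startValue, xs)
    set acc to startValue
    repeat with x in xs
        set acc to f's |λ|(acc, contents of x)
    end repeat
    return acc
end foldl

script square
    on |λ|(x)
        x * x
    end |λ|
end script

script isEven
    on |λ|(x)
        x mod 2 = 0
    end |λ|
end script

script sumOf
    on |λ|(a, b)
        a + b
    end |λ|
end script

set squares to map(square, {1, 2, 3, 4}) -- {1, 4, 9, 16}
set evens to filter(isEven, squares) -- {4, 16}
set total to foldl(sumOf, 0, evens) -- 20
```

Each handler wraps one repeat loop, so the calling code only supplies the small piece that differs from case to case.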

Ok, the functions are actually in the other thread… Thank you !