How to convert an attributed string copied from a text editor to HTML?
I just need the bold, italic and break tags.
Here’s an example of what I’m looking for:
from
Les traitements suivants s’appliqueront aux
exercices ouverts à compter du 1er janvier 2018,
to
Les <i>traitements</i> suivants s’appliqueront aux<br>
exercices ouverts à compter du <b>1er janvier 2018</b>,
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions
set pb to current application's NSPasteboard's generalPasteboard() -- get pasteboard
set theData to pb's dataForType:(current application's NSPasteboardTypeRTF) -- get rtf data off pasteboard
if theData = missing value then error "No rtf data found on clipboard"
-- make into attributed string
set theAttString to current application's NSAttributedString's alloc()'s initWithRTF:theData documentAttributes:(missing value)
set elementsToSkip to {"doctype", "html", "body", "xml", "style", "p", "font", "head", "span"} -- amend to taste
set theDict to current application's NSDictionary's dictionaryWithObjects:{current application's NSHTMLTextDocumentType, elementsToSkip} forKeys:{current application's NSDocumentTypeDocumentAttribute, current application's NSExcludedElementsDocumentAttribute}
set {htmlData, theError} to theAttString's dataFromRange:{0, theAttString's |length|()} documentAttributes:theDict |error|:(reference)
if htmlData = missing value then error theError's localizedDescription() as text
set theString to current application's NSString's alloc()'s initWithData:htmlData encoding:(current application's NSUTF8StringEncoding)
The elementsToSkip list lets you strip out a lot of the noise.
Thank you for this – very helpful to realise that we don’t have to go through textutil and the shell, and can prune out any HTML noise upstream.
A wrapping here FWIW, for reuse and composability:
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions
-- () -> [HTML tags to exclude] -> Either Left(Error message) or Right(HTML)
-- htmlFromRTFClipExcept :: [String] -> Either String String
on htmlFromRTFClipExcept(exceptTags)
    set ca to current application
    set pb to ca's NSPasteboard's generalPasteboard()
    -- Either (Right) RTF data or (Left) message string
    if (pb's pasteboardItems()'s firstObject()'s types()'s containsObject:("public.rtf")) then
        set lrRTF to |Right|(ca's NSAttributedString's alloc()'s ¬
            initWithRTF:(pb's dataForType:("public.rtf")) ¬
            documentAttributes:(missing value))
    else
        set lrRTF to |Left|("No RTF text in clipboard")
    end if
    script htmlEither
        on |λ|(x)
            set {htmlData, err} to x's ¬
                dataFromRange:{location:0, |length|:x's |length|()} ¬
                documentAttributes:¬
                {DocumentType:"NSHTML", ExcludedElements:exceptTags} ¬
                |error|:(reference)
            if err is missing value then
                |Right|((ca's NSString's alloc()'s ¬
                    initWithData:htmlData encoding:(ca's NSUTF8StringEncoding)) as text)
            else
                |Left|(err's localizedDescription() as text)
            end if
        end |λ|
    end script
    -- Either (Right) HTML string or (Left) message string
    bindEither(lrRTF, htmlEither)
end htmlFromRTFClipExcept
-- TEST --------------------------------------------------------------------------------
on run
    set lrHTML to ¬
        htmlFromRTFClipExcept({"doctype", "html", "body", ¬
            "xml", "style", "p", "font", "head", "span"})
    if isRight(lrHTML) then
        |Right| of lrHTML -- HTML string
    else
        |Left| of lrHTML -- Error message
    end if
end run
-- GENERIC FUNCTIONS ------------------------------------------------------------------
-- Left :: a -> Either a b
on |Left|(x)
    {type:"Either", |Left|:x, |Right|:missing value}
end |Left|
-- Right :: b -> Either a b
on |Right|(x)
    {type:"Either", |Left|:missing value, |Right|:x}
end |Right|
-- bindEither (>>=) :: Either a -> (a -> Either b) -> Either b
on bindEither(m, mf)
    if isRight(m) then
        mReturn(mf)'s |λ|(|Right| of m)
    else
        m
    end if
end bindEither
-- isLeft :: Either a b -> Bool
on isLeft(x)
    set dct to current application's ¬
        NSDictionary's dictionaryWithDictionary:x
    (dct's objectForKey:"type") as text = "Either" and ¬
        (dct's objectForKey:"Right") as list = {missing value}
end isLeft
-- isRight :: Either a b -> Bool
on isRight(x)
    set dct to current application's ¬
        NSDictionary's dictionaryWithDictionary:x
    (dct's objectForKey:"type") as text = "Either" and ¬
        (dct's objectForKey:"Left") as list = {missing value}
end isRight
-- Lift 2nd class handler function into 1st class script wrapper
-- mReturn :: First-class m => (a -> b) -> m (a -> b)
on mReturn(f)
    if class of f is script then
        f
    else
        script
            property |λ| : f
        end script
    end if
end mReturn
Where a method returns an error by indirection, you cannot rely on the presence of an error object as an indication that the method has not succeeded. You must test the method’s direct result — in this case for missing value — and then deal with the error only if the direct result indicates one has been thrown.
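To make that concrete, here is a minimal sketch of the pattern, reusing the calls from the first script in this thread. The handler name htmlFromAttributedString and its parameter names are made up for illustration; the point is only that the check is on htmlData, and the error object is consulted only after that check fails.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

-- htmlFromAttributedString is a hypothetical helper name, used only for this sketch
on htmlFromAttributedString(theAttString, elementsToSkip)
	set ca to current application
	set theDict to ca's NSDictionary's dictionaryWithObjects:{ca's NSHTMLTextDocumentType, elementsToSkip} forKeys:{ca's NSDocumentTypeDocumentAttribute, ca's NSExcludedElementsDocumentAttribute}
	set {htmlData, theError} to theAttString's dataFromRange:{0, theAttString's |length|()} documentAttributes:theDict |error|:(reference)
	-- Test the direct result first; look at the error object only if the result signals failure
	if htmlData = missing value then error (theError's localizedDescription() as text)
	return (ca's NSString's alloc()'s initWithData:htmlData encoding:(ca's NSUTF8StringEncoding)) as text
end htmlFromAttributedString
```

The same test could equally return a Left or Right value instead of raising an error; what matters is which variable the if statement inspects.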
I also suggest FP evangelism might be better conducted in a thread of its own.
allow construction with a kind of Lego brick which contains two channels:
a value channel for results which can be passed on to enclosing calls
a glitch channel which can just indicate whether everything has successfully returned a value so far, or hold a message detailing the point at which a value of the type required couldn’t be obtained.
In other words, just a slightly more composable alternative to the parallel channels of:
executing code, vs
run-time error
and the main advantage, in contexts where this happens to have any value, is to reduce the incidence of run-time errors.
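A small sketch of the two channels in action, assuming simplified copies of the Left, Right and bindEither handlers posted earlier. The two steps, safeReciprocal and safeRoot, are hypothetical examples invented for this illustration. Once any step returns a Left, the later steps are skipped and the message is passed through unchanged.

```applescript
-- Simplified versions of the Either handlers from the earlier post
on |Left|(x)
	return {type:"Either", |Left|:x, |Right|:missing value}
end |Left|

on |Right|(x)
	return {type:"Either", |Left|:missing value, |Right|:x}
end |Right|

-- Apply the next step only if the value channel is still live
on bindEither(m, mf)
	if (|Left| of m) is missing value then
		return mf's |λ|(|Right| of m)
	else
		return m -- glitch channel: pass the message along untouched
	end if
end bindEither

-- Hypothetical step 1: fails on zero
script safeReciprocal
	on |λ|(x)
		if x = 0 then
			return |Left|("Cannot divide by zero")
		else
			return |Right|(1 / x)
		end if
	end |λ|
end script

-- Hypothetical step 2: fails on negative numbers
script safeRoot
	on |λ|(x)
		if x < 0 then
			return |Left|("Cannot take the root of a negative number")
		else
			return |Right|(x ^ 0.5)
		end if
	end |λ|
end script

on run
	-- Starting from |Right|(0) instead would skip safeRoot and return the Left message
	return bindEither(bindEither(|Right|(4), safeReciprocal), safeRoot)
end run
```

No try block is needed in the calling code; the caller inspects the final record once, at the end of the chain.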
I understand optionals; I use Swift a lot (I’ve somewhat grudgingly let it more or less replace Obj-C for all new projects). I still don’t understand the syntax you’re using here in AppleScript though; what’s with all the pipes and Greek symbols?
Pff … I did type out the full name ‘lambda’ for anonymous functions for a while,
but after a while I found it just slightly long and noisy. I personally prefer a keyboard shortcut for |λ|.
Fair enough. I appreciate this is written in a way that is best for you, but presumably you didn’t post it here for your own sake but for that of others.
Code like this:
mReturn(mf)'s |λ|(|Right| of m)
is about as helpful as me telling someone to type an octothorpe at the beginning of a shell script. Sure, I could spend time explaining why that’s better than calling it a hash or a pound sign, but really we could all have used our time better if I’d just given instructions in the first place that would be readily understandable to the majority of people I’m intending to help.
Very happy to explain if you like – my thought was just to offer something pasteable and reusable as a function – but perhaps we should respect Shane’s feeling that his thread risks being evandalised?
I’d be more than willing to learn how you came up with that kind of code, just because I’m interested in things that I don’t understand. Maybe you could start a new thread explaining how this FP thing works within AS and why it might be worth adopting in certain (or any) circumstances.
Sorry to add to this thread… I suppose FP stands for Functional Programming? So are map, filter and reduce/fold just concepts of FP, or actual functions that we have access to somehow in AS?
FP/Functional Programming was the term which Shane invoked, perhaps in jest, and which is certainly thrown around a lot, but it’s not a term I personally take all that seriously.
map, filter and fold are just useful functions which you can define (or get ready-baked off the shelf) in most languages now, and which just make life a bit easier.
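For anyone wondering what defining them yourself might look like in AppleScript, here is one possible sketch. It reuses the mReturn wrapper from the script above; the handlers twice, isEven and sumOf are hypothetical examples, and these definitions are just one way of writing map, filter and foldl, not a standard library.

```applescript
-- map :: (a -> b) -> [a] -> [b]
on map(f, xs)
	set mf to mReturn(f)
	set ys to {}
	repeat with x in xs
		set end of ys to mf's |λ|(contents of x)
	end repeat
	return ys
end map

-- filter :: (a -> Bool) -> [a] -> [a]
on filter(p, xs)
	set mp to mReturn(p)
	set ys to {}
	repeat with x in xs
		set v to contents of x
		if mp's |λ|(v) then set end of ys to v
	end repeat
	return ys
end filter

-- foldl :: (b -> a -> b) -> b -> [a] -> b
on foldl(f, startValue, xs)
	set mf to mReturn(f)
	set acc to startValue
	repeat with x in xs
		set acc to mf's |λ|(acc, contents of x)
	end repeat
	return acc
end foldl

-- Lift a plain handler into a script object with a |λ| handler
on mReturn(f)
	if class of f is script then
		f
	else
		script
			property |λ| : f
		end script
	end if
end mReturn

-- Hypothetical example handlers
on twice(x)
	return 2 * x
end twice

on isEven(x)
	return x mod 2 = 0
end isEven

on sumOf(a, b)
	return a + b
end sumOf

on run
	-- Keep the even numbers, double each one, then add them up
	return foldl(sumOf, 0, map(twice, filter(isEven, {1, 2, 3, 4, 5, 6})))
end run
```

Each function takes an ordinary handler as its first argument, so the repeat loops are written once and reused with whatever handler you pass in.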