Lost in "HTML to Attributed String to HTML" conversion


(Andreas Kiel) #1

I’ve got a strange problem.
I convert an HTML string with color values (e.g. FF0000) to an attributed string:

set theAttrStr to (NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value))

Looks good.
Converting it back to HTML:

set theDict to cA's NSDictionary's dictionaryWithObjects:{cA's NSHTMLTextDocumentType, elementsToSkip} forKeys:{NSDocumentTypeDocumentAttribute, NSExcludedElementsDocumentAttribute}
set {htmlData, theError} to theAttrStr's dataFromRange:{0, theAttrStr's |length|()} documentAttributes:theDict |error|:(reference)

The color changes to FB0007

I think it’s some kind of color space issue?
Any ideas on how to avoid this would be really appreciated.


(Shane Stanley) #2

You could start by seeing what the old document attributes are, by passing reference for the documentAttributes parameter when you read the document. You may need to retain something there.
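
A minimal sketch of that suggestion, assuming the HTML is available as NSData (the sample string here is a hypothetical stand-in). Passing reference makes the call return the document attributes dictionary alongside the attributed string. Which keys show up is not something this thread confirms, so the sketch only logs them:

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

-- Hypothetical sample input, standing in for the real HTML string
set sHtml to "<body><font color=\"#FF0000\">RED</font></body>"
set theString to current application's NSString's stringWithString:sHtml
set theData to theString's dataUsingEncoding:(current application's NSUTF8StringEncoding)

-- Passing reference returns {attributed string, document attributes}
set {theAttrStr, docAtts} to current application's NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(reference)

-- Log the keys to see what might need to be carried over when writing the HTML back out
if docAtts is missing value then
	log "No document attributes returned"
else
	log ((docAtts's allKeys()) as list)
end if
```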


(Andreas Kiel) #3

No luck at all!
This really is a nightmare for me; I spent the whole day on it (and more before).
Even moving away from ObjC doesn’t help, since the Cocoa HTML Writer does the conversion.
Take a simple RTF file with yellow text (ffff00).
Use “textutil -convert html theFilePath” and you’ll get ffff0a, which is close to ffff0b. That suggests an automatic color space conversion from an sRGB interpretation to Generic RGB.
I can’t find anything in the document attributes that prevents the system from doing it.

Edit: Creating yellow text in the app returns an ffffab yellow, which exactly matches the RGB color space conversion.
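
For anyone who wants to see the suspected conversion in isolation, here is a rough sketch. It builds an sRGB yellow, converts it to Generic RGB with colorUsingColorSpace:, and logs both sets of components as hex. The handler names hexByte and hexOfColor are made up for this example, and whether the HTML writer performs exactly this conversion is an assumption based on the numbers reported above:

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

-- Convert a 0.0-1.0 color component to a two-digit hex string (hypothetical helper)
on hexByte(aComponent)
	set n to round (aComponent * 255) rounding as taught in school
	-- Clamp in case a converted component falls slightly outside 0-255
	if n < 0 then set n to 0
	if n > 255 then set n to 255
	set hexDigits to "0123456789ABCDEF"
	return (character ((n div 16) + 1) of hexDigits) & (character ((n mod 16) + 1) of hexDigits)
end hexByte

-- Hex string of an NSColor's components, read in the color's own color space (hypothetical helper)
on hexOfColor(aColor)
	return hexByte(aColor's redComponent()) & hexByte(aColor's greenComponent()) & hexByte(aColor's blueComponent())
end hexOfColor

set srgbYellow to current application's NSColor's colorWithSRGBRed:1.0 green:1.0 blue:0.0 alpha:1.0
set genericYellow to srgbYellow's colorUsingColorSpace:(current application's NSColorSpace's genericRGBColorSpace())
log ("sRGB components: " & hexOfColor(srgbYellow))
log ("Generic RGB components: " & hexOfColor(genericYellow))
```

If the color space theory is right, the second value should drift away from FFFF00 in the same way as the textutil output.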


(Shane Stanley) #5

This solves the simplified example. It probably needs some test for OS version, and I have to say it smells a bit to me, but it does have the simple beauty of solving the problem:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions


-- constants, and enums used
property NSForegroundColorAttributeName : a reference to current application's NSForegroundColorAttributeName
property NSDocumentTypeDocumentAttribute : a reference to current application's NSDocumentTypeDocumentAttribute
property NSHTMLTextDocumentType : a reference to current application's NSHTMLTextDocumentType
property NSUTF8StringEncoding : a reference to 4

set sHtml to "<body><font color=\"#FF0001\">RED </font><font color=\"#FFFF00\">YELLOW </font><font color=\"#00FF00\">GREEN </font><font color=\"#0000FF\">BLUE </font><font color=\"#00FFFF\">CYAN </font><font color=\"#FF0AFF\">PURPLE </font></body>"
set theString to current application's NSString's stringWithString:sHtml
set theData to theString's dataUsingEncoding:NSUTF8StringEncoding
set theAttrStr to (current application's NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value))'s mutableCopy()
-- set theAttrStr2 to theAttrStr's |copy|()
set theLen to theAttrStr's |length|()
set theStart to 0
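repeat
	-- Get the foreground color at this index, plus the range it applies to
	set {theColor, theRange} to theAttrStr's attribute:NSForegroundColorAttributeName atIndex:theStart effectiveRange:(reference)
	if theColor is not missing value then -- skip runs that carry no foreground color
		-- Rebuild the color from the same component values in the calibrated RGB space
		set theRed to theColor's redComponent()
		set theGreen to theColor's greenComponent()
		set theBlue to theColor's blueComponent()
		set theColor to (current application's NSColor's colorWithCalibratedRed:theRed green:theGreen blue:theBlue alpha:1.0)
		(theAttrStr's addAttributes:{NSColor:theColor, NSStrokeColor:theColor} range:theRange)
	end if
	-- Move on to the next run of attributes
	set theStart to current application's NSMaxRange(theRange)
	if theStart ≥ theLen then exit repeat
end repeat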

set theDict to current application's NSDictionary's dictionaryWithObjects:{NSHTMLTextDocumentType} forKeys:{NSDocumentTypeDocumentAttribute}
set {htmlData, theError} to theAttrStr's dataFromRange:{0, theAttrStr's |length|()} documentAttributes:theDict |error|:(reference)
return (current application's NSString's alloc()'s initWithData:htmlData encoding:NSUTF8StringEncoding) as text

(Nigel Garvey) #6

:sunglasses:

colorWithCalibratedRed:green:blue:alpha: and colorWithDisplayP3Red:green:blue:alpha: both do the trick in Mojave, but they are the only colorWith… methods which don’t work in El Capitan! (…DisplayP3… didn’t exist then, and …Calibrated… gives the wrong results on that system.) However, the workaround isn’t needed there anyway.
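
One way to sketch the OS test is with NSProcessInfo's isOperatingSystemAtLeastVersion:. The handler name isAtLeast and the variable needsColorFix are invented for the example, and the 10.12 threshold is only an assumption: it is the first release after El Capitan, and the reports here cover nothing in between.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Returns true when running on the given macOS version or later (hypothetical helper)
on isAtLeast(majorNum, minorNum)
	set theInfo to current application's NSProcessInfo's processInfo()
	return (theInfo's isOperatingSystemAtLeastVersion:{majorVersion:majorNum, minorVersion:minorNum, patchVersion:0}) as boolean
end isAtLeast

-- Assumption: apply the color workaround only on systems later than El Capitan (10.11)
set needsColorFix to isAtLeast(10, 12)
if needsColorFix then
	log "Apply the calibrated-color workaround"
else
	log "Skip the workaround on this system"
end if
```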

Maybe:

set theAttrStr to current application's NSMutableAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)
-- set theAttrStr2 to theAttrStr's |copy|() -- For comparison.

?


(Shane Stanley) #7

Yes, that got left in by mistake – thanks. I’ll comment it out.

Here’s a version that’s more useful in that it will change colors that might be used for other attributes:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

-- constants, and enums used
property NSForegroundColorAttributeName : a reference to current application's NSForegroundColorAttributeName
property NSDocumentTypeDocumentAttribute : a reference to current application's NSDocumentTypeDocumentAttribute
property NSHTMLTextDocumentType : a reference to current application's NSHTMLTextDocumentType
property NSUTF8StringEncoding : a reference to 4

set sHtml to "<body><font color=\"#FF0001\">RED </font><font color=\"#FFFF00\">YELLOW </font><font color=\"#00FF00\">GREEN </font><font color=\"#0000FF\">BLUE </font><font color=\"#00FFFF\">CYAN </font><font color=\"#FF0AFF\">PURPLE </font></body>"
set theString to current application's NSString's stringWithString:sHtml
set theData to theString's dataUsingEncoding:NSUTF8StringEncoding
set theAttrStr to (current application's NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value))'s mutableCopy()
set theLen to theAttrStr's |length|()
set theStart to 0
set colorTable to current application's NSMutableDictionary's dictionary()
repeat
	set {theAtts, theRange} to theAttrStr's attributesAtIndex:theStart effectiveRange:(reference)
	set doneKeys to current application's NSMutableArray's array()
	set allKeys to theAtts's allKeys()
	repeat with aKey in allKeys
		if not (doneKeys's containsObject:aKey) then
			set aValue to (theAtts's objectForKey:aKey)
			if (aValue's isKindOfClass:(current application's NSColor)) then
				set theColor to (colorTable's objectForKey:aValue)
				if theColor is missing value then
					set theRed to aValue's redComponent()
					set theGreen to aValue's greenComponent()
					set theBlue to aValue's blueComponent()
					set theColor to (current application's NSColor's colorWithCalibratedRed:theRed green:theGreen blue:theBlue alpha:1.0)
					(colorTable's setObject:theColor forKey:aValue)
				end if
				set relevantKeys to (theAtts's allKeysForObject:aValue)
				repeat with oneKey in relevantKeys
					(theAttrStr's addAttribute:oneKey value:theColor range:theRange)
				end repeat
				(doneKeys's addObjectsFromArray:relevantKeys)
			end if
		end if
	end repeat
	set theStart to current application's NSMaxRange(theRange)
	if theStart ≥ theLen then exit repeat
end repeat

set theDict to current application's NSDictionary's dictionaryWithObjects:{NSHTMLTextDocumentType} forKeys:{NSDocumentTypeDocumentAttribute}
set {htmlData, theError} to theAttrStr's dataFromRange:{0, theAttrStr's |length|()} documentAttributes:theDict |error|:(reference)
return (current application's NSString's alloc()'s initWithData:htmlData encoding:NSUTF8StringEncoding) as text
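
To check the round trip, one option is to pull the hex color values out of the HTML text and compare them with the input. This is only a sketch: hexColorsIn is a made-up handler name, and it assumes the writer emits colors in the #rrggbb form.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Return every #rrggbb value found in the text, in order of appearance (hypothetical helper)
on hexColorsIn(htmlText)
	set theString to current application's NSString's stringWithString:htmlText
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:"#[0-9A-Fa-f]{6}" options:0 |error|:(missing value)
	set theMatches to theRegex's matchesInString:theString options:0 range:{0, theString's |length|()}
	set foundColors to {}
	repeat with aMatch in theMatches
		-- rangeAtIndex:0 is the range of the whole match
		set end of foundColors to (theString's substringWithRange:(aMatch's rangeAtIndex:0)) as text
	end repeat
	return foundColors
end hexColorsIn

-- Hypothetical sample text; in practice pass the HTML text produced by the script
set sampleColors to hexColorsIn("<font color=\"#FF0001\">RED</font> <span style=\"color: #ffff00\">YELLOW</span>")
log sampleColors
```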

(Andreas Kiel) #8

Soooo many thanks!!!

I had the idea about the color value replacement as well when I built an intermediate app, but I didn’t succeed.
The code works fine on (High) Sierra and Mojave. I haven’t tested it on Yosemite or El Capitan.

Now only Apple needs to implement this code into their flagship app :wink:
[Image: fcp colors]

The above is the result of importing and exporting the XML 12 times, with their automatic Generic RGB/sRGB handling applied each time.
I hope they solve it soon; otherwise your work and mine will be useless.

Edit:
A new question arises when I put the code into another context.
My main app uses a lot of “html-like” timed text snippets. These are converted and stored in an array which is connected to a table view. It all works (slowly, but) fine, or at least better than with other apps.
When I fetch one of the array’s records (named zRow) and take its styled text part with
set theAttrStr to zRow's styledText
the repeat loop above fails.
If I add an extra step, put it into a text view via bindings and use
set theAttrStr to textView's textStorage()
all works fine.
In Xcode the result of
log theAttrStr
looks identical.

What do I miss?


(Shane Stanley) #9

Are you sure it’s stored as mutable?
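
A quick way to test that, as a sketch: check the class and take a mutable copy only when needed. The handler name mutableVersionOf is invented here, and storedStr is a stand-in for whatever zRow's styledText actually returns.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Return a mutable attributed string, copying only when the input is immutable (hypothetical helper)
on mutableVersionOf(anAttrStr)
	if (anAttrStr's isKindOfClass:(current application's NSMutableAttributedString)) as boolean then
		return anAttrStr
	else
		return anAttrStr's mutableCopy()
	end if
end mutableVersionOf

-- Hypothetical stand-in for the stored value
set storedStr to current application's NSAttributedString's alloc()'s initWithString:"sample"
set theAttrStr to mutableVersionOf(storedStr)
log ((theAttrStr's isKindOfClass:(current application's NSMutableAttributedString)) as boolean)
```

Bear in mind that a mutable copy is a separate object, so any changes made to it have to be stored back in the array to show up in the table.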


(Andreas Kiel) #10

Solved ‘automagically’ after a break, a reboot, a relaunch of Xcode and a clean.
Thanks