Length of selection in SD7

Continuing the discussion from Shortcuts for basic line manipulation:

With the following script, there is a problem with the length of the string in one specific case: when you drag and drop a file from the Finder to paste its path, and the path contains diacriticals.
In this case the length is increased by the number of diacriticals.
The weird thing is that if you type the same characters, the length is right.

For example, with some folder named “Modèles personnalisés”, the length returned by the script will be the actual length + 2

use framework "Foundation"

tell document 2 of application id "asDB"
	set {theLocation, theLength} to (character range of selection)
	set theLocation to (theLocation - 1) -- location is zero based in objC
	set scriptText to current application's NSString's stringWithString:(source text as string)
	set {location:lineLocation, |length|:lineLength} to (scriptText's lineRangeForRange:{theLocation, theLength}) -- the entire line
	set selection to {lineLocation + 1, lineLength}
	return lineLength
end tell

For consistency, you could retrieve the string’s precomposedStringWithCanonicalMapping() property, which normalises the string so that each character with a diacritical is composed into a single code unit wherever a precomposed form exists.

use framework "Foundation"

set str to current application's NSString's stringWithString:"Modèles personnalisés"

log str's |length|() --> 23
log str's precomposedStringWithCanonicalMapping()'s |length|() --> 21

NB. The snippet above, if copied and pasted, will actually log 21 for both values, as the forum does its own job of normalising the characters in Modèles personnalisés.

I created the string for testing by first decomposing it using decomposedStringWithCanonicalMapping().
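
A minimal sketch of how such a test string could be built, so that the result does not depend on how the text was pasted. The variable names (decomposed, precomposed, decomposedLength, precomposedLength) are only illustrative:

```applescript
use framework "Foundation"

-- Start from whatever form the pasted literal happens to be in.
set str to current application's NSString's stringWithString:"Modèles personnalisés"

-- Force each normalisation form explicitly.
set decomposed to str's decomposedStringWithCanonicalMapping()
set precomposed to decomposed's precomposedStringWithCanonicalMapping()

-- |length|() counts UTF-16 code units, not visible characters.
set decomposedLength to (decomposed's |length|()) as integer
set precomposedLength to (precomposed's |length|()) as integer

log decomposedLength --> 23 (è and é each take two code units)
log precomposedLength --> 21
```

Because both forms are derived explicitly, the two lengths differ even when the pasted literal has already been normalised.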

There was a bug in versions before 7.0.4 where the location and length were returned as Cocoa values, which count UTF-16 code units (16-bit values). In 7.0.4, the values are now returned as AppleScript counts, using AppleScript’s definition of characters. This makes more sense for traditional scripts.

At the same time a new document property was introduced, selection ASObjC range. This returns the Cocoa values (including zero-based indexing locations). You should use this property in ASObjC scripts. And if you use selection ASObjC range as record, you get a record you can pass directly in ASObjC code. (Check out the explanation in the scripting dictionary.)

So your code would become:

use framework "Foundation"

tell document 2 of application id "com.latenightsw.ScriptDebugger7"
	set theRange to selection ASObjC range as record -- Cocoa-style range, zero-based location
	set scriptText to current application's NSString's stringWithString:(source text as string)
	set newRange to (scriptText's lineRangeForRange:theRange) -- the entire line
	set selection ASObjC range to newRange
	return |length| of newRange
end tell

Thank you guys!
Both of your solutions are worthy of interest.

While Shane’s code is SD7-specific, I think that precomposedStringWithCanonicalMapping will be perfect for a library command…

Just be aware that it will only work where characters can all fit in 16-bit Unichars. That’s probably going to cover most common accented characters, but not things like some emoticons.
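
To illustrate the limitation, here is a small sketch using an emoji outside the 16-bit range (U+1F600). The names emoji and emojiLength are only illustrative:

```applescript
use framework "Foundation"

-- U+1F600 does not fit in one 16-bit unichar, so NSString stores it as a surrogate pair.
set emoji to current application's NSString's stringWithString:"😀"

-- Precomposing cannot help here: there is no diacritical to combine.
set emojiLength to (emoji's precomposedStringWithCanonicalMapping()'s |length|()) as integer

log emojiLength --> 2, although it is a single visible character
```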

Yes, I thought there would be cases where it could be a problem, even though I hadn’t thought about emoticons (I never use them at work).

In fact, I was thinking of Hebrew and Arabic characters, where vowels are diacriticals.
And also some Slavic or Nordic specialities.
But I think they all fit in 16-bit containers…

Am I wrong?

I’m not sure. In truth, I’m not sure I want to know, or have to remember, which is why I’m not a fan of that approach.

Is there another approach you would suggest?

I meant in terms of getting length. If you really need the number of grapheme clusters, it’s probably easiest to use AppleScript’s characters.
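
As a rough sketch of the difference, the snippet below builds a decomposed “é” from its code points and counts it both ways. The names decomposedE, asCount and cocoaCount are only illustrative, and the AppleScript result assumes AppleScript 2.0 or later, where a combining sequence is treated as one character:

```applescript
use framework "Foundation"

-- "e" (U+0065) followed by COMBINING ACUTE ACCENT (U+0301): a decomposed é.
set decomposedE to character id {101, 769}

-- AppleScript counts the combining sequence as one character.
set asCount to count of characters of decomposedE

-- NSString counts UTF-16 code units instead.
set cocoaCount to ((current application's NSString's stringWithString:decomposedE)'s |length|()) as integer

log asCount --> 1
log cocoaCount --> 2
```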

Okay, thank you for the advice.