TextEdit "paragraph" to NSString without "\n"

I just found out that TextEdit paragraphs all include \n at the end, and I’m putting all the paragraphs in xml tags and that extra \n should not be there.

Removing the \n when I process the paragraphs within a TextEdit tell block would most probably seriously slow down the script.

So I want to do that when I generate the XML with ASObjC. The problem is that I’m checking the NSString documentation and I see lots of possible ways to do that.
https://developer.apple.com/documentation/foundation/nsstring

For ex. I guess I could divide the string at the “\n” and only take the first part, or I could find everything that’s not a “\n”, or I could replace “\n” by nothing, etc.

The place where I’m thinking of doing that is here:

set newSeg to (current application's NSXMLNode's elementWithName:"seg" stringValue:myTUV)

So I guess I’d have to work on that stringValue part:
https://developer.apple.com/documentation/foundation/xmlnode/1409818-stringvalue/

Ooops, I realize that I could also work on that part:

set myTUV to item j of myTU

Where I define the contents that will be handled by stringValue.

What would be the best approach to remove that pesky “\n” quickly and without burdening the system ?
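One of the options listed above, trimming, might look like the following. This is only a minimal sketch: the sample value given to myTUV and the names newlineSet and cleanTUV are made up for illustration, and it assumes each paragraph carries nothing but newline characters at its ends that need removing.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical sample standing in for one TextEdit paragraph (item j of myTU)
set myTUV to "Some paragraph text" & linefeed

-- Trim newline characters from both ends of the string
set newlineSet to current application's NSCharacterSet's newlineCharacterSet()
set cleanTUV to (current application's NSString's stringWithString:myTUV)'s stringByTrimmingCharactersInSet:newlineSet

-- Build the <seg> element from the trimmed string
set newSeg to current application's NSXMLNode's elementWithName:"seg" stringValue:cleanTUV
```

Trimming only touches the ends of the string, so any newline inside a paragraph would be left alone.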

All this is about code I wrote about 5 years ago and it’s almost physically painful to realize that I don’t have a clue anymore.

You could use text item delimiters instead of paragraphs

-- TextEdit - Get paragraphs via text item delimiters

tell application "TextEdit"
	try
		set theDocument to document 1
		set theText to text of theDocument
		set theParagraphs to my tid(theText, linefeed)
		
	on error error_message number error_number
		if the error_number is not -128 then display alert "TextEdit" message error_message as warning
		return
	end try
end tell

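on tid(theInput, theDelimiter)
	-- Split text into a list, or join a list into text, using theDelimiter
	set d to AppleScript's text item delimiters
	set AppleScript's text item delimiters to theDelimiter
	if class of theInput = text then
		set theOutput to text items of theInput
	else if class of theInput = list then
		set theOutput to theInput as text
	else
		-- Unsupported class: return the input unchanged instead of failing on an undefined variable
		set theOutput to theInput
	end if
	set AppleScript's text item delimiters to d
	return theOutput
end tid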

My worry is that the script will slow down to impractical speeds. I typically have hundreds of paragraphs to handle at once.

I could probably handle the \n at the very end of the process, before writing to file. I’d do a replace on \n for ex.

So I’m really looking for a super fast way to do that.
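For the "replace at the very end" idea, a rough sketch could look like this. The xmlText value is a made-up sample, not the real script output, and the sketch assumes the only unwanted linefeeds are the ones sitting directly before a closing </seg> tag.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical sample of the generated XML text, with the stray linefeed before </seg>
set xmlText to "<seg>Hello" & linefeed & "</seg>"
set xmlString to current application's NSString's stringWithString:xmlText

-- Replace every "linefeed + </seg>" with a plain "</seg>" in one pass
set cleaned to (xmlString's stringByReplacingOccurrencesOfString:(linefeed & "</seg>") withString:"</seg>") as text
-- cleaned is "<seg>Hello</seg>"
```

This is a single call on the whole string, so it avoids any per-paragraph work, but it only helps if the serialized XML really has the linefeed immediately before the closing tag.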

That’s no problem.

Not sure I understand. My script gets the whole text and splits it at \n. The result is the same as getting the paragraphs and removing the \n from each paragraph, I think.

Idea: count the Apple Events that are necessary to

  • get the text
  • split into paragraphs

I guess there’s no faster way than using text item delimiters.

(PS the tid handler wasn’t written for this script, it can be simplified)
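A simplified version that only splits text might look like this. It is just a sketch, and the handler name splitText is made up here. The idea is that the text is fetched from TextEdit once and the splitting then happens inside the script itself.

```applescript
-- Split theText at theDelimiter and return the pieces as a list
on splitText(theText, theDelimiter)
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to theDelimiter
	set theItems to text items of theText
	-- Restore the previous delimiters so other code is not affected
	set AppleScript's text item delimiters to savedDelimiters
	return theItems
end splitText

set theParagraphs to splitText("line 1" & linefeed & "line 2", linefeed)
-- theParagraphs is {"line 1", "line 2"}
```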


suzume, Pete has answered your questions, but I ran some timing tests and thought I’d report the results. The test string contained 1001 paragraphs separated by linefeeds.

I first tested text item delimiters, both to replace all linefeeds with another character and to create a list of all paragraphs. These each took about 2 milliseconds.

I next used text’s paragraph elements to create a list of every paragraph in the string; this took 4 milliseconds. Although not applicable with your script, an advantage of this approach is that paragraphs are identified by a carriage return, linefeed, or carriage return and linefeed.

Finally, I used NSString’s componentsSeparatedByCharactersInSet to create an array of every paragraph in the string; this took less than 1 millisecond.

use framework "Foundation"
use scripting additions

set theString to "line 1
line 2
line 3
line 4
"
set theString to current application's NSString's stringWithString:theString
set theDelimiters to (current application's NSCharacterSet's newlineCharacterSet())
set theArray to (theString's componentsSeparatedByCharactersInSet:theDelimiters)
set theList to theArray as list -- {"line 1", "line 2", "line 3", "line 4", ""}
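If the empty last item is unwanted, one option is to trim the string before splitting it. This is a sketch only: the sample text and the names newlineSet and trimmedString are made up, and note that trimming also removes any blank lines at the very start or end of the text.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical sample text that ends with a linefeed
set theText to "line 1" & linefeed & "line 2" & linefeed & "line 3" & linefeed
set theString to current application's NSString's stringWithString:theText
set newlineSet to current application's NSCharacterSet's newlineCharacterSet()

-- Remove leading and trailing newlines first, then split on newlines
set trimmedString to theString's stringByTrimmingCharactersInSet:newlineSet
set theList to (trimmedString's componentsSeparatedByCharactersInSet:newlineSet) as list
-- theList is {"line 1", "line 2", "line 3"}
```

Blank lines in the middle of the text still come through as empty items, which keeps the item count stable when two texts have to stay aligned line by line.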

Just as an aside, my normal inclination would be to remove the linefeeds after the TextEdit code but before the ASObjC XML code, but that’s for you to decide.

Thank you very much to the both of you.

What I’m comparing the script speed to is: open the resulting file in BBEdit, do a search-replace and save. Which is about 5-6 seconds total, but the script has to feel much faster, because it’s a script :)

I’ll be trying Pete’s approach later this week. Thank you again.

Can I ask why it involves TextEdit at all?

All this is a follow up to that thread:

Which was followed by that other one:

My original use case is: 2 columns in Excel, one script to merge all the lines into one TMX file.

In some cases, I get the data to merge from 2 text files: one is exported from a tool I use (OmegaT) the other is a machine translation version that I paste into a text file (same number of lines).

And since I want to prefer out of the box or free software solutions whenever possible (Excel is neither but I have yet to find how to create arbitrary XML exports in LibreOffice), I decided to use TextEdit instead of, say, BBEdit, which I’m very fond of.

Now, your question may not be about TextEdit as an app (in which case I think I just answered), but as a process, and here, indeed, there is no requirement regarding TextEdit. I’m only using TextEdit to save the MT output to text, but the rest could be handled by a different process.

I guess I was asking why you need to open a text file in TextEdit. You could open it directly (or read from the clipboard).

Because I’m going to open the source text to copy-paste it into the MT engine, and copy-paste the MT output into a second opened text window.

So, as far as my process is concerned, at the moment when I launch the script, I have 2 open text windows: one is opened from disk and one is not yet saved (and does not need to be).

And when I launch the script, I have visual feedback on which file is the source file and which is the target file and I confirm the source and target languages at that time.

Pete, it worked and it was very fast. Thank you very much !
The process itself, before the conversion to XML, took about 3 secs, but mostly because I was visually confirming my input through dialogs. Once the input was confirmed, the conversion to XML loops over all the paragraphs and adds the XML tags, and that’s what takes the longest.
