I just found out that TextEdit paragraphs all include \n at the end. I’m putting all the paragraphs in XML tags, and that extra \n should not be there.
Removing the \n when I process the paragraphs within a TextEdit tell block would most probably seriously slow down the script.
For example, I guess I could divide the string at the “\n” and only take the first part, or I could find everything that’s not a “\n”, or I could replace “\n” with nothing, etc.
The place where I’m thinking of doing that is here:
set newSeg to (current application's NSXMLNode's elementWithName:"seg" stringValue:myTUV)
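One possible way to do the cleanup at that point is to trim the string just before the node is created. This is only a sketch: myTUV holds a made-up sample value here, trimmedTUV is a name introduced for illustration, and trimming removes newline characters from both ends of the string, not only the trailing one.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Sample value standing in for one TextEdit paragraph, trailing linefeed included
set myTUV to "Some segment text" & linefeed

-- Trim newline characters from both ends before building the node
set trimmedTUV to ((current application's NSString's stringWithString:myTUV)'s stringByTrimmingCharactersInSet:(current application's NSCharacterSet's newlineCharacterSet()))

set newSeg to (current application's NSXMLNode's elementWithName:"seg" stringValue:trimmedTUV)
return (newSeg's XMLString()) as text
```

Because this runs outside any TextEdit tell block, it sends no extra Apple Events to TextEdit.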
You could use text item delimiters instead of paragraphs.
-- TextEdit - Get paragraphs via text item delimiters
tell application "TextEdit"
	try
		set theDocument to document 1
		-- One Apple Event fetches the whole text; the split happens in AppleScript
		set theText to text of theDocument
		set theParagraphs to my tid(theText, linefeed)
	on error error_message number error_number
		-- -128 is "user canceled"; report anything else
		if the error_number is not -128 then display alert "TextEdit" message error_message as warning
		return
	end try
end tell

-- Splits text into a list, or joins a list into text, using theDelimiter
on tid(theInput, theDelimiter)
	set d to AppleScript's text item delimiters
	set AppleScript's text item delimiters to theDelimiter
	if class of theInput = text then
		set theOutput to text items of theInput
	else if class of theInput = list then
		set theOutput to theInput as text
	else
		-- Any other class is returned unchanged so theOutput is always defined
		set theOutput to theInput
	end if
	set AppleScript's text item delimiters to d
	return theOutput
end tid
Not sure I understand. My script gets the whole text and splits it at \n. The result is the same as getting the paragraphs and removing the \n from each paragraph, I think.
Idea: count the Apple Events that are necessary to get the text and to split it into paragraphs.
I guess there’s no faster way than using text item delimiters.
(PS the tid handler wasn’t written for this script, it can be simplified)
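For what it's worth, a simplified version might look like the sketch below. The handler name splitText and its variable names are made up for illustration, and it only splits text (it does not join lists the way tid does).

```applescript
-- Hypothetical simplified handler: split text at a delimiter,
-- then restore whatever delimiters were set before
on splitText(theText, theDelimiter)
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to theDelimiter
	set theItems to text items of theText
	set AppleScript's text item delimiters to savedDelimiters
	return theItems
end splitText

set theParagraphs to splitText("line 1" & linefeed & "line 2" & linefeed, linefeed)
-- {"line 1", "line 2", ""}
```

Note the empty last item: text that ends with a linefeed always produces one extra empty string when split this way.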
suzume, Pete has answered your questions, but I ran some timing tests and thought I’d report the results. The test string contained 1001 paragraphs separated by linefeeds.
I first tested text item delimiters, both to replace all linefeeds with another character and to create a list of all paragraphs. These each took about 2 milliseconds.
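As a rough illustration of the replacement variant (not the exact test code), the sketch below swaps every linefeed for another character. The sample string and the "|" replacement character are arbitrary choices.

```applescript
set theString to "line 1" & linefeed & "line 2" & linefeed

set savedDelimiters to AppleScript's text item delimiters
-- Split at every linefeed...
set AppleScript's text item delimiters to linefeed
set theItems to text items of theString
-- ...then join the pieces again with the replacement character
set AppleScript's text item delimiters to "|"
set theResult to theItems as text
set AppleScript's text item delimiters to savedDelimiters

return theResult -- "line 1|line 2|"
```

Using "" as the second delimiter removes the linefeeds altogether.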
I next used text’s paragraph elements to create a list of every paragraph in the string; this took 4 milliseconds. Although not applicable with your script, an advantage of this approach is that paragraphs are identified by a carriage return, linefeed, or carriage return and linefeed.
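A small sketch of that behavior, using a made-up string that mixes all three line endings:

```applescript
-- "return" is AppleScript's carriage return constant
set theString to "line 1" & return & "line 2" & linefeed & "line 3" & return & linefeed & "line 4"
set theList to paragraphs of theString
-- {"line 1", "line 2", "line 3", "line 4"}
```

The carriage return plus linefeed pair counts as a single break, so no empty item appears between "line 3" and "line 4".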
Finally, I used NSString’s componentsSeparatedByCharactersInSet: to create an array of every paragraph in the string; this took less than 1 millisecond.
use framework "Foundation"
use scripting additions
set theString to "line 1
line 2
line 3
line 4
"
set theString to current application's NSString's stringWithString:theString
set theDelimiters to (current application's NSCharacterSet's newlineCharacterSet())
set theArray to (theString's componentsSeparatedByCharactersInSet:theDelimiters)
set theList to theArray as list -- {"line 1", "line 2", "line 3", "line 4", ""}
Just as an aside, my normal inclination would be to remove the linefeeds after the TextEdit code but before the ASObjc XML code, but that’s for you to decide.
What I’m comparing the script speed to is: open the resulting file in BBEdit, do a search-replace and save. That takes about 5-6 seconds in total, but the script has to feel much faster, because it’s a script.
I’ll be trying Pete’s approach later this week. Thank you again.
My original use case is: 2 columns in Excel, one script to merge all the lines into one TMX file.
In some cases, I get the data to merge from 2 text files: one is exported from a tool I use (OmegaT), and the other is a machine translation version that I paste into a text file (same number of lines).
And since I want to prefer out of the box or free software solutions whenever possible (Excel is neither, but I have yet to find how to create arbitrary XML exports in LibreOffice), I decided to use TextEdit instead of, say, BBEdit, which I’m very fond of.
Now, your question may not be about TextEdit as an app (in which case I think I just answered), but as a process, and here, indeed, there is no requirement regarding TextEdit. I’m only using TextEdit to save the MT output to text, but the rest could be handled by a different process.
Because I’m going to open the source text to copy-paste it into the MT engine, and copy-paste the MT output into a second opened text window.
So, as far as my process is concerned, at the moment when I launch the script, I have 2 open text windows: one is opened from disk and one is not yet saved (and does not need to be).
And when I launch the script, I have visual feedback on which file is the source file and which is the target file and I confirm the source and target languages at that time.
Pete, it worked and it was very fast. Thank you very much!
The process itself, before the conversion to XML, took about 3 secs, but mostly because I was visually confirming my input through dialogs. Once the input was confirmed, the conversion to XML loops over all the paragraphs and adds the XML tags, and that’s what takes the longest.
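For readers following along, a loop of that kind could look roughly like the sketch below. It is not the actual script: the sample lists, the "en" and "fr" language codes, the makeTUV handler and the bare body element are all assumptions made for illustration, and a real TMX file also needs a header and a tmx root element.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Sample data standing in for the two text windows (assumed: same number of lines)
set sourceLines to {"Hello", "Goodbye"}
set targetLines to {"Bonjour", "Au revoir"}

set theBody to current application's NSXMLElement's elementWithName:"body"
repeat with i from 1 to (count of sourceLines)
	-- One translation unit per pair of lines
	set theTU to (current application's NSXMLElement's elementWithName:"tu")
	(theTU's addChild:(my makeTUV("en", item i of sourceLines)))
	(theTU's addChild:(my makeTUV("fr", item i of targetLines)))
	(theBody's addChild:theTU)
end repeat
return (theBody's XMLStringWithOptions:(current application's NSXMLNodePrettyPrint)) as text

-- Hypothetical helper: builds <tuv xml:lang="..."><seg>...</seg></tuv>
on makeTUV(langCode, segText)
	set theTUV to current application's NSXMLElement's elementWithName:"tuv"
	theTUV's addAttribute:(current application's NSXMLNode's attributeWithName:"xml:lang" stringValue:langCode)
	theTUV's addChild:(current application's NSXMLNode's elementWithName:"seg" stringValue:segText)
	return theTUV
end makeTUV
```

Everything in the loop works on plain AppleScript lists and Foundation objects, so it sends no Apple Events to TextEdit once the text has been fetched.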