XML preserve space

Hello all,

I have to process XML files like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE fcpxml>
<fcpxml version="1.8">
	<caption name=" &gt;&gt; Peter: Welcome to our new episode " lane="1" offset="71178107/2000s" duration="6600/2400s" start="3600s" role="ITT?captionFormat=ITT.en-US">
	    <text placement="bottom">
	        <text-style ref="ts7"> </text-style>
	        <text-style ref="ts8">&gt;&gt; Peter: Welcome to our </text-style>
	        <text-style ref="ts9">
</text-style>
	        <text-style ref="ts8">new episode</text-style>
	        <text-style ref="ts7"> </text-style>
	    </text>
	    <text-style-def id="ts7">
	        <text-style font=".AppleSystemUIFont" fontSize="13" fontFace="Regular" fontColor="1 1 1 1"/>
	    </text-style-def>
	    <text-style-def id="ts8">
	        <text-style font=".AppleSystemUIFont" fontSize="13" fontFace="Regular" fontColor="1 1 1 1" backgroundColor="0 0 0 1"/>
	    </text-style-def>
	    <text-style-def id="ts9">
	        <text-style font=".AppleSystemUIFont" fontSize="13" fontFace="Regular" backgroundColor="0 0 0 1"/>
	    </text-style-def>
	</caption>
</fcpxml>

I want to keep the “space” like in:

<text-style ref="ts7"> </text-style>

And the “line feed” as in:

	        <text-style ref="ts9">
</text-style>

How do I read and write these XML files, i.e. which option should I use?

Any help is really appreciated.

Presumably NSXMLNodePreserveWhitespace.
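
A minimal sketch of what reading with that option might look like in AppleScriptObjC. The sample string and the variable names are made up for illustration, and this only shows how the option is passed; it makes no claim about how every whitespace-only node comes back:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical sample: one element holding a space, one holding a line feed
set theXMLText to "<root><item> </item><item>" & linefeed & "</item></root>"
set theString to current application's NSString's stringWithString:theXMLText
set theData to theString's dataUsingEncoding:(current application's NSUTF8StringEncoding)

-- Ask NSXMLDocument to keep whitespace-only text while parsing
set {theDoc, theError} to current application's NSXMLDocument's alloc()'s initWithData:theData options:(current application's NSXMLNodePreserveWhitespace) |error|:(reference)
if theDoc is missing value then error (theError's localizedDescription() as text)

-- Serialize again without pretty-printing
return (theDoc's XMLString()) as text
```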

Thanks,
You made my day :)

Again thanks Shane,

While this works, I can’t figure out how to apply ‘multiple options’.
That is, I want to use both

NSXMLNodePrettyPrint

for a formatted output but also

NSXMLNodePreserveWhitespace

to keep spaces and line breaks.

I suspect it’s one or the other: you can’t preserve space, and at the same time add and remove space to make it look pretty. Maybe NSXMLNodePreserveEmptyElements is what you need.

I tried that yesterday; it doesn’t work.

The idea to use 2 of the ‘options’ came from:

> NSXMLNodePreserveAll = (
>     NSXMLNodePreserveNamespaceOrder |
>     NSXMLNodePreserveAttributeOrder |
>     NSXMLNodePreserveEntities |
>     NSXMLNodePreservePrefixes |
>     NSXMLNodePreserveCDATA |
>     NSXMLNodePreserveEmptyElements |
>     NSXMLNodePreserveQuotes |
>     NSXMLNodePreserveWhitespace |
>     NSXMLNodePreserveDTD |
>     NSXMLNodePreserveCharacterReferences |
>     0xFFF00000)
> Discussion
> Turns on all preservation options: attribute and namespace order, entities, prefixes, CDATA, whitespace, quotes, and empty elements. You should try to turn on preservation options selectively because turning on all preservation options significantly affects performance.

The syntax is simple:

(current application's NSXMLNodePrettyPrint) + (get current application's NSXMLNodePreserveWhitespace)

I just don’t know whether it will work.

It doesn’t produce an error, but it doesn’t solve the problem either: PrettyPrint overrides PreserveWhitespace.

Is there a way to get the text/string value of a node when it is just a space or a line break?
Currently stringValue() returns “” in both cases, even with PreserveWhitespace.

Thanks

I don’t see any simple solution. You may have to get the text sub-elements as XML strings and parse those yourself.

I spent Sunday looking for a solution but didn’t find a satisfying one.

I can create some placeholders for line feed and space:

-- theXML is assumed to be the already-parsed XML document
set checkList to XPathNodes(theXML, "//text/text-style")
repeat with check in checkList
	set zVal to (check's stringValue()) as text
	if zVal = "" then -- whitespace-only content is reported as empty
		set test to check's XMLString() as text
		if test contains "> <" then -- space
			(check's setStringValue:"|br|")
		else -- line feed
			(check's setStringValue:"|lf|")
		end if
	end if
end repeat

################################

on XPathNodes(startNode, theArg)
	set theArg to theArg as text
	set {theNodes, theError} to startNode's nodesForXPath:(theArg) |error|:(reference)
	if theNodes = missing value then
		set theNodes to {}
	end if
	return theNodes
end XPathNodes

Then I can use PrettyPrint and find and replace the placeholders in the XMLString.
But there can easily be a few thousand such spaces and line feeds.

I can’t imagine that there is no other solution, especially because the XML files are generated by Apple software.

Apple are probably using NSXMLParser, which is more powerful and flexible (and less memory-hungry) than NSXMLDocument and friends, but also more complex (and slower in ASObjC).

I’m not sure how you would need to set it up for your purposes, but this snippet might get you started:

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions
property mutText : missing value
property anError : missing value
property isTextStyle : false

set theString to "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>
<!DOCTYPE fcpxml>
<fcpxml version=\"1.8\">
	<caption name=\" &gt;&gt; Peter: Welcome to our new episode \" lane=\"1\" offset=\"71178107/2000s\" duration=\"6600/2400s\" start=\"3600s\" role=\"ITT?captionFormat=ITT.en-US\">
	    <text placement=\"bottom\">
	        <text-style ref=\"ts7\"> </text-style>
	        <text-style ref=\"ts8\">&gt;&gt; Peter: Welcome to our </text-style>
	        <text-style ref=\"ts9\">
</text-style>
	        <text-style ref=\"ts8\">new episode</text-style>
	        <text-style ref=\"ts7\"> </text-style>
	    </text>
	    <text-style-def id=\"ts7\">
	        <text-style font=\".AppleSystemUIFont\" fontSize=\"13\" fontFace=\"Regular\" fontColor=\"1 1 1 1\"/>
	    </text-style-def>
	    <text-style-def id=\"ts8\">
	        <text-style font=\".AppleSystemUIFont\" fontSize=\"13\" fontFace=\"Regular\" fontColor=\"1 1 1 1\" backgroundColor=\"0 0 0 1\"/>
	    </text-style-def>
	    <text-style-def id=\"ts9\">
	        <text-style font=\".AppleSystemUIFont\" fontSize=\"13\" fontFace=\"Regular\" backgroundColor=\"0 0 0 1\"/>
	    </text-style-def>
	</caption>
</fcpxml>"

set my mutText to current application's NSMutableString's |string|()
set theString to current application's NSString's stringWithString:theString
set theData to theString's dataUsingEncoding:(current application's NSUTF8StringEncoding)
set theParser to current application's NSXMLParser's alloc()'s initWithData:theData
theParser's setDelegate:me
set theResult to theParser's parse()
if not theResult then error (anError's |description|() as text)
return mutText

on parser:theParser foundCharacters:aString
	if my isTextStyle then mutText's appendString:aString
end parser:foundCharacters:

on parser:anNSXMLParser didStartElement:elementName namespaceURI:aString qualifiedName:qName attributes:aRecord
	if elementName as text = "text-style" then set my isTextStyle to true
end parser:didStartElement:namespaceURI:qualifiedName:attributes:

on parser:anNSXMLParser didEndElement:elementName namespaceURI:aString qualifiedName:qName
	if elementName as text = "text-style" then set my isTextStyle to false
end parser:didEndElement:namespaceURI:qualifiedName:

on parser:anNSXMLParser parseErrorOccurred:anNSError
	set my anError to anNSError
end parser:parseErrorOccurred:

You might also have to implement parser:foundIgnorableWhitespace: — I’m not sure.
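
If that delegate method does turn out to be needed, it could be sketched as one more handler added to the script above. This is a fragment, not a standalone script: it relies on the `isTextStyle` and `mutText` properties already defined there, and whether NSXMLParser ever calls it for this kind of file is an open question:

```applescript
-- Additional delegate handler for the script above (untested assumption:
-- the parser may report some whitespace here instead of via foundCharacters)
on parser:anNSXMLParser foundIgnorableWhitespace:whitespaceString
	-- Treat ignorable whitespace inside <text-style> like ordinary characters
	if my isTextStyle then mutText's appendString:whitespaceString
end parser:foundIgnorableWhitespace:
```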

Again many thanks Shane,

I had found out about NSXMLParser from an iPhone developer over the weekend, but couldn’t make it work.
Your script works, but to be honest I don’t really understand how.

The problem was that I don’t really need the text but an attributed string.
Additionally, they must be duplicated, modified and inserted under a new parent.
In the end I modified the app to read the XML as text, change the bad empty text entries into something else, apply the standard formatting (somehow required by the users), convert back to text and replace the old text entries.
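
One possible reading of that workaround, as a rough sketch. The placeholder tokens and handler names are hypothetical, and it assumes LF line endings and that the tokens never occur in real caption text:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical placeholder tokens; they must not occur in real captions
property spaceToken : "|sp|"
property lineFeedToken : "|lf|"

-- Replace whitespace-only <text-style> content with placeholders
on protectWhitespace(xmlText)
	set s to current application's NSString's stringWithString:xmlText
	set s to s's stringByReplacingOccurrencesOfString:"> </text-style>" withString:(">" & spaceToken & "</text-style>")
	set s to s's stringByReplacingOccurrencesOfString:(">" & linefeed & "</text-style>") withString:(">" & lineFeedToken & "</text-style>")
	return s as text
end protectWhitespace

-- Put the original space or line feed back after formatting
on restoreWhitespace(xmlText)
	set s to current application's NSString's stringWithString:xmlText
	set s to s's stringByReplacingOccurrencesOfString:(">" & spaceToken & "</text-style>") withString:"> </text-style>"
	set s to s's stringByReplacingOccurrencesOfString:(">" & lineFeedToken & "</text-style>") withString:(">" & linefeed & "</text-style>")
	return s as text
end restoreWhitespace

set sample to "<text-style ref=\"ts7\"> </text-style>"
set protected to protectWhitespace(sample)
-- ... parse, pretty-print and serialize the protected text here ...
set restored to restoreWhitespace(protected)
```

Doing the replacement on the whole text in two passes avoids looping over thousands of nodes one at a time.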

P.S.
Sorry for the late reply. First I didn’t notice your reply since there was no notification.
Then I forgot to hit the reply button.