XML processing with XML-tools

Looking for useful code for XML processing with XML Tools.

Where can “The Utility AppleScript Code page” be found?

From the web page http://latenightsw.com/freeware/xml-tools/xml-parsing/

“The Utility AppleScript Code page gives more sample code showing how to access information from the data structure returned by the parse XML command.”

Thanks!

Jeez! zzzzzzzzzzz No One! zzzzzzzzzzz ??

Sorry, that page has long since been lost. The code, if memory serves, was really just a set of AS handlers that iterated through the lists of items produced by XML Tools looking for particular element names.

Hi George,

I have AS code that uses Mark’s XML Tools.osax to parse sample XML, which contains two records of 60-some fields each, with three levels of nesting.

I’d be happy to send it to you (or anyone else who might have use for it). Request it off-list by email. Click on my name to open my profile on the Debugger forum web site. Then click the “Expand” button to reveal my email address.

Stan C.

George, have you looked at the Satimage XML Lib?
See

To each his own, but AppleScriptObjC can be very efficient, especially if you can use XPath queries.

There’s a good example here: http://macscripter.net/viewtopic.php?id=45479

The poster was originally using System Events, which is fine for very small jobs but notoriously slow. With a bit of tweaking and reworking in ASObjC, the time to parse the XML went from 17+ minutes to <0.4 seconds.

Obviously that’s an exceptional case, but still…

Shane,

I was wondering when you would pop in with an ASObjC solution. ;-)

I would prefer to use a native solution (like ASObjC) rather than a third-party osax or script library, but ease of use and good documentation play a huge role.

While I’m sure the ASObjC solution you linked to is a great solution for that very specific use case, it didn’t really help me much in learning how to use ASObjC for XML processing.

Satimage has provided great documentation and tutorials for their XMLLib.
Is there anything equivalent for ASObjC XML processing?

There isn’t any ASObjC documentation, and the Objective-C documentation is not particularly deep (search for Apple’s Introduction to Tree-Based XML Programming Guide for Cocoa).

However, XPath is a W3C language, so there’s a mass of stuff about it available on the Web. And if what you wish to do can be done with XPath, it’s generally very quick and involves minimal code.
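To give a feel for how little code an XPath query takes, here is a minimal sketch. The sample XML (a `library` of `book` elements with an `id` attribute) is hypothetical, made up for illustration; the calls are the same `initWithXMLString:options:error:` and `nodesForXPath:error:` pattern used in the namespace example later in this thread.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- hypothetical sample XML, for illustration only
set theXML to "<library><book id=\"1\"><title>Dune</title></book><book id=\"2\"><title>Emma</title></book></library>"

-- parse the string into an NSXMLDocument, stopping if it isn't well-formed
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:theXML options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)

-- every <title> inside a <book>, anywhere in the document
set {theNodes, theError} to theXMLDoc's nodesForXPath:"//book/title" |error|:(reference)
if theNodes = missing value then error (theError's localizedDescription() as text)
set allTitles to (theNodes's valueForKey:"stringValue") as list
--> {"Dune", "Emma"}

-- a predicate narrows the match: the title of the book whose id attribute is "2"
set {theNodes, theError} to theXMLDoc's nodesForXPath:"//book[@id='2']/title" |error|:(reference)
if theNodes = missing value then error (theError's localizedDescription() as text)
set oneTitle to (theNodes's valueForKey:"stringValue") as list
--> {"Emma"}

{allTitles, oneTitle}
```

The `valueForKey:"stringValue"` call asks every node in the returned array for its text in one go, which saves writing a loop.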

As a taster, in ASObjC you mostly deal with three main classes: NSXMLDocument, NSXMLElement and NSXMLNode. The first two are subclasses of NSXMLNode.
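A rough sketch of how those three classes fit together when you walk a tree by hand instead of using XPath. The sample XML is again hypothetical; `rootElement()`, `elementsForName:`, `attributeForName:` and `stringValue()` are standard NSXMLDocument/NSXMLElement/NSXMLNode methods.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- hypothetical sample XML, for illustration only
set theXML to "<library><book id=\"1\"><title>Dune</title></book><book id=\"2\"><title>Emma</title></book></library>"

-- NSXMLDocument: the whole parsed document
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:theXML options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)

-- NSXMLElement: the root element, and its child elements found by name
set theRoot to theXMLDoc's rootElement()
set theBooks to theRoot's elementsForName:"book"

set theResult to {}
repeat with i from 1 to theBooks's |count|()
	set aBook to (theBooks's objectAtIndex:(i - 1)) -- NSArray indexes start at 0
	-- attributes come back as NSXMLNode objects
	set idNode to (aBook's attributeForName:"id")
	set titleElement to (aBook's elementsForName:"title")'s firstObject()
	-- stringValue() is defined on NSXMLNode, so it works for elements and attributes alike
	set end of theResult to {(idNode's stringValue()) as text, (titleElement's stringValue()) as text}
end repeat
theResult
--> {{"1", "Dune"}, {"2", "Emma"}}
```

This is more code than the XPath version, which is one reason XPath is worth learning, but it shows where each class comes into play.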

XML is a pretty broad topic, but I suspect the most common requirement in scripts is XML parsing. The script in the thread “Converting HTML (or xml) table to AppleScript list” is a good example of XPath in action.
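In the same spirit as that thread, here is a sketch of turning a table into an AppleScript list of lists. It is not the code from that thread. It assumes well-formed, XHTML-style markup, and the table content is made up for illustration. The key idea is one absolute query for the rows, then a relative query (no leading slash) run against each row for its cells.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- hypothetical well-formed table markup, for illustration only
set theXML to "<table><tr><td>Saga</td><td>Malmö</td></tr><tr><td>Martin</td><td>København</td></tr></table>"

set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:theXML options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)

-- one absolute XPath query for the rows...
set {theRows, theError} to theXMLDoc's nodesForXPath:"//tr" |error|:(reference)
if theRows = missing value then error (theError's localizedDescription() as text)

set theList to {}
repeat with i from 1 to theRows's |count|()
	set aRow to (theRows's objectAtIndex:(i - 1))
	-- ...and a relative query, evaluated with the row as its context node
	set {theCells, theError} to (aRow's nodesForXPath:"td" |error|:(reference))
	if theCells = missing value then error (theError's localizedDescription() as text)
	set end of theList to (theCells's valueForKey:"stringValue") as list
end repeat
theList
--> {{"Saga", "Malmö"}, {"Martin", "København"}}
```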

I really have to second Shane’s suggestion to use XPath if possible. Back in the mists of time when I created XML Tools, I moved on to something called XSLTTools (now discontinued) which allowed XPath queries into an XML document. This was so much more useful for pulling information out of XML and making it usable within AppleScript. Especially true for deep or wide XML hierarchies.

I should probably mention that there’s also a streaming parser, NSXMLParser. It’s not for the faint-hearted and not exactly fast in ASObjC, but it might be of interest to some. Search for “Introduction to Event-Driven XML Programming Guide for Cocoa” to read more.

Here’s a sample of using it to convert XML to an AppleScript record. This is just a translation from Objective-C code – you can see the original code and article here: http://troybrant.net/blog/2010/09/simple-xml-to-nsdictionary-converter/

-- Based on <http://troybrant.net/blog/2010/09/simple-xml-to-nsdictionary-converter/>

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"

property dictStack : missing value -- stack to hold array of dictionaries
property textInProgress : "" -- string to collect text as it is found
property anError : missing value -- if we get an error, store it here

on makeRecordWithXML:xmlString
	-- set up properties
	set my dictStack to current application's NSMutableArray's array() -- empty mutable array
	dictStack's addObject:(current application's NSMutableDictionary's |dictionary|()) -- add empty mutable dictionary
	set my textInProgress to current application's NSMutableString's |string|() -- empty mutable string
	-- convert XML from string to data
	set anNSString to current application's NSString's stringWithString:xmlString
	set theData to anNSString's dataUsingEncoding:(current application's NSUTF8StringEncoding)
	-- initialize an XML parser with the data
	set theNSXMLParser to current application's NSXMLParser's alloc()'s initWithData:theData
	-- set this script to be the parser's delegate
	theNSXMLParser's setDelegate:me
	-- tell it to parse the XML
	set theResult to theNSXMLParser's parse()
	if theResult then -- went OK, get first item on stack
		return ((my dictStack)'s firstObject()) as record
	else -- error, so return error
		error (my anError's localizedDescription() as text)
	end if
end makeRecordWithXML:

-- this is an XML parser delegate method. Called when new element found
on parser:anNSXMLParser didStartElement:elementName namespaceURI:aString qualifiedName:qName attributes:aRecord
	-- store reference to last item on the stack
	set parentDict to my dictStack's lastObject()
	-- make new child
	set childDict to current application's NSMutableDictionary's |dictionary|()
	-- if there are attributes, add them as a record with key "attributes"
	if aRecord's |count|() > 0 then
		childDict's setValue:aRecord forKey:"attributes"
	end if
	-- see if there's already an item for this key
	set existingValue to parentDict's objectForKey:elementName
	if existingValue is not missing value then
		-- there is, so if it's an array, store it...
		if (existingValue's isKindOfClass:(current application's NSMutableArray)) as boolean then
			set theArray to existingValue
		else
			-- otherwise create an array and add it
			set theArray to current application's NSMutableArray's arrayWithObject:existingValue
			parentDict's setObject:theArray forKey:elementName
		end if
		-- then add the new dictionary to the array
		theArray's addObject:childDict
	else
		-- add new dictionary directly to the parent
		parentDict's setObject:childDict forKey:elementName
	end if
	-- also add the new dictionary to the end of the stack
	(my dictStack)'s addObject:childDict
end parser:didStartElement:namespaceURI:qualifiedName:attributes:

-- this is an XML parser delegate method. Called at the end of an element
on parser:anNSXMLParser didEndElement:elementName namespaceURI:aString qualifiedName:qName
	-- if any text has been stored, add it as a record with key "contents"
	if my textInProgress's |length|() > 0 then
		set dictInProgress to my dictStack's lastObject()
		dictInProgress's setObject:textInProgress forKey:"contents"
		-- reset textInProgress property for next element
		set my textInProgress to current application's NSMutableString's |string|()
	end if
	-- remove last item from the stack
	my dictStack's removeLastObject()
end parser:didEndElement:namespaceURI:qualifiedName:

-- this is an XML parser delegate method. Called when string is found. May be called repeatedly
on parser:anNSXMLParser foundCharacters:aString
	-- only append string if it's not solely made of space characters (which should be, but aren't, caught by another delegate method)
	if (aString's stringByTrimmingCharactersInSet:(current application's NSCharacterSet's whitespaceAndNewlineCharacterSet()))'s |length|() > 0 then
		(my textInProgress)'s appendString:aString
	end if
end parser:foundCharacters:

-- this is an XML parser delegate method. Called when there's an error
on parser:anNSXMLParser parseErrorOccurred:anNSError
	set my anError to anNSError
end parser:parseErrorOccurred:

set xmlString to "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>
<character>
    <firstName>Saga</firstName>
    <lastName>Norén</lastName>
    <city>Malmö</city>
    <partner approach=\"dogged\">
        <firstName>Martin</firstName>
        <lastName>Rohde</lastName>
        <city>København</city>
    </partner>
</character>"
its makeRecordWithXML:xmlString

--> {|character|:{firstName:{|contents|:"Saga"}, lastName:{|contents|:"Norén"}, city:{|contents|:"Malmö"}, partner:{firstName:{|contents|:"Martin"}, lastName:{|contents|:"Rohde"}, city:{|contents|:"København"}, attributes:{approach:"dogged"}}}}

While I will jump into just about anything with ASObjC and at least try it, I am very hesitant to try the ASObjC XML stuff. ASObjC can be cryptic at times, but trying to debug code for an XML parser can be one gigantic headache. If I ever did learn it, I would start off simple and move up to more complex. As Jim says, it is hard to generalize from a more complex example. XML parsing is very tedious to check and debug when it is complex.

I’ve thought about addressing that in the ASObjC database, but I would only do that if I was willing to start simple and work up to complex.

Bill

Can you use this XPath method when the source XML has namespace(s) involved?

Yes, you can. It can be a bit more complicated, depending what you want to do, but it can be done.

Here’s a simple example (I’ve trimmed the XML for space reasons):

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theXML to "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\" x:xmptk=\"Adobe XMP Core 5.6-c137 79.159768, 2016/08/11-13:24:42        \">
   <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">
      <rdf:Description rdf:about=\"\"
            xmlns:xmp=\"http://ns.adobe.com/xap/1.0/\"
            xmlns:xmpMM=\"http://ns.adobe.com/xap/1.0/mm/\"
            xmlns:stEvt=\"http://ns.adobe.com/xap/1.0/sType/ResourceEvent#\"
            xmlns:stRef=\"http://ns.adobe.com/xap/1.0/sType/ResourceRef#\"
            xmlns:dc=\"http://purl.org/dc/elements/1.1/\"
            xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\">
         <xmp:CreatorTool>Adobe InDesign CC 2017 (Macintosh)</xmp:CreatorTool>
         <xmp:CreateDate>2017-03-19T12:21:51+11:00</xmp:CreateDate>
         <xmp:MetadataDate>2017-03-19T12:21:51+11:00</xmp:MetadataDate>
         <xmp:ModifyDate>2017-03-19T12:21:51+11:00</xmp:ModifyDate>
         <xmpMM:InstanceID>uuid:e826b00e-8f3d-d448-be8f-1b4a3e36b6e4</xmpMM:InstanceID>
         <dc:format>application/pdf</dc:format>
         <pdf:Producer>Adobe PDF Library 15.0</pdf:Producer>
         <pdf:Trapped>False</pdf:Trapped>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>"


set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:theXML options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)

-- ignoring namespace
set {theNodes, theError} to theXMLDoc's nodesForXPath:"//*[local-name()='InstanceID']" |error|:(reference)

-- or, more simply:
set {theNodes, theError} to theXMLDoc's nodesForXPath:"//*:InstanceID" |error|:(reference)

-- these are more specific but return the same thing
set {theNodes, theError} to theXMLDoc's nodesForXPath:"//*[local-name()='Description']/*[local-name()='InstanceID']" |error|:(reference)
set {theNodes, theError} to theXMLDoc's nodesForXPath:"//*[local-name()='RDF']/*[local-name()='Description']/*[local-name()='InstanceID']" |error|:(reference)
set {theNodes, theError} to theXMLDoc's nodesForXPath:"/*[local-name()='xmpmeta']/*[local-name()='RDF']/*[local-name()='Description']/*[local-name()='InstanceID']" |error|:(reference)

if theNodes = missing value then error (theError's localizedDescription() as text)
return (theNodes's valueForKey:"stringValue") as list