Move a line in an XML document

asobjc

(Jonas Whale) #1

Hello!

I have an XML file in which I would like to move each image reference from the 3rd line of its node to the 1st.
More explicitly, here is an example of the initial structure:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<event_aero xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<aero>
		<eventname>Aerospace Meetings Brussels</eventname>
		<date>19 - 21 septembre 2018</date>
		<img href="file://Links/aerospace_brussels.ai"/>
		<place>Bruxelles, Belgique</place>
		<baseline>Convention d'affaires internationale de l'industrie aéronautique</baseline>
		<website>www.brussels.bciaerospace.com</website>
	</aero>
</event_aero>

And here is the result I’m expecting:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<event_aero xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<aero>
		<img href="file://Links/aerospace_brussels.ai"/>
		<eventname>Aerospace Meetings Brussels</eventname>
		<date>19 - 21 septembre 2018</date>
		<place>Bruxelles, Belgique</place>
		<baseline>Convention d'affaires internationale de l'industrie aéronautique</baseline>
		<website>www.brussels.bciaerospace.com</website>
	</aero>
</event_aero>

Is it possible to do this with NSXMLDocument?


(Shane Stanley) #2

Sure, like this:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- classes, constants, and enums used
property NSXMLNodePreserveAll : a reference to 4.29391875E+9
property NSXMLDocument : a reference to current application's NSXMLDocument

set theXML to "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>
<event_aero xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">
	<aero>
		<eventname>Aerospace Meetings Brussels</eventname>
		<date>19 - 21 septembre 2018</date>
		<img href=\"file://Links/aerospace_brussels.ai\"/>
		<place>Bruxelles, Belgique</place>
		<baseline>Convention d'affaires internationale de l'industrie aéronautique</baseline>
		<website>www.brussels.bciaerospace.com</website>
	</aero>
</event_aero>"
set theDoc to NSXMLDocument's alloc()'s initWithXMLString:theXML options:NSXMLNodePreserveAll |error|:(missing value)
set theNodes to theDoc's nodesForXPath:"//aero" |error|:(missing value)
repeat with aNode in theNodes
	set imageNode to (aNode's elementsForName:"img")'s firstObject()
	if imageNode is not missing value then
		imageNode's detach()
		(aNode's insertChild:imageNode atIndex:0)
	end if
end repeat
-- now save theDoc
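
One possible way to handle the final "now save theDoc" step, as a minimal sketch: serialize the document with XMLDataWithOptions: and write the resulting NSData to disk. The stand-in document, the desktop destination, and the file name event_aero_reordered.xml are assumptions for illustration only; in practice theDoc would be the document modified above.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Stand-in document; in the script above this would be the modified theDoc
set theDoc to current application's NSXMLDocument's alloc()'s initWithXMLString:"<event_aero><aero><img href='file://Links/aerospace_brussels.ai'/></aero></event_aero>" options:0 |error|:(missing value)

-- Hypothetical destination: a file on the desktop
set outPath to (POSIX path of (path to desktop)) & "event_aero_reordered.xml"

-- Serialize the whole document, then write it atomically
set theData to theDoc's XMLDataWithOptions:(current application's NSXMLNodePrettyPrint)
set theResult to (theData's writeToFile:outPath atomically:true) as boolean
-- theResult is true when the file was written
```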

#3

… and NSXMLDocument also lets you define the transform as an XQuery expression:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

on run
    
    set strXQuery to "
    for $x in //event_aero return
    <event_aero xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">
        <aero>
            {$x/aero/img}
            {for $f in $x/aero/*
                let $e := name($f)
                return if (\"img\" = $e) then () else $f}
        </aero>
    </event_aero>"
    
    set strXML to "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>
<event_aero xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">
    <aero>
        <eventname>Aerospace Meetings Brussels</eventname>
        <date>19 - 21 septembre 2018</date>
        <img href=\"file://Links/aerospace_brussels.ai\"/>
        <place>Bruxelles, Belgique</place>
        <baseline>Convention d'affaires internationale de l'industrie aéronautique</baseline>
        <website>www.brussels.bciaerospace.com</website>
    </aero>
</event_aero>"
    
    return updatedByXQuery(strXQuery, strXML)
    
end run


-- updatedByXQuery :: String -> String -> String
on updatedByXQuery(strXQuery, strXML)
    set ca to current application
    ((item 1 of ((ca's NSXMLDocument's alloc()'s ¬
        initWithXMLString:strXML options:0 |error|:(missing value))'s ¬
        objectsForXQuery:(strXQuery) |error|:(missing value)))'s ¬
        XMLStringWithOptions:(ca's NSXMLNodePrettyPrint)) as text
end updatedByXQuery

(Shane Stanley) #4

You could probably also do it via XSLT.


(Jonas Whale) #5

Thanks guys, that sounds perfect!

Now, I need to change the image path for each node. It comes in this form:
<img href="V:\source\events\aerospace\images\Toulouse51.png"/>
I need it in this form:
<img href="file://images/Toulouse51.png"/>

I’m already able to make this transformation using text tools on NSString, but I wonder whether it’s possible with NSXMLDocument?

I know that, but I only started working with XML for InDesign automation a few days ago.
I haven’t looked at XSLT yet, and I didn’t dare ask about it here because this is an AppleScript forum.


#6

The protocol is that you refer to it as “ASXSLT”, and speak of it as a distinct language :wink:


(Shane Stanley) #7

You can still use NSXMLDocument methods to apply a transform, like -objectByApplyingXSLTString:arguments:error:.
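
As a rough sketch of what that could look like for the original "move img to the top" task: an XSLT 1.0 identity transform with one extra template for aero, applied through objectByApplyingXSLTString:arguments:error:. The stylesheet and the trimmed sample XML are illustrative assumptions, not something posted in this thread, and the path rewrite is left out because XSLT 1.0 has no built-in tokenize or replace.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- XSLT 1.0 stylesheet; single-quoted attributes avoid AppleScript escaping
set theXSLT to "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
	<!-- identity template: copy everything unchanged by default -->
	<xsl:template match='@*|node()'>
		<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
	</xsl:template>
	<!-- inside each aero element, emit img first, then all other children -->
	<xsl:template match='aero'>
		<xsl:copy>
			<xsl:apply-templates select='@*'/>
			<xsl:apply-templates select='img'/>
			<xsl:apply-templates select='node()[not(self::img)]'/>
		</xsl:copy>
	</xsl:template>
</xsl:stylesheet>"

-- Trimmed sample input (assumption: same shape as the file in post #1)
set theXML to "<event_aero>
	<aero>
		<eventname>Aerospace Meetings Brussels</eventname>
		<date>19 - 21 septembre 2018</date>
		<img href='file://Links/aerospace_brussels.ai'/>
		<place>Bruxelles, Belgique</place>
	</aero>
</event_aero>"

set theDoc to current application's NSXMLDocument's alloc()'s initWithXMLString:theXML options:0 |error|:(missing value)
-- For XML output the method returns a new NSXMLDocument
set newDoc to theDoc's objectByApplyingXSLTString:theXSLT arguments:(missing value) |error|:(missing value)
set theOutput to (newDoc's XMLStringWithOptions:(current application's NSXMLNodePrettyPrint)) as text
```

The identity template keeps the stylesheet short: only the element whose child order changes needs its own rule.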


#8

XQuery 1.0 is happy to run:

for $x in //event_aero
return
    <event_aero
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <aero>
            {
                element img {
                    attribute href {
                        concat(
                        "file://images/",
                        tokenize(
                        $x/aero/img/@href,
                        "images\\"
                        )[last()]
                        
                        )
                    }
                }
            }
            {
                for $fld in $x/aero/*
                return
                    if ("img" != name($fld)) then
                        $fld
                    else
                        ()
            }
        </aero>
    </event_aero>

but I notice that NSXMLDocument's alloc()'s initWithXMLString() chokes on:

<img href="V:\source\events\aerospace\images\Toulouse51.png"/>

Responding with:

(NSError) Error Domain=NSXMLParserErrorDomain Code=99 
"Line 2: xmlns:xsi: 
'V:\source\events\aerospace\images\Toulouse51.png' is not a valid URI

I can’t immediately see anything in the NSXMLNodeOptions which might stand down that level of rigour, so you may need to look for some upstream solution, or just do it all with AS / NSString functions.
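
When chasing a parse failure like this one, it may help to capture the NSError instead of passing missing value. A minimal sketch, assuming the usual ASObjC idiom of passing (reference) for the error parameter; the deliberately malformed badXML string is made up for the example:

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Deliberately malformed XML (aero is never closed) to trigger a parse error
set badXML to "<event_aero><aero></event_aero>"

-- Passing (reference) for the error parameter returns {result, NSError}
set {theDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:badXML options:0 |error|:(reference)

if theDoc is missing value then
	-- Parsing failed: report why instead of carrying on without a document
	set theMessage to "Parse failed: " & (theError's localizedDescription() as text)
else
	set theMessage to "Parsed OK"
end if
```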


(Jonas Whale) #9

I think it’s because the image is not the content of the node but an attribute:

imageNode's attributes()'s firstObject()'s stringValue()

In fact, transforming the file reference needs neither XSLT nor XQuery; a simple regex replacement does it:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- classes, constants, and enums used
property NSXMLNodePreserveAll : a reference to 4.29391875E+9
property NSXMLDocument : a reference to current application's NSXMLDocument

set theXML to "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>
<event_aero xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">
	<aero>
		<eventname>Aerospace Meetings Brussels</eventname>
		<date>19 - 21 septembre 2018</date>
		<img href=\"V:\\source\\events\\aerospace\\images\\Toulouse51.png\"/>
		<place>Bruxelles, Belgique</place>
		<baseline>Convention d'affaires internationale de l'industrie aéronautique</baseline>
		<website>www.brussels.bciaerospace.com</website>
	</aero>
</event_aero>"
set theDoc to NSXMLDocument's alloc()'s initWithXMLString:theXML options:NSXMLNodePreserveAll |error|:(missing value)
set theNodes to theDoc's nodesForXPath:"//aero" |error|:(missing value)
repeat with aNode in theNodes
	set imageNode to (aNode's elementsForName:"img")'s firstObject()
	if imageNode is not missing value then
		-- Read the href attribute by name rather than by position
		set hrefNode to (imageNode's attributeForName:"href")
		set theString to hrefNode's stringValue()
		
		-- Turn backslashes into slashes so lastPathComponent() can isolate the file name
		set theRegEx to (current application's NSRegularExpression's regularExpressionWithPattern:"\\\\" options:0 |error|:(missing value))
		set theString to (theRegEx's stringByReplacingMatchesInString:theString options:0 range:{location:0, |length|:theString's |length|()} withTemplate:"/")
		set theFileName to (theString's lastPathComponent()) as text
		
		set thePath to "file://Links/" & theFileName
		(hrefNode's setStringValue:thePath)
		imageNode's detach()
		(aNode's insertChild:imageNode atIndex:0)
	end if
end repeat
theDoc

:wink: