RegEx or Other Technique to Remove YAML Front Matter From Text?

I am writing blog posts in Mark Text using Markdown.

When a post is completed I have Mark Text export the post as HTML to the clipboard.

I use the following AppleScript on my Desktop to inject spans into HTML list items within some of the posts.

set C to the clipboard

set the C to replaceText("<li>", "<li><span class=\"b\">● </span>", C)
set the C to replaceText("</p>", "<span class=\"r\"><br></span></p>", C)

on replaceText(find, replace, textString)
	set prevTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to find
	set textString to text items of textString
	set AppleScript's text item delimiters to replace
	set textString to "" & textString
	set AppleScript's text item delimiters to prevTIDs
	return textString
end replaceText

set the clipboard to C

Each document in Mark Text currently has a YAML section at the top that I use for the blog title. I don’t want this in the actual post, so for now I have to scroll to just below the YAML by hand and press Command + Shift + Down Arrow.

It would be much faster to press Command + A anywhere in the document and then export the post as HTML to the clipboard. This, however, leaves the unwanted YAML matter in the HTML.

I need to add code to the above script to remove the YAML matter. Each YAML block has the following structure, with the text between the code tags always being different. The YAML matter is always found at the top of the document.

Can regex be used with AppleScript to remove this, or is there another method that will remove this pattern of arbitrary text between the always-present pre and code tags?

<pre><code class="fenced-code-block language-yaml">Title Will Be Different for Each YAML
</code></pre>

There’s no built-in regex in AppleScript, other than via AppleScriptObjC. So you could do this:

use framework "Foundation"
use scripting additions

[...]
set c to current application's NSString's stringWithString:c
set c to (c's stringByReplacingOccurrencesOfString:"<pre><code class=\"fenced-code-block language-yaml\">[^<]+</code></pre>" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, c's |length|()}) as text

Or use my RegexAndStuffLib Script Library, like this:

use scripting additions
use script "RegexAndStuffLib" version "1.0.6"

[...]
set c to regex change c search pattern "<pre><code class=\"fenced-code-block language-yaml\">[^<]+</code></pre>" replace template ""

You could probably also do it just using text item delimiters by splitting-and-joining a few times, although that sort of code can be a bit tedious to write.
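For reference, the split-and-join idea could look roughly like the sketch below. The handler name removeBlock and its parameters are made up for illustration; it removes only the first block and leaves the text untouched if either tag is missing.

```applescript
-- Hypothetical handler: removes the first block that starts with openTag
-- and ends with closeTag, using only text item delimiters.
on removeBlock(openTag, closeTag, textString)
	set prevTIDs to AppleScript's text item delimiters
	-- Split at the opening tag; item 1 is everything before the block.
	set AppleScript's text item delimiters to openTag
	set parts to text items of textString
	if (count of parts) < 2 then
		set AppleScript's text item delimiters to prevTIDs
		return textString -- no opening tag, nothing to remove
	end if
	set beforeBlock to item 1 of parts
	-- Rejoin the rest so any later occurrences of openTag are preserved.
	set remainder to (items 2 thru -1 of parts) as text
	-- Split the rest at the closing tag and drop the block's contents.
	set AppleScript's text item delimiters to closeTag
	set parts to text items of remainder
	if (count of parts) < 2 then
		set AppleScript's text item delimiters to prevTIDs
		return textString -- no closing tag, leave the text alone
	end if
	set afterBlock to (items 2 thru -1 of parts) as text
	set AppleScript's text item delimiters to prevTIDs
	return beforeBlock & afterBlock
end removeBlock
```

It would be called as removeBlock("<pre><code class=\"fenced-code-block language-yaml\">", "</code></pre>", c). Whatever line ending follows the closing tag stays in the result, so that still has to be dealt with separately.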

Thanks for the reply. I will try out and research your examples shortly.

I was able to find another possible solution:

Typical Post HTML from Mark Text Looks Like This:

<pre><code class="fenced-code-block language-yaml">Title Will Be Different for Each YAML
</code></pre>
<p><strong>List 1 Title</strong></p>
<ul>
<li><p>Some text in item one.</p>
</li>
<li><p>Some text in item two.</p>
</li>
<li><p>Some text in item three.</p>
</li>
<li><p>Some text in item four.</p>
</li>
</ul>

This code works except it leaves a blank line where the YAML was.

set C to the clipboard

set B to offset of "<pre><code class=\"fenced-code-block language-yaml\">" in C
set E to offset of "</code></pre>" in C
set E to E + 12
set YAML to text B thru E of C

set the C to replaceText(YAML, "", C)

set the C to replaceText("<li>", "<li><span class=\"b\">● </span>", C)
set the C to replaceText("</p>", "<span class=\"r\"><br></span></p>", C)

on replaceText(find, replace, textString)
	set prevTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to find
	set textString to text items of textString
	set AppleScript's text item delimiters to replace
	set textString to "" & textString
	set AppleScript's text item delimiters to prevTIDs
	return textString
end replaceText

set the clipboard to C

So I thought this might be an unprinted character after the YAML, so I tried:

set E to offset of "" & return & linefeed in C
-- Does not remove all of YAML

set E to offset of "" & return in C
-- Blank line remains

set E to offset of "" & linefeed in C
-- Does not remove all of YAML

set E to offset of "" & character id 8233 in C
-- Does not remove all of YAML

set E to offset of "" & character id 8232 in C
-- Does not remove all of YAML

At this point I gave up and simply increased the count from 12 to 13 to account for the YAML plus whatever unprintable character came after it. My Mark Text preferences are set to “default” for end of lines, which corresponds to your system (Mac OS). You can also set it to CRLF or LF.

set C to the clipboard

set B to offset of "<pre><code class=\"fenced-code-block language-yaml\">" in C
set E to offset of "</code></pre>" in C
set E to E + 13
set YAML to text B thru E of C

set the C to replaceText(YAML, "", C)

set the C to replaceText("<li>", "<li><span class=\"b\">● </span>", C)
set the C to replaceText("</p>", "<span class=\"r\"><br></span></p>", C)

on replaceText(find, replace, textString)
	set prevTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to find
	set textString to text items of textString
	set AppleScript's text item delimiters to replace
	set textString to "" & textString
	set AppleScript's text item delimiters to prevTIDs
	return textString
end replaceText

set the clipboard to C

Tried the first suggestion before going on a break. It removes the YAML but leaves a blank line.

set C to the clipboard

use framework "Foundation"
use scripting additions

set C to current application's NSString's stringWithString:C
set C to (C's stringByReplacingOccurrencesOfString:"<pre><code class=\"fenced-code-block language-yaml\">[^<]+</code></pre>" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, C's |length|()}) as text

set the C to replaceText("<li>", "<li><span class=\"b\">● </span>", C)
set the C to replaceText("</p>", "<span class=\"r\"><br></span></p>", C)

on replaceText(find, replace, textString)
	set prevTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to find
	set textString to text items of textString
	set AppleScript's text item delimiters to replace
	set textString to "" & textString
	set AppleScript's text item delimiters to prevTIDs
	return textString
end replaceText

set the clipboard to C

I added a \r to the above and that gets rid of the blank line. Strangely, using a \r with the solution I came up with does not work. If anyone has any ideas why, please let me know.

set C to the clipboard

use framework "Foundation"
use scripting additions

set C to current application's NSString's stringWithString:C
set C to (C's stringByReplacingOccurrencesOfString:"<pre><code class=\"fenced-code-block language-yaml\">[^<]+</code></pre>\r" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, C's |length|()}) as text

set the C to replaceText("<li>", "<li><span class=\"b\">● </span>", C)
set the C to replaceText("</p>", "<span class=\"r\"><br></span></p>", C)

on replaceText(find, replace, textString)
	set prevTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to find
	set textString to text items of textString
	set AppleScript's text item delimiters to replace
	set textString to "" & textString
	set AppleScript's text item delimiters to prevTIDs
	return textString
end replaceText

set the clipboard to C

\r is the RegEx metacharacter for CR. So your replace now includes the CR, which is what creates the blank line.

I’d suggest that you use \R instead of \r so that it will match any type of newline character, including LF and CRLF.

Thanks for the reply.

set C to the clipboard

set B to offset of "<pre><code class=\"fenced-code-block language-yaml\">" in C
set E to offset of "</code></pre>\R" in C
set E to E + 13
set YAML to text B thru E of C

set the C to replaceText(YAML, "", C)

set the C to replaceText("<li>", "<li><span class=\"b\">● </span>", C)
set the C to replaceText("</p>", "<span class=\"r\"><br></span></p>", C)

on replaceText(find, replace, textString)
	set prevTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to find
	set textString to text items of textString
	set AppleScript's text item delimiters to replace
	set textString to "" & textString
	set AppleScript's text item delimiters to prevTIDs
	return textString
end replaceText

set the clipboard to C

I get an error on the line “set E to E + 13” when I try to use “\R”.

This works now with the lowercase “\r”.

Without “\r” added, the next line should be “set E to E + 12”.

With “\r” added, the next line should be “set E to E + 13”.

I forgot to increase the count from 12 to 13 after adding “\r”, which left the blank line.

set C to the clipboard

set B to offset of "<pre><code class=\"fenced-code-block language-yaml\">" in C
set E to offset of "</code></pre>\r" in C
set E to E + 13
set YAML to text B thru E of C

set the C to replaceText(YAML, "", C)

set the C to replaceText("<li>", "<li><span class=\"b\">● </span>", C)
set the C to replaceText("</p>", "<span class=\"r\"><br></span></p>", C)

on replaceText(find, replace, textString)
	set prevTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to find
	set textString to text items of textString
	set AppleScript's text item delimiters to replace
	set textString to "" & textString
	set AppleScript's text item delimiters to prevTIDs
	return textString
end replaceText

set the clipboard to C

I believe there may be a bug in Mark Text.

The default setting is supposed to conform to the native behavior of the OS. On a modern Mac running OS X that should be LF (linefeed, \n, Chr10).

I ran a portion of HTML that Mark Text is creating through this tool which shows the string’s ASCII values.

http://asciivalue.com/index.php

It clearly shows it’s adding:

CRLF
Carriage Return + Linefeed
\r\n
Chr13 + Chr10
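The same check can probably be done inside Script Editor, without pasting the HTML into a web tool. The sketch below is only an illustration: idAfterTag is a made-up handler name, and it reports what AppleScript sees on the clipboard, which may not be byte-for-byte what Mark Text wrote.

```applescript
-- Hypothetical handler: returns the code point id(s) of the character that
-- immediately follows the first occurrence of closeTag, or missing value
-- if the tag is absent or is the last thing in the text.
on idAfterTag(closeTag, sourceText)
	set tagStart to offset of closeTag in sourceText
	if tagStart is 0 then return missing value
	-- The character after the tag sits one tag-length past the tag's start.
	set nextPos to tagStart + (length of closeTag)
	if nextPos > (length of sourceText) then return missing value
	-- 13 is CR and 10 is LF.
	return id of (character nextPos of sourceText)
end idAfterTag

set C to the clipboard
idAfterTag("</code></pre>", C)
```

The value shown in the Result pane is the id of whatever follows the closing tag, which indicates which line ending the export actually produced.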

Mark Text Preferences:

(New users can only add one image per reply)


ASCII Tool Results:

Default

(New users can only add one image per reply)

In the case of escapes not native to AppleScript (that is, other than \r, \n and \t), you have to escape the backslash character. So you need to enter \\R.
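Put together, the regex version with the doubled backslash might look like the sketch below. The handler name stripYAMLBlock and the variable thePattern are just illustrative; the pattern is the one from earlier in the thread with \\R appended.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical handler: removes the exported YAML block plus the line break after it.
on stripYAMLBlock(htmlText)
	-- "\\R" in the AppleScript source reaches the regex engine as \R,
	-- which matches CR, LF or CRLF.
	set thePattern to "<pre><code class=\"fenced-code-block language-yaml\">[^<]+</code></pre>\\R"
	set s to current application's NSString's stringWithString:htmlText
	return (s's stringByReplacingOccurrencesOfString:thePattern withString:"" options:(current application's NSRegularExpressionSearch) range:{0, s's |length|()}) as text
end stripYAMLBlock

set C to the clipboard
set the clipboard to stripYAMLBlock(C)
```

Because the pattern requires a line break after the closing tag, a block with nothing after it would be left in place.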


Thank You!

Works great now!