Improve This AppleScript to Parse HTML List Into MarkDown - Removing Selected List Items?

I’d like to know whether my crude but working AppleScript can be improved to use less code.

I have a series of documents containing HTML lists. Each list has an alternating pattern of list items which contain fenced code blocks and list items which do not.

1. The UL tags need to be removed.

2. All list items containing a fenced code block must be removed.

3. Any list item that contains a single “T” placeholder must be removed, as it contains no information.

4. Any of the remaining list items:

   A. Must have their beginning LI tag replaced with the MarkDown "- " (hyphen and space).

   B. Must have their closing tags removed.

   C. Must have <a> tags converted to MarkDown format.

<a href="https://www.findit.com/memorial/40262219/wert">Clickable Text</a>

TO

[Clickable Text](https://www.findit.com/memorial/40262219/wert)

5. When these HTML documents were created, some HTML named character codes were inserted; so far the only one I have seen is &amp;. These need to be replaced with the actual characters, as the codes cause problems with my WYSIWYG MarkDown editor. I'm not sure whether there will be other named codes, so I added two other common ones to my AppleScript.
  • 18 http://somecompanylink.com/local/places.htm
        
  • Link Text Displayed | Compiled by John Brown | 1953 | Big Book Title | Accessed 10-6-2017

  • 19https://www.findit.com/mem/40262219/wert
        
  • Clickable Text | Find It | Accessed 10-6-2017

  • 20
        
  • T

set C to the clipboard

use framework "Foundation"
use scripting additions

--remove all fenced code blocks
--remove all list items which have default text of T

set C to current application's NSString's stringWithString:C
set C to (C's stringByReplacingOccurrencesOfString:"<li><pre><code class=\"fenced-code-block\">[^<]+\\R</code></pre>\\R</li>\\R" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, C's |length|()}) as text

set C to current application's NSString's stringWithString:C
set C to (C's stringByReplacingOccurrencesOfString:"<li><p>T</p>\\R</li>\\R" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, C's |length|()}) as text

set C to current application's NSString's stringWithString:C
set C to (C's stringByReplacingOccurrencesOfString:"<ul>\\R" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, C's |length|()}) as text

set C to current application's NSString's stringWithString:C
set C to (C's stringByReplacingOccurrencesOfString:"</ul>\\R" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, C's |length|()}) as text

set C to current application's NSString's stringWithString:C
set C to (C's stringByReplacingOccurrencesOfString:"<li><p>" withString:"- " options:(current application's NSRegularExpressionSearch) range:{0, C's |length|()}) as text

set C to current application's NSString's stringWithString:C
set C to (C's stringByReplacingOccurrencesOfString:"</p>\\R</li>\\R" withString:"
" options:(current application's NSRegularExpressionSearch) range:{0, C's |length|()}) as text

--temporarily set the delimiter to <a to determine how many text divisions there are
--the number of delimiters will be one less than the number of text divisions
--loop (count minus one) times to parse the <a></a> spans in the text

set TempTID to AppleScript's text item delimiters
set AppleScript's text item delimiters to "<a"
set TIDnum to count (text items of C)
set AppleScript's text item delimiters to TempTID

repeat (TIDnum - 1) times -- one pass per "<a" found; text items number one more than delimiters
	
	set B to offset of "<a" in C
	set E to offset of "</a>" in C
	set E to E + 3
	set Link to text B thru E of C
	
	set the Link2 to replaceText("</a>", "", Link)
	set the Link2 to replaceText("<a href=\"", "", Link2)
	
	set AppleScript's text item delimiters to "\">"
	set F to text item 1 of Link2
	set G to text item 2 of Link2
	
	set F to "(" & F & ")"
	
	set G to "[" & G & "]"
	
	set AppleScript's text item delimiters to ""
	
	set the C to replaceText(Link, G & F, C)
	
end repeat

on replaceText(find, replace, textString)
	set prevTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to find
	set textString to text items of textString
	set AppleScript's text item delimiters to replace
	set textString to "" & textString
	set AppleScript's text item delimiters to prevTIDs
	return textString
end replaceText

--decode &amp; last so that text such as "&amp;lt;" is not decoded twice
set the C to replaceText("&lt;", "<", C)
set the C to replaceText("&gt;", ">", C)
set the C to replaceText("&amp;", "&", C)

set the clipboard to C

When you first posted this topic, you made reference to:

I am currently stuck on parsing some sections which use the wrong MarkDown for links.

These documents wrongly place the () brackets before the [] brackets.

Have you resolved this question?

This line:

set C to current application's NSString's stringWithString:C

needs to occur only once. In the subsequent lines you can then remove the trailing as text, except for the last one. At the moment you’re needlessly converting to and from AppleScript strings each time. You just need to remember to use as text once, before you do any non-ASObjC manipulation.
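
As a rough sketch of that suggestion, something like the following should work. The names regexOption and replacementPairs are my own and not from the original script, and I'm assuming the clipboard holds the HTML as plain text. The patterns are copied from the original script in the same order; linefeed stands in for the literal line break used there as the last replacement.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Convert to NSString once (assumes the clipboard holds the HTML as plain text).
set C to current application's NSString's stringWithString:(the clipboard as text)
set regexOption to current application's NSRegularExpressionSearch

-- Hypothetical helper list: {pattern, replacement} pairs taken from the original script.
set replacementPairs to {}
set end of replacementPairs to {"<li><pre><code class=\"fenced-code-block\">[^<]+\\R</code></pre>\\R</li>\\R", ""}
set end of replacementPairs to {"<li><p>T</p>\\R</li>\\R", ""}
set end of replacementPairs to {"<ul>\\R", ""}
set end of replacementPairs to {"</ul>\\R", ""}
set end of replacementPairs to {"<li><p>", "- "}
set end of replacementPairs to {"</p>\\R</li>\\R", linefeed}

repeat with aPair in replacementPairs
	set {thePattern, theReplacement} to contents of aPair
	-- No "as text" here: C stays an NSString between replacements.
	set C to (C's stringByReplacingOccurrencesOfString:thePattern withString:theReplacement options:regexOption range:{0, C's |length|()})
end repeat

-- Convert back to AppleScript text once, before any non-ASObjC manipulation.
set C to C as text
```

Keeping the patterns in a list means adding another replacement is a one-line change, but six chained set C to ... lines without the as text would do the same job.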

Try this:

set C to (C's stringByReplacingOccurrencesOfString:"<a href=\"([^\"]*)\">([^<]*)</a>" withString:"[$2]($1)" options:(current application's NSRegularExpressionSearch) range:{0, C's |length|()})
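
To check the pattern in isolation, here is a small test using the sample link from the question (sampleHTML is just an illustrative name of mine). The first capture group, $1, holds the href value and $2 holds the link text, so the template [$2]($1) puts them into MarkDown order. It assumes each <a> tag has href as its only attribute and that the link text contains no nested tags.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Sample input taken from the question.
set sampleHTML to "<a href=\"https://www.findit.com/memorial/40262219/wert\">Clickable Text</a>"

set C to current application's NSString's stringWithString:sampleHTML
-- One regex replacement converts every matching <a href="...">...</a> in the string.
set C to (C's stringByReplacingOccurrencesOfString:"<a href=\"([^\"]*)\">([^<]*)</a>" withString:"[$2]($1)" options:(current application's NSRegularExpressionSearch) range:{0, C's |length|()})

return C as text
-- result: "[Clickable Text](https://www.findit.com/memorial/40262219/wert)"
```

Because the replacement handles all matches in one call, it should be able to stand in for the whole repeat loop and the text item delimiter counting in the original script.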