PDFPen Pro selected page to text

How can I coerce the selected page in PDFpen Pro to text that can be saved?

Some experimentation below.

Thanks

tell application id "com.smileonmymac.PDFpenPro"
	activate
	set thePage to selected page of document 1
	-->page 34 of document "AppleScript Language Guide.pdf"
	--can not coerce to text
	--set the clipboard to thePage
	
	set thePage2 to thePage as record
	--> {«class form»:index, «class want»:page, selection duration:34, from:document "AppleScript Language Guide.pdf"}
	
	set thePageInteger to thePage2's integers
	-->{34}
	
	set xxx1 to «class form» of thePage2
	--> index
	set xxx2 to «class want» of thePage2
	--> page
	--set xxx3 to selection duration of thePage2
	--set xxx4 to from of thePage2
	
	set the clipboard to thePage2
end tell

Hey George, did you ever figure out how to do this?

No, other than the hack.
Smile has never responded to multiple emails.

This works for me in PDFpen Pro 11.1.1

tell application "PDFpenPro"
	tell its window 1
		--set x to plain text of selected page of its document
		set x to text content of selected page of its document
	end tell
end tell

@Hans The request was for the “page number” not the text content.
Thanks anyway

Oh, I missed that.

tell application "PDFpenPro"
	tell its window 1
		set x to its name
	end tell
end tell

set text item delimiters to space
set pagenumber to text item -3 of x
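
A slightly more defensive sketch of the same window-title parsing, for anyone who wants to reuse it. It assumes the title ends with something like "(page 34 of 120)", which is only inferred from the use of text item -3 above and is not documented anywhere I know of. The handler name pageNumberFromTitle and the sample title are made up for illustration. Unlike the snippet above, it restores the text item delimiters when it is done and returns an integer:

```applescript
-- Hypothetical helper: takes the third-from-last space-separated word
-- of a window title and returns it as an integer.
-- Assumes the title ends with "... <page> of <total>)".
on pageNumberFromTitle(winName)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to space
	try
		set pageWord to text item -3 of winName
		set AppleScript's text item delimiters to savedDelims
	on error errMsg number errNum
		-- restore the delimiters even if the title has fewer than three words
		set AppleScript's text item delimiters to savedDelims
		error errMsg number errNum
	end try
	return pageWord as integer
end pageNumberFromTitle

pageNumberFromTitle("Some Document.pdf (page 34 of 120)")
--> 34
```

If the title format differs in your version of PDFpenPro, the coercion to integer will raise an error instead of silently returning the wrong word.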

Thanks for sharing, Hans.

Your script works nicely, but it does not seem to return rich text as the scripting dictionary says:

[Image: screenshot of the scripting dictionary entry]

I ran this slightly modified script,
running PDFpenPro 11.2 (1120.2) on macOS 10.14.6 (Mojave)

tell application "PDFpenPro"
  tell its window 1
    set pagePlainTextStr to plain text of selected page of its document
    set pageRichTextStr to text content of selected page of its document
    set the clipboard to (text content of selected page of its document) as record
    set winName to its name
    
  end tell
end tell

set text item delimiters to space
set pagenumber to (text item -3 of winName) as number

Neither the SD7 variable pageRichTextStr, nor MS Word 365, showed the rich text that is on the first page of this PDF:

Trump’s loathing for Ukraine is at the heart of His Behavior Towards Ukraine.pdf.zip (113.2 KB)

which looks like this:

Clearly there is rich text on the page (highlights are image annotations).

The SD7 variable pageRichTextStr looks like this:

I was expecting to see the rich text codes – but none are there.
When I paste into MS Word 365 it is just plain text.

I’m not saying this is your fault at all.
Just looking for a solution.

Anyone?

I don’t have PDFpen, but I get suspicious when I see mention of rich text. It’s a class that was defined in the (IMO) ill-conceived Cocoa Text suite.

You can still see it in Mail, where the message class has a content property that returns rich text. What it generally means is that if you ask for the property, you get back text, but you can also ask for color of content, size of content or font of content, and you get the relevant values returned (sort of). You can also get elements like attribute runs and attachments.

What do you see when you click on rich text in the dictionary where you pasted above?

If you need styled text for pasting into TextEdit or Word right now, consider using the free program Skim. You could also ask the makers of PDFpenPro to add this feature to their software. PDFpenPro uses the same limited ‘rich text’ that you will find in TextEdit.

tell application "Skim"
	tell its window 1
		set pagenumber to index of (current page of its document)
		set styledtext to RTF of (current page of its document)
	end tell
end tell
set the clipboard to styledtext

great - Hack #2
Thanks!

You don’t need any app to get the contents of a PDF page, with styling, onto the clipboard.

This code queries PDFpenPro for the document path and the page number you’re viewing, then does the rest in ASObjC:

use AppleScript version "2.5" -- macOS 10.11 or later
use framework "Foundation"
use framework "Quartz" -- required for PDF stuff
use scripting additions

tell application "PDFpenPro"
	tell document 1
		set thePath to path
		set thePage to selected page
		-- get zero-based page number
		repeat with i from 1 to count of pages
			if page i is thePage then
				set pageNum to i - 1
				exit repeat
			end if
		end repeat
	end tell
end tell

set inNSURL to current application's |NSURL|'s fileURLWithPath:thePath
set theDoc to current application's PDFDocument's alloc()'s initWithURL:inNSURL
set thePDFPage to (theDoc's pageAtIndex:pageNum) -- zero-based indexes
set attribText to thePDFPage's attributedString()
set pb to current application's NSPasteboard's generalPasteboard()
pb's clearContents()
pb's writeObjects:{attribText}

Thanks Shane.

That works very fast, and does a decent job of retaining the same styles as in the PDF.

After some more testing, it looks like the best quality comes from PDFpenPro’s File > Export > Word 2007 docx file.

Thanks.
For the record, this will get the page number of the selected page, i.e. the page you scrolled to and clicked (which highlights its frame), or the page on which you have selected text. If you have clicked a page and then scrolled away from it, it will still return the clicked page, which is the WRONG page if you want the one in view. It seems that selected and viewing are two different things.

set ThePageNumber to «class seld» of (get selected page of document 1 as record)

If you just want the page that is currently showing in the window, the window hack from @Hans works well.
