How to remove OCR from a PDF?

I have been searching Google for some time but cannot find an answer to my question. I have unwanted layers of OCR in a document that I recently scanned with Adobe Acrobat. The OCR was not done properly, and when I try to redact some information, the OCR layer causes information I want to keep to be erased as well. I converted the files to TIFFs, but noticed a (very) significant quality loss. I have heard that printing to another PDF either keeps the text or reduces the image quality.

asked Oct 11, 2014 at 6:32

Is there any programmatic way to do it, via Python?

Commented Jun 24, 2021 at 7:29

11 Answers

In Acrobat Pro DC, the appropriate command is "Remove Hidden Information," which is available through both the "Protect" and "Redact" tools.

Running the command only finds the hidden information; it does not change the document. You must then tell Acrobat which information to remove. In this case, select "Hidden Text" in the Results pane, then click the Remove button and save the changed document.
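For anyone wanting to do this outside Acrobat: I don't know how Acrobat implements the command, but OCR layers are commonly drawn with text rendering mode 3 (`3 Tr`, invisible text), so removing hidden text roughly means dropping the text-showing operators issued while that mode is active. Below is a minimal, stdlib-only Python sketch of that idea. It assumes you already have a page's content stream decoded into a string (for example with a PDF library, decoding the bytes as latin-1), that the stream has no inline images, and that the hidden text is not tucked away in form XObjects. The names `tokenize`, `is_operator` and `strip_invisible_text` are made up for this illustration.

```python
import re

WS = "\x00\t\n\x0c\r "  # PDF whitespace characters
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)$")


def tokenize(stream):
    """Yield the tokens of a decoded content stream, whitespace included."""
    i, n = 0, len(stream)
    while i < n:
        c, j = stream[i], i + 1
        if c in WS:
            while j < n and stream[j] in WS:
                j += 1
        elif c == "(":  # literal string: balanced parentheses, backslash escapes
            depth = 1
            while j < n and depth:
                if stream[j] == "\\":
                    j += 1  # skip the escaped character
                elif stream[j] == "(":
                    depth += 1
                elif stream[j] == ")":
                    depth -= 1
                j += 1
        elif c == "%":  # comment runs to the end of the line
            while j < n and stream[j] not in "\r\n":
                j += 1
        elif stream.startswith(("<<", ">>"), i):  # dictionary delimiters
            j = i + 2
        elif c == "<":  # hex string
            j = stream.find(">", i) + 1 or n
        elif c not in "[]>)":  # name, number or operator
            while j < n and stream[j] not in WS + "()<>[]/%":
                j += 1
        yield stream[i:j]
        i = j


def is_operator(tok):
    """True for operators; False for operands, whitespace and comments."""
    if tok[0] in WS + "(<>[]/%" or NUMBER.match(tok):
        return False
    return tok not in ("true", "false", "null")


def strip_invisible_text(stream):
    """Remove text shown while the text rendering mode is 3 (invisible)."""
    out, pending = [], []  # pending = tokens seen since the last operator
    mode, saved = 0, []    # current Tr mode and the q/Q save stack
    for tok in tokenize(stream):
        if not is_operator(tok):
            pending.append(tok)
            continue
        args = [t for t in pending if t[0] not in WS + "%"]
        if tok == "q":
            saved.append(mode)
        elif tok == "Q" and saved:
            mode = saved.pop()
        elif tok == "Tr" and args and NUMBER.match(args[-1]):
            mode = int(float(args[-1]))
        if mode == 3 and tok in ("Tj", "TJ"):
            out.append(" ")  # drop the operator together with its operands
        elif mode == 3 and tok == "'":
            out.append(" T* ")  # ' is T* followed by Tj; keep only the line move
        elif mode == 3 and tok == '"' and len(args) >= 3:
            # " sets word and character spacing, then behaves like '
            out.append(f" {args[-3]} Tw {args[-2]} Tc T* ")
        else:
            out.extend(pending)
            out.append(tok)
        pending = []
    out.extend(pending)
    return "".join(out)


page = (
    "q 612 0 0 792 0 0 cm /Im0 Do Q\n"
    "BT 3 Tr /F1 12 Tf 72 700 Td (hidden OCR text) Tj ET\n"
)
print(strip_invisible_text(page))
```

The image-drawing operators (`/Im0 Do`) are left untouched, so the scan itself is not re-encoded. In a real file the content streams are usually compressed, so you would still need a PDF library to decode each stream, run it through the function, and write it back.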

answered Apr 11, 2017 at 4:11 user1125483

I have used "Remove Hidden Information", but for some reason it just removes parts of the image on certain pages for me. Thanks for your reply, however.

Commented Apr 11, 2017 at 4:20

This is not universally true. Somehow (probably macOS PDFKit bugs) my ABBYY FineReader-OCRed text got corrupted, and checking "Hidden text" under Redact → Remove Hidden did remove the text without any issues; I was then able to successfully use Enhance Scans → Recognize Text to perform OCR within Acrobat itself.

Commented Jan 21, 2018 at 20:16

The problem for me is that after I remove the hidden text, I'm still not able to run an OCR with "ClearScan" (i.e. "Editable Text and Images"). It's strange because the text layer appears to be gone, yet running OCR produces the error "Acrobat could not perform recognition because: page contains renderable text."

Commented Sep 18, 2018 at 10:38

Try the "Microsoft Print to PDF" driver. It ships with all recent Windows versions. Make sure to check "Print As Image" under the advanced print settings to remove the OCR.

The quality loss in printing to PDF is negligible. It does, however, keep the OCR by default unless you print as image.
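If you would rather script this than click through a print dialog, a similar effect (each page flattened to a single image, so no text layer survives) can be had with Ghostscript instead of the Microsoft driver. The Python sketch below assumes Ghostscript is installed and reachable as `gs` (on Windows the console executable is typically named `gswin64c`); the function names and the 300 dpi default are my own choices for illustration, not anything prescribed.

```python
import subprocess


def ghostscript_command(src, dst, dpi=300, gs="gs"):
    """Build a Ghostscript command that writes each page of src as one image."""
    return [
        gs,
        "-sDEVICE=pdfimage24",  # image-only PDF output, 24-bit colour
        f"-r{dpi}",             # rendering resolution in dots per inch
        "-o", dst,              # output file; also implies batch mode
        src,
    ]


def rasterize_pdf(src, dst, dpi=300, gs="gs"):
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(ghostscript_command(src, dst, dpi, gs), check=True)


# Example (needs Ghostscript installed):
# rasterize_pdf("scanned.pdf", "scanned-no-ocr.pdf", dpi=300)
```

Pick a `dpi` at least as high as the original scan, otherwise the pages are downsampled and you get the quality loss the question complains about.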


answered May 12, 2020 at 12:12

After a lot of experimenting, I found that printing to Adobe PDF from Adobe Acrobat produces a document without the OCR and without a visible loss of quality (some resolution is lost, but it is not noticeable at first glance).

However, many sites claim that this does not work. I also tried other printers, such as Foxit Reader and OneNote, but the quality was reduced. Converting to JPEG reduced the quality as well.

Please keep in mind that your mileage may vary.

Note: I am leaving this thread marked as unanswered in the hope of finding a better answer than mine.