Scanned PDF to Text: How to Extract Text with OCR

By FileConvertLab

Need to convert a scanned PDF to text? First, check whether your PDF actually needs OCR.

OCR (Optical Character Recognition) reads text from images. It works with scanned documents, photos of pages, and image-based PDFs.

Which Tool Do You Need?

Your situation | What you get | Use this
PDF with selectable text | Word file with original formatting (fonts, colors, layout) | PDF to Word
Scanned PDF, need just the text | Plain text without formatting | OCR PDF to Word
Scanned PDF, need searchable PDF | Same look + selectable text layer | OCR to Searchable PDF
Scanned PDF, need layout preserved in Word | Word file that looks like the original scan | AI PDF to Word
Photo of a document (JPG, PNG) | Plain text without formatting | OCR Image to Word

Understanding the Difference

  • PDF to Word: works with PDFs that have selectable text. Reads the formatting stored in the PDF itself (fonts, sizes, colors, layout) and recreates it in Word. Best for digitally created PDFs.
  • OCR PDF to Word: for scanned documents when you just need the text. Recognizes text from images but doesn't preserve layout. Output is plain text in your chosen format.
  • AI PDF to Word: for scanned documents when you need the layout too. Uses AI to detect fonts, text sizes, bold/italic, tables, headers, and columns, then recreates the visual layout in an editable Word document.

How to Check If Your PDF Is Scanned

Open your PDF and try to select text with your mouse:

  • You can highlight individual words → It's a text-based PDF. Standard PDF to Word will work.
  • You select a rectangle or nothing → It's a scanned PDF. You need OCR.

Another test: press Ctrl+F (or Cmd+F) and search for a word you can see on the page. If the search finds nothing, it's a scanned PDF.
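The same check can be automated when you have many files to sort. The sketch below is only an illustration, not how any particular converter decides: it assumes the third-party pypdf library for reading the text layer, and the function names and the 20-character threshold are hypothetical choices.

```python
from typing import Iterable, List


def looks_scanned(page_texts: Iterable[str], min_chars_per_page: int = 20) -> bool:
    """Return True when more than half of the pages have almost no text layer."""
    texts = list(page_texts)
    if not texts:
        return True  # no pages with extractable text at all
    nearly_empty = sum(
        1 for text in texts if len((text or "").strip()) < min_chars_per_page
    )
    return nearly_empty / len(texts) > 0.5


def extract_page_texts(pdf_path: str) -> List[str]:
    """Read each page's text layer with pypdf (assumed to be installed)."""
    from pypdf import PdfReader  # lazy import: the heuristic above works without it

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

With pypdf installed, `looks_scanned(extract_page_texts("document.pdf"))` gives a quick first answer. A PDF that mixes scanned and digital pages can land on either side of the threshold, so treat the result as a hint rather than a verdict.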

How to Convert Scanned PDF to Text

  1. Go to OCR PDF to Word
  2. Upload your scanned PDF
  3. Wait for processing (a few seconds per page)
  4. Download your Word document
  5. Proofread — OCR isn't perfect, check for errors

The output is plain text in a Word document. Formatting (columns, styling) won't transfer — you get the text content only.
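If you prefer to run the same kind of conversion locally, the steps above can be sketched in a few lines. This is not the implementation behind the online tool; it assumes the third-party pdf2image and pytesseract packages (which need Poppler and the Tesseract binary installed), and the function names are hypothetical.

```python
from typing import Callable, Iterable, Optional


def _rasterize_with_pdf2image(pdf_path: str, dpi: int) -> Iterable:
    """Render each PDF page to an image (requires pdf2image and Poppler)."""
    from pdf2image import convert_from_path

    return convert_from_path(pdf_path, dpi=dpi)


def _recognize_with_tesseract(image, lang: str) -> str:
    """Run OCR on one page image (requires pytesseract and Tesseract)."""
    import pytesseract

    return pytesseract.image_to_string(image, lang=lang)


def ocr_pdf_to_text(
    pdf_path: str,
    dpi: int = 300,
    lang: str = "eng",
    rasterize: Optional[Callable] = None,
    recognize: Optional[Callable] = None,
) -> str:
    """Rasterize every page, OCR it, and join the pages into plain text."""
    rasterize = rasterize or _rasterize_with_pdf2image
    recognize = recognize or _recognize_with_tesseract
    pages = []
    for image in rasterize(pdf_path, dpi):
        pages.append(recognize(image, lang).strip())
    # Pages are separated by a blank line; no layout information is kept.
    return "\n\n".join(pages)
```

As with the online tool, the result is text only: columns, fonts, and styling are lost, and the output still needs proofreading. The `rasterize` and `recognize` parameters exist so another renderer or OCR engine can be swapped in.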

Tips for Better OCR Results

Factor | Good | Bad
Resolution | 300 DPI or higher | 72-150 DPI
Contrast | Black text on white background | Faded, colored, or low contrast
Alignment | Straight pages | Skewed, rotated
Cleanliness | No stains, marks, or annotations | Coffee stains, stamps, handwriting over text
Font | Standard fonts (Times, Arial) | Decorative, script, or unusual fonts

Key point: OCR accuracy depends almost entirely on source quality. A clean 300 DPI scan of a typed document typically reaches 95% accuracy or better. A blurry photo of a faded receipt may reach only about 60%.
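The accuracy figures above do not say how accuracy is measured. One common way, assumed here, is character accuracy: one minus the edit distance between the OCR output and a hand-corrected reference, divided by the reference length. The function names below are hypothetical, and the sketch uses only the standard library.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Share of reference characters the OCR output got right, from 0.0 to 1.0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))
```

Under this definition, one wrong character in a 20-character line is 95% accuracy, so even a "95%+" result on a full page can still mean dozens of characters to correct.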

When OCR Won't Help

  • Handwritten text: standard OCR doesn't read handwriting. Type it manually or use a specialized handwriting recognition service.
  • Very blurry scans: if you can barely read it, neither can OCR.
  • Decorative fonts: fancy scripts and unusual typefaces confuse OCR.
  • Complex forms with checkboxes: OCR extracts text, not form structure.

Alternative: Searchable PDF

Don't need to edit, just need to search or copy text? Use OCR to Searchable PDF.

This adds an invisible text layer to your scanned PDF. The document looks exactly the same, but now you can:

  • Search with Ctrl+F
  • Select and copy text
  • Have the PDF indexed by search engines
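For a local equivalent, the open-source OCRmyPDF command-line tool adds this kind of text layer. The sketch below is an assumption about one possible workflow, not the method used by the online tool. The helper names are hypothetical, while `--language` and `--deskew` are existing OCRmyPDF options.

```python
import shutil
import subprocess
from typing import List


def build_ocrmypdf_command(
    input_pdf: str, output_pdf: str, language: str = "eng", deskew: bool = True
) -> List[str]:
    """Build the argument list for an OCRmyPDF run."""
    command = ["ocrmypdf", "--language", language]
    if deskew:
        command.append("--deskew")  # straighten skewed pages before OCR
    command += [input_pdf, output_pdf]
    return command


def make_searchable(input_pdf: str, output_pdf: str, **options) -> None:
    """Run OCRmyPDF; raises if the tool is missing or the run fails."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    subprocess.run(
        build_ocrmypdf_command(input_pdf, output_pdf, **options), check=True
    )
```

Calling `make_searchable("scan.pdf", "scan_searchable.pdf")` writes a new file and leaves the original scan untouched.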

Frequently Asked Questions

How do I know if my PDF is scanned?

Try to select text with your mouse. If you can highlight individual words, it's a regular PDF and standard conversion will work. If you can only select a rectangle or nothing at all, it's a scanned PDF that needs OCR.

What resolution works best for OCR?

300 DPI or higher. Lower resolution (150 DPI or less) often produces errors, especially with small text. If your scan looks blurry when zoomed in, OCR results will be poor.
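If you only know an image's pixel dimensions, you can work out its effective resolution from the physical page size. The sketch below is a small worked calculation; the function names and the "borderline" label for values between 150 and 300 DPI are illustrative assumptions.

```python
def effective_dpi(
    width_px: int, height_px: int, width_in: float, height_in: float
) -> float:
    """Pixels per inch along the weaker axis of the scanned page."""
    return min(width_px / width_in, height_px / height_in)


def resolution_verdict(dpi: float) -> str:
    """Map a DPI value to the thresholds mentioned in this guide."""
    if dpi >= 300:
        return "good"
    if dpi > 150:
        return "borderline"  # assumed label; the guide only names the two ends
    return "poor"
```

For example, a US Letter page (8.5 x 11 inches) scanned at 2550 x 3300 pixels works out to 300 DPI, while the same page at 1275 x 1650 pixels is only 150 DPI.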

Can OCR read handwritten text?

No. Standard OCR works with printed, typed text only. Handwriting recognition is a different technology. For handwritten documents, you'll need to type them manually or use specialized handwriting recognition services.

Why does OCR output have errors?

Usually because of poor source quality: low resolution, skewed pages, stains, unusual fonts, or faded text. Better source = better OCR. Always proofread the output.
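Proofreading goes faster if likely misreads are flagged first. A frequent OCR mistake is confusing letters and digits (0 for o, 1 for l), so the sketch below lists words that mix the two. It is a rough, hypothetical helper and not a spell checker: it will also flag legitimate tokens such as "A4" or "2nd".

```python
import re
from typing import List

# A token that contains at least one letter and at least one digit.
_MIXED = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]+$")


def suspicious_tokens(text: str) -> List[str]:
    """Return tokens mixing letters and digits, in order of first appearance."""
    flagged: List[str] = []
    for token in re.findall(r"[A-Za-z\d]+", text):
        if _MIXED.match(token) and token not in flagged:
            flagged.append(token)
    return flagged
```

Use the returned list as a set of places to look at in the OCR output, then still read the full text once against the original scan.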

Can I OCR a PDF with multiple languages?

Yes. Our OCR supports English, Russian, German, French, Spanish, Portuguese, Chinese, Japanese, Korean and more. Documents mixing Latin and Cyrillic work well. Documents mixing Latin and Chinese may need more proofreading.
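This guide does not name the OCR engine behind the tool. If you run OCR yourself with Tesseract, which is an assumption here, multiple languages are combined into one language string such as "eng+rus". The mapping below uses Tesseract's language codes; the function name is hypothetical.

```python
# Tesseract language codes for the languages listed in this guide.
# "chi_sim" is Simplified Chinese; Traditional Chinese would be "chi_tra".
TESSERACT_CODES = {
    "english": "eng",
    "russian": "rus",
    "german": "deu",
    "french": "fra",
    "spanish": "spa",
    "portuguese": "por",
    "chinese": "chi_sim",
    "japanese": "jpn",
    "korean": "kor",
}


def tesseract_lang(*languages: str) -> str:
    """Build a Tesseract language string such as 'eng+rus' from language names."""
    codes = []
    for name in languages:
        key = name.strip().lower()
        if key not in TESSERACT_CODES:
            raise ValueError(f"No language code known for {name!r}")
        code = TESSERACT_CODES[key]
        if code not in codes:  # skip duplicates, keep the given order
            codes.append(code)
    return "+".join(codes)
```

The resulting string can be passed as the `lang` argument of `pytesseract.image_to_string`, provided the matching language data files are installed.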

What output formats are available?

Two formats: an editable Word document (DOCX) for when you need to edit the text, or a searchable PDF that looks like the original but has selectable, searchable text.

How long does OCR take?

Single page: a few seconds. 100-page document: a few minutes. Complex layouts and high resolution take longer.

Ready to Extract Text from Your Scanned PDF?

Upload your document and get editable text in seconds.
