OCR Text Extractor

Extract text from any image — powered by Tesseract.js via CDN

Drop an image here or click to browse

Works with scans, screenshots, photos of text — JPG · PNG · WebP · BMP

Document Language

Output Mode

Initialising Tesseract… 0%

Loading OCR engine

Loading language data

Initialising

Recognising text

✓ Text extracted
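
The step labels above could be driven by the progress messages that Tesseract.js passes to its logger callback. The following is a minimal sketch, not the page's actual code: the names `ProgressMessage`, `stepLabel` and `formatProgress` are hypothetical, and the status substrings it checks are assumptions about the wording of the engine's status text.

```typescript
// Shape of a logger message (only the fields used here; assumed).
interface ProgressMessage {
  status: string;   // for example "recognizing text"
  progress: number; // fraction from 0 to 1
}

// Map an engine status string to one of the step labels shown on the page.
// The substrings checked here are assumptions about the status wording.
function stepLabel(status: string): string {
  const s = status.toLowerCase();
  if (s.includes("core")) return "Loading OCR engine";
  if (s.includes("traineddata")) return "Loading language data";
  if (s.includes("recogniz")) return "Recognising text";
  if (s.includes("initializ")) return "Initialising";
  return "Working";
}

// Build the text of the progress line: step label plus a whole-number percentage.
function formatProgress(message: ProgressMessage): string {
  const clamped = Math.min(1, Math.max(0, message.progress));
  return `${stepLabel(message.status)}… ${Math.round(clamped * 100)}%`;
}
```

Matching on substrings rather than exact strings keeps the mapping tolerant of small wording changes between library versions.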

Frequently Asked Questions

What is OCR and how does it work?

OCR (Optical Character Recognition) is technology that reads text from images by recognising character shapes and converting them into editable text. pdfGens uses Tesseract.js, a WebAssembly port of the open-source Tesseract engine that Google long sponsored. It runs entirely in your browser to extract text from scanned documents, screenshots and photos.
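
For readers curious what an in-browser extraction might look like in code, here is a minimal sketch. It is not pdfGens's actual implementation. The `OcrWorker` and `WorkerFactory` types and the `extractText` function are hypothetical names; they describe only the small part of a Tesseract.js worker the sketch relies on (`recognize` and `terminate`), and the worker factory is passed in so the function does not depend on how the library is loaded.

```typescript
// Minimal shapes for the parts of an OCR worker used below (assumed).
interface OcrWorker {
  recognize(image: string): Promise<{ data: { text: string } }>;
  terminate(): Promise<unknown>;
}

// Creates a worker for one language code, e.g. "eng".
type WorkerFactory = (lang: string) => Promise<OcrWorker>;

// Run OCR on one image and return the recognised text.
async function extractText(
  createWorker: WorkerFactory,
  image: string,
  lang: string
): Promise<string> {
  const worker = await createWorker(lang);
  try {
    const result = await worker.recognize(image);
    return result.data.text.trim();
  } finally {
    // Always release the worker, even if recognition fails.
    await worker.terminate();
  }
}
```

Assuming a recent Tesseract.js release in which `createWorker` accepts a language code, the library's own factory could be supplied as `(lang) => createWorker(lang)`.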

Which languages does the OCR support?

pdfGens OCR supports English, Hindi, French, German, Spanish, Portuguese, Chinese (Simplified), Japanese, and Arabic. Select your language from the dropdown before extracting.
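
Tesseract identifies languages by short codes rather than display names. The sketch below shows one way the dropdown names listed above might map to those codes. The codes are the standard Tesseract language data names; the `LANGUAGE_CODES` map and `toLanguageCode` function are hypothetical and not taken from the pdfGens source.

```typescript
// Display name -> Tesseract language code for the nine languages listed above.
const LANGUAGE_CODES = new Map<string, string>([
  ["English", "eng"],
  ["Hindi", "hin"],
  ["French", "fra"],
  ["German", "deu"],
  ["Spanish", "spa"],
  ["Portuguese", "por"],
  ["Chinese (Simplified)", "chi_sim"],
  ["Japanese", "jpn"],
  ["Arabic", "ara"],
]);

// Look up the code for a dropdown selection; reject anything not in the list.
function toLanguageCode(name: string): string {
  const code = LANGUAGE_CODES.get(name);
  if (code === undefined) {
    throw new Error(`Unsupported language: ${name}`);
  }
  return code;
}
```

Choosing the language that matches the document matters because each code selects a different trained data file.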

Why does OCR take time on first use?

The first time you run OCR in a given language, Tesseract.js downloads that language's data file (about 5 MB). The file is cached in your browser, so later runs in the same language are much faster.
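
The speed-up comes from a cache-first lookup: check local storage before going to the network. The sketch below illustrates only that idea and is not the library's caching code. An in-memory `Map` stands in for browser storage, a synchronous `download` callback stands in for the network request, and the names `LoadResult` and `loadLanguageData` are hypothetical.

```typescript
interface LoadResult {
  data: Uint8Array;
  fromCache: boolean; // true when no download was needed
}

// Return language data from the cache if present; otherwise download and store it.
function loadLanguageData(
  lang: string,
  cache: Map<string, Uint8Array>,
  download: (lang: string) => Uint8Array
): LoadResult {
  const cached = cache.get(lang);
  if (cached !== undefined) {
    return { data: cached, fromCache: true };
  }
  const data = download(lang); // slow path, taken once per language
  cache.set(lang, data);
  return { data, fromCache: false };
}
```

Because the cache is keyed by language, switching to a language you have not used before triggers one more download.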

Can I extract text from a scanned PDF?

Not directly, because the OCR tool accepts images rather than PDF files. First convert your scanned PDF to images using our PDF to Images tool, then run OCR on each resulting image.