OCR Text Extractor

Extract text from images, powered by Tesseract.js loaded from a CDN

100% free: no API key, no upload, no server-side processing
Tesseract.js runs entirely in your browser using WebAssembly, so your images never leave your device. After the first load, the engine and language data are cached and the tool works offline.
✓ No API key needed
✓ Fully offline after first load
✓ All modern browsers supported
✓ Just a <script> tag
✓ 9 languages
✓ JPG · PNG · WebP · BMP
Works with scans, screenshots and photos of text in JPG, PNG, WebP or BMP format.
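Because the whole tool depends on WebAssembly, a page like this can check for it before loading Tesseract.js. The sketch below is an illustration, not pdfGens' actual code; the function name `supportsWebAssembly` is hypothetical. It only tests that the global `WebAssembly` object and its `instantiate` function exist.

```typescript
// Minimal sketch: check whether the runtime exposes WebAssembly, which
// Tesseract.js relies on. Reading it through globalThis keeps this
// compiling even without the DOM type definitions.
function supportsWebAssembly(): boolean {
  const wasm = (globalThis as unknown as {
    WebAssembly?: { instantiate?: unknown };
  }).WebAssembly;
  return typeof wasm === "object" && typeof wasm.instantiate === "function";
}

// Hypothetical usage: decide whether to show the OCR tool or a fallback note.
const ocrStatus: string = supportsWebAssembly()
  ? "OCR available in this browser"
  : "Please use a browser with WebAssembly support";
console.log(ocrStatus);
```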

Frequently Asked Questions

What is OCR and how does it work?

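OCR (Optical Character Recognition) is a technology that reads text from images. pdfGens uses Tesseract.js, a WebAssembly port of Google's Tesseract engine, running entirely in your browser to extract text from scanned documents, screenshots and photos. Broadly, Tesseract converts the image to black and white, finds the lines and words of text, and recognises each line with an LSTM neural network.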
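A common first step in OCR, and Tesseract's default, is binarisation: choosing a grey level that separates dark ink from light paper. The TypeScript sketch below shows Otsu's method, the classic way to pick that threshold automatically. It is a simplified illustration, not Tesseract's implementation; the function names are hypothetical, and it assumes dark text on a light background.

```typescript
// Illustrative sketch of binarisation with Otsu's method. This is NOT
// Tesseract's source code, only a small model of the same idea.

// Convert RGBA bytes (the layout of a canvas ImageData's .data) to grey levels.
function toGrayscale(rgba: Uint8ClampedArray): Uint8Array {
  const gray = new Uint8Array(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[4 * i];
    const g = rgba[4 * i + 1];
    const b = rgba[4 * i + 2];
    // ITU-R BT.601 luma weights; the alpha channel is ignored.
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return gray;
}

// Return the threshold t that maximises the between-class variance (up to a
// constant factor) when pixels <= t are "ink" and pixels > t are "paper".
function otsuThreshold(gray: Uint8Array): number {
  const hist = new Array<number>(256).fill(0);
  for (let i = 0; i < gray.length; i++) hist[gray[i]]++;

  let sumAll = 0;
  for (let t = 0; t < 256; t++) sumAll += t * hist[t];

  let countInk = 0;
  let sumInk = 0;
  let bestT = 0;
  let bestVariance = -1;
  for (let t = 0; t < 256; t++) {
    countInk += hist[t];
    sumInk += t * hist[t];
    const countPaper = gray.length - countInk;
    if (countInk === 0) continue; // no ink pixels yet
    if (countPaper === 0) break; // every pixel is now "ink"
    const meanInk = sumInk / countInk;
    const meanPaper = (sumAll - sumInk) / countPaper;
    const variance = countInk * countPaper * (meanInk - meanPaper) ** 2;
    if (variance > bestVariance) {
      bestVariance = variance;
      bestT = t;
    }
  }
  return bestT;
}

// Map every pixel to 0 (ink) or 255 (paper).
function binarise(gray: Uint8Array): Uint8Array {
  const t = otsuThreshold(gray);
  return gray.map((v) => (v <= t ? 0 : 255));
}

const sample = Uint8Array.from([10, 20, 30, 200, 210, 220]);
console.log(otsuThreshold(sample)); // 30
console.log(Array.from(binarise(sample))); // [ 0, 0, 0, 255, 255, 255 ]
```

In a browser, the RGBA input would typically come from drawing the uploaded image onto a canvas and reading its pixel data.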

Which languages does the OCR support?

pdfGens OCR supports English, Hindi, French, German, Spanish, Portuguese, Chinese (Simplified), Japanese, and Arabic. Select your language from the dropdown before extracting.
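Internally, Tesseract identifies each language by a short code such as `eng` or `chi_sim` and loads one language data file per code. A hypothetical mapping from the dropdown labels to those codes might look like this; the codes are Tesseract's standard ones, but the names `TESSERACT_LANGS` and `toTesseractLang` are illustrative. Tesseract also accepts several codes joined with `+` (for example `eng+hin`) for documents that mix languages.

```typescript
// Hypothetical mapping from dropdown labels to Tesseract language codes.
const TESSERACT_LANGS = {
  English: "eng",
  Hindi: "hin",
  French: "fra",
  German: "deu",
  Spanish: "spa",
  Portuguese: "por",
  "Chinese (Simplified)": "chi_sim",
  Japanese: "jpn",
  Arabic: "ara",
} as const;

type LanguageLabel = keyof typeof TESSERACT_LANGS;

// Build a Tesseract language string; multiple codes are joined with "+".
function toTesseractLang(labels: LanguageLabel[]): string {
  if (labels.length === 0) throw new Error("Select at least one language");
  return labels.map((label) => TESSERACT_LANGS[label]).join("+");
}

console.log(toTesseractLang(["Hindi"])); // hin
console.log(toTesseractLang(["English", "Arabic"])); // eng+ara
```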

Why does OCR take time on first use?

The first time you run OCR in a given language, Tesseract.js downloads the language data file for that language (about 5 MB). The file is cached in your browser, so later runs start much faster.
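This behaviour can be modelled as a cache-first loader: the first request for a language goes to the network, and every later request is served locally. In the sketch below, a `Map` stands in for persistent browser storage (Tesseract.js's browser build uses IndexedDB for this) and an injected `fetcher` stands in for the download; both, and the class name `LanguageDataCache`, are assumptions for illustration.

```typescript
// Simplified model of "download once, then reuse": an in-memory Map stands in
// for the browser's persistent cache, and the fetcher stands in for the network.
type LanguageFetcher = (lang: string) => Promise<Uint8Array>;

class LanguageDataCache {
  private readonly store = new Map<string, Promise<Uint8Array>>();
  private readonly fetcher: LanguageFetcher;
  downloads = 0; // how many times the "network" was actually used

  constructor(fetcher: LanguageFetcher) {
    this.fetcher = fetcher;
  }

  load(lang: string): Promise<Uint8Array> {
    const cached = this.store.get(lang);
    if (cached !== undefined) return cached; // later runs: served from cache
    this.downloads++;
    const pending = this.fetcher(lang); // first run: one download per language
    this.store.set(lang, pending);
    // Forget failed downloads so the next attempt can retry.
    pending.catch(() => this.store.delete(lang));
    return pending;
  }
}
```

Caching the pending promise, rather than the finished bytes, means two simultaneous requests for the same language share a single download.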

Can I extract text from a scanned PDF?

Yes, in two steps: first convert each page of your scanned PDF to an image with our PDF to Images tool, then run OCR on each resulting image.
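The two-step workflow can be sketched as a loop that renders each page to an image and then recognises it. The renderer and recogniser are passed in as functions because the tools' internals are not described here; in a browser they could be backed by a PDF rendering library and Tesseract.js. All names below are hypothetical.

```typescript
// Sketch of "PDF page -> image -> OCR text", with both steps injected.
type PageRenderer<Img> = (pageNumber: number) => Promise<Img>;
type Recogniser<Img> = (image: Img) => Promise<string>;

async function ocrScannedPdf<Img>(
  pageCount: number,
  renderPage: PageRenderer<Img>,
  recognise: Recogniser<Img>,
): Promise<string> {
  const pages: string[] = [];
  // Handle one page at a time so only one page image is held in memory.
  for (let page = 1; page <= pageCount; page++) {
    const image = await renderPage(page);
    const text = await recognise(image);
    pages.push(`--- Page ${page} ---\n${text.trim()}`);
  }
  return pages.join("\n\n");
}
```

Processing pages sequentially is slower than running them in parallel, but keeps memory use predictable for long scanned documents.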