Glossary

Optical character recognition (OCR)

OCR turns images of text — scanned PDFs, photos, screenshots — into machine-readable, searchable text so software can read, search and cite the words inside them.

Optical character recognition (OCR) is the process of converting an image of text — a scanned page, a photographed document, a screenshot — into actual, machine-readable characters that software can read, search, copy and quote.

To a computer, a scanned PDF is just a picture. The page might be covered in words, but they exist only as pixels: there is no underlying text to select, search, or feed to a language model. OCR closes that gap. It detects the regions of an image that contain text, recognises each character, and reconstructs the words, lines and reading order as a text layer behind the image.

Why it matters

Without OCR, a huge amount of written knowledge is effectively invisible to software: old contracts, research scanned from print journals, receipts, faxes, handwritten notes, and any PDF built from scans or photos rather than exported from a digital source. You can see the words, but you can't search them, copy them, or ask an AI about them.

OCR makes that content first-class:

  • Search. Recognised text can be indexed, so semantic search and keyword search work across scanned material.
  • Citation. Once each word has a known position on the page, an answer can point back to the exact passage it came from, which is the foundation of a real citation.
  • Reuse. Text can be selected, translated, summarised, or chatted with like any digital document.
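
As a rough illustration of why keeping positions matters, the sketch below indexes recognised words together with their page and bounding box, so a search hit can point back to a location. The Word type, the build_index and find helpers, and the sample coordinates are all hypothetical and chosen for illustration; they are not how Sidenote or any particular OCR engine stores its output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognised word plus where it sits on the page (hypothetical shape)."""
    text: str
    page: int
    box: tuple  # (x, y, width, height) in page units


def build_index(words):
    """Map each normalised word to every place it was recognised."""
    index = defaultdict(list)
    for w in words:
        key = w.text.lower().strip(".,;:!?")  # ignore case and edge punctuation
        index[key].append(w)
    return index


def find(index, term):
    """Return all recognised occurrences of a term, with their positions."""
    return index.get(term.lower(), [])


# Made-up OCR output for a two-page scan.
ocr_words = [
    Word("Termination", 3, (72, 540, 88, 12)),
    Word("clause.", 3, (164, 540, 48, 12)),
    Word("termination", 7, (90, 210, 84, 12)),
]
index = build_index(ocr_words)
hits = find(index, "termination")
pages = [h.page for h in hits]  # [3, 7]
```

Because each hit carries a page number and a box, a viewer could scroll to and highlight the passage rather than only reporting that the word exists somewhere.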

How it works

A typical pipeline cleans up the image (deskewing and de-noising), detects text regions, classifies each glyph, and assembles the characters into words and lines, often using a language model to correct likely misreads. Accuracy depends on scan resolution, contrast and typeface: low-resolution, faint or heavily stylised text produces more errors.
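
The last two stages can be sketched in a few lines of Python. This is a simplified illustration, not a production method: it assumes the glyphs have already been classified and come with bounding boxes, groups them into lines by vertical position, splits words on wide horizontal gaps, and stands in a small vocabulary lookup (via the standard library's difflib) for the language model. The Glyph type, the gap_ratio and cutoff thresholds, and the sample data are assumptions made for the example.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Glyph:
    """A classified character and its bounding box (hypothetical shape)."""
    char: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def assemble_lines(glyphs, gap_ratio=0.5):
    """Group glyphs into text lines, inserting spaces at wide horizontal gaps."""
    lines = []  # each entry holds the glyphs that share one line
    for g in sorted(glyphs, key=lambda g: g.y):
        centre = g.y + g.h / 2
        for line in lines:
            ref = line[0]
            # Same line if the glyph's vertical centre falls inside the
            # vertical extent of the line's first glyph.
            if ref.y <= centre <= ref.y + ref.h:
                line.append(g)
                break
        else:
            lines.append([g])

    text_lines = []
    for line in lines:
        line.sort(key=lambda g: g.x)  # left-to-right reading order
        out = line[0].char
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.w)
            if gap > gap_ratio * prev.h:  # a wide gap marks a word boundary
                out += " "
            out += cur.char
        text_lines.append(out)
    return text_lines


def correct_word(word, vocabulary, cutoff=0.85):
    """Swap a likely misread for its closest vocabulary entry, if one is close enough."""
    if word.lower() in vocabulary:
        return word
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


# Made-up glyphs in arbitrary order: "Hi yo" on the first line, "ok" on the second.
sample = [
    Glyph("k", 9, 20, 8, 10), Glyph("y", 20, 0, 8, 10), Glyph("H", 0, 0, 8, 10),
    Glyph("o", 29, 0, 8, 10), Glyph("i", 9, 1, 3, 10), Glyph("o", 0, 20, 8, 10),
]
lines = assemble_lines(sample)  # ["Hi yo", "ok"]
fixed = correct_word("c0ntract", {"contract", "contact", "receipt"})  # "contract"
```

Real engines handle skewed baselines, multiple columns and mixed font sizes, and they weigh context rather than single words, so the thresholds here should be read as placeholders.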

Sidenote runs OCR on scanned PDFs automatically, so you can read, chat with and quote documents that started life as images — and because each recognised passage keeps its location, every answer can scroll to the exact source instead of asking you to trust it.
