How to Convert Photos and Scanned Documents to Audio

Physical books, printed reports, and scanned PDFs all share the same problem: they’re not directly listenable. OCR (optical character recognition) combined with text to speech bridges that gap — your iPhone can photograph a printed page and read it aloud within seconds. Here’s how to do it well.

What OCR Does

OCR is the technology that converts images of text into actual, selectable text. When you photograph a page from a book, the photo is just pixels — there’s no text data a TTS engine can read. OCR analyzes those pixels, identifies letters and words, and produces a text version.

Modern smartphone OCR is surprisingly capable. A clear photo of a cleanly typeset book page typically yields near-perfect text extraction. Handwriting, unusual fonts, heavy formatting, and low-quality scans produce more errors.

The Basic Workflow: Photo to Audio

Most text to speech apps with OCR follow the same basic steps:

Open your TTS app and look for a camera or scan option (often a camera icon or “Add from photo”)
Photograph the page — flat, well-lit, with the text fully in frame
The app runs OCR on the image and extracts the text
Review for errors if accuracy is critical (optional)
Press play — the app reads the extracted text aloud

For a single page, the entire process takes under 30 seconds. For a multi-page document, photograph the pages in sequence and the app queues them as a single continuous document.
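As a rough illustration, the workflow above can be sketched in a few lines of Python. The `ocr`, `speak`, and `review` callables are hypothetical stand-ins for whatever OCR engine, speech engine, and review step an app provides; nothing here reflects a specific app’s API.

```python
from typing import Callable, Iterable, List


def photos_to_audio(
    pages: Iterable[bytes],
    ocr: Callable[[bytes], str],          # hypothetical: image bytes -> text
    speak: Callable[[str], None],         # hypothetical: text -> audio playback
    review: Callable[[str], str] = lambda text: text,  # optional correction step
) -> str:
    """Run OCR on each page in order, join the text, review it, then speak it."""
    extracted: List[str] = []
    for image in pages:
        text = ocr(image).strip()
        if text:  # skip pages where OCR found nothing
            extracted.append(text)
    # Queue all pages as one continuous document, separated by blank lines.
    document = review("\n\n".join(extracted))
    speak(document)
    return document
```

Passing the engines in as parameters keeps the sketch independent of any particular library, and the default `review` leaves the text unchanged, matching the optional review step.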

Getting Good OCR Results

The quality of your OCR output depends almost entirely on the quality of your input image. A few habits make a significant difference:

Lighting

Even, diffuse light is best. Avoid:

Direct sunlight (creates harsh shadows and washes out text)
Overhead lighting that reflects off glossy pages
Dim environments that force your camera to compensate with noise

Natural light from a window (not direct sun) or a well-lit room usually works well, as long as overhead lights don’t glare off the page. The goal is uniform brightness across the entire page.
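One way to make “uniform brightness” concrete is to compare the average brightness of different regions of the photo. The sketch below assumes the image has already been converted to a grid of grayscale values (0 to 255). The 3×3 split and the 0.25 threshold are illustrative assumptions, not values taken from any OCR tool.

```python
from statistics import mean
from typing import List, Sequence


def brightness_spread(gray: Sequence[Sequence[int]], grid: int = 3) -> float:
    """Split the image into grid x grid cells and return
    (brightest cell mean - darkest cell mean) / 255.

    The image must be at least `grid` pixels wide and tall.
    """
    height, width = len(gray), len(gray[0])
    cell_means: List[float] = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * height // grid, (gy + 1) * height // grid)
            cols = range(gx * width // grid, (gx + 1) * width // grid)
            cell_means.append(mean(gray[y][x] for y in rows for x in cols))
    return (max(cell_means) - min(cell_means)) / 255


def is_evenly_lit(gray: Sequence[Sequence[int]], max_spread: float = 0.25) -> bool:
    """Treat the page as evenly lit when the cell-to-cell spread is small."""
    return brightness_spread(gray) <= max_spread
```

This is a coarse check: a region dense with text or a dark illustration also lowers a cell’s average, so a large spread is a hint to look at the photo again rather than proof of bad lighting.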

Page Flatness

Curved pages, common in thicker books, cause OCR errors at the edges and in the gutter (the center fold). Press the book flat when photographing, or photograph one page at a time while holding it flat with your free hand.

Document scanners (either hardware or apps like Microsoft Lens or Apple’s built-in document scanner in Files) produce better results than free-form photos because they correct for perspective and flatten the page digitally.

Distance and Resolution

Hold your phone far enough away that the entire page fills most of the frame and the text stays in focus. Most modern iPhones have more than enough resolution; the usual problem is blur from camera shake or from holding the phone too close, not a lack of pixels.
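Blur can be estimated numerically. A common heuristic is the variance of the Laplacian: sharp text has many abrupt light-to-dark transitions, so the Laplacian response varies widely, while a blurry photo gives a low variance. The sketch below is a minimal pure-Python version that assumes a grayscale pixel grid as input. What counts as “too blurry” depends on the camera and the content, so no threshold is suggested here.

```python
from statistics import pvariance
from typing import List, Sequence


def laplacian_variance(gray: Sequence[Sequence[int]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Higher values indicate more sharp edges. The image must be at least 3x3.
    """
    height, width = len(gray), len(gray[0])
    responses: List[int] = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)
```

In practice the score is most useful for comparison: take two shots of the same page and keep the one with the higher value.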

Text Clarity

OCR works best on:

Standard serif and sans-serif fonts (Times New Roman, Arial, etc.)
High-contrast black text on white background
Clean, unworn pages

It struggles with:

Handwriting (especially cursive)
Heavily stylized fonts
Text printed on colored or patterned backgrounds
Faded or aged documents
Text that crosses a fold or crease

Photographing a Full Book

If you want to listen to a full book chapter by chapter:

Work through the chapter page by page, photographing each one
Import in order — most TTS apps let you add multiple photos to a single document
Label the import with the chapter name for easy navigation later
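If the page photos end up as files before import, keeping them in page order is mostly a naming problem: plain alphabetical sorting puts `page10` before `page2`. The sketch below shows a natural sort that compares the numeric parts as numbers. The filenames are hypothetical, and many apps handle ordering for you.

```python
import re
from typing import List, Union


def natural_key(name: str) -> List[Union[str, int]]:
    """Split a filename into alternating text and number chunks
    so that 'page10' sorts after 'page2'."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, odd indices are always the digit runs.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def in_page_order(filenames: List[str]) -> List[str]:
    """Return the filenames sorted in natural (human) page order."""
    return sorted(filenames, key=natural_key)
```

For example, `in_page_order(["page10.jpg", "page2.jpg", "page1.jpg"])` returns the files in the order 1, 2, 10.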

A typical chapter (15–25 pages) takes about 5–10 minutes to photograph if you’re moving efficiently. Set up a consistent workflow: book flat on a desk, phone directly above it, same height each time.

For a reference book or textbook you’ll return to repeatedly, this investment pays off quickly. For a one-time read, first check whether a digital version is available through your library or a store.

Handling Scanned PDFs

Scanned PDFs work the same way as photos — each page is an image that needs OCR. When you import a scanned PDF into a TTS app, it should detect that the file is image-based and run OCR automatically before reading.

Signs that a PDF needs OCR:

You can’t select or copy text when you open it
Searching within the PDF finds nothing
The file size is unusually large for the number of pages (images are larger than text data)
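These signs can be turned into a simple heuristic. The sketch below assumes you already have the text that some PDF library extracted from each page (an empty string for image-only pages) along with the file size. Both thresholds are illustrative guesses, not established cutoffs.

```python
from typing import List, Sequence


def scan_signs(
    page_texts: Sequence[str],
    file_size_bytes: int,
    min_chars_per_page: int = 20,        # assumed: fewer means no real text layer
    max_bytes_per_page: int = 200_000,   # assumed: more suggests page images
) -> List[str]:
    """Return the signs suggesting that a PDF is a scan that needs OCR."""
    pages = max(len(page_texts), 1)  # avoid dividing by zero for empty input
    chars_per_page = sum(len(t.strip()) for t in page_texts) / pages
    bytes_per_page = file_size_bytes / pages

    signs: List[str] = []
    if chars_per_page < min_chars_per_page:
        signs.append("little or no selectable text")
    if bytes_per_page > max_bytes_per_page:
        signs.append("large file for its page count")
    return signs
```

Missing text is the stronger signal. File size alone can mislead, since a text-based PDF with many embedded figures can also be large.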

If your TTS app doesn’t auto-detect scanned PDFs, try running the PDF through Apple’s Files app (which can add text recognition to scanned PDFs) or a dedicated OCR app like Adobe Scan before importing.

Accuracy and Error Correction

For most use cases, such as listening while commuting or exercising, occasional OCR errors are barely noticeable. You can usually infer misread words from context, and modern TTS voices handle garbled text more gracefully than older engines.

For content where precision matters — legal documents, medical information, academic work you’ll cite — review the extracted text before listening. Most TTS apps show the text they’re about to read, giving you a chance to catch and fix errors.
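When reviewing long extractions, it can help to flag the words most likely to be OCR mistakes instead of rereading everything. The sketch below uses two simple heuristics chosen for illustration: tokens that mix letters and digits (such as `c1ear`), and tokens containing symbols that rarely appear in prose. It is not how any particular app detects errors.

```python
import re
from typing import List

# A token containing at least one letter and at least one digit.
_MIXED = re.compile(r"(?=.*[A-Za-z])(?=.*\d)")
# Symbols that seldom occur inside ordinary words.
_ODD = re.compile(r"[|~^`\\{}<>_]")


def suspicious_tokens(text: str) -> List[str]:
    """Return words worth a second look, in reading order."""
    flagged: List[str] = []
    for token in text.split():
        word = token.strip(".,;:!?\"'()[]")  # ignore surrounding punctuation
        if word and (_MIXED.search(word) or _ODD.search(word)):
            flagged.append(word)
    return flagged
```

Legitimate tokens such as “A4” or “3rd” are flagged too, so treat the result as a review list rather than a list of confirmed errors.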

Use Cases That Work Well

Physical books with no digital version — photograph chapters as you read
Printed work documents — meeting notes, printed reports, physical contracts
Magazine and newspaper articles — photograph the article directly
Handouts and worksheets — class materials or printed study guides
Old documents or archives — typed documents with legible fonts extract well

Start Listening with Text to Speech

Text to Speech — AI Book Reader includes built-in OCR for photos and scanned PDFs, so you can photograph any printed page and have it read aloud on your iPhone or iPad in seconds. It’s the fastest way to turn physical text into listenable audio without any manual transcription.