Can I really make an audiobook with text to speech?

Yes, modern neural TTS voices are good enough for personal audiobooks, podcast feeds, language-learning material, and many indie distribution channels. The workflow has four steps: paste cleaned text, render chapter by chapter, stitch in an audio editor with chapter pauses, and export as MP3 or M4B. Quality depends mostly on text cleanup and voice choice.

What's the best free TTS tool for making audiobooks?

Read Aloud Reader handles the core workflow — neural voices, speed control, MP3 download — at no cost. For longer batches, browser built-in readers (Edge, Chrome) and operating-system speech are fully free and produce clean audio. Free tiers from larger providers like Amazon Polly and Google Cloud TTS also work for small projects.

Can I sell a TTS-narrated audiobook on Audible?

Generally no — Audible's ACX program currently requires human narration. Other platforms are more flexible: Findaway Voices, Google Play Books, Spotify Open Audiobooks, and direct sales from your own site allow TTS-narrated audiobooks, often with an AI-narration disclosure. Always check the latest platform terms before publishing.

How long does it take to make an audiobook with TTS?

For a 60,000-word novel, plan on roughly one full working day: about two hours for text cleanup and chapter splitting, three to four hours of rendering, two hours of stitching and pauses, and one hour of QA listening. Compare to 30+ hours for human-recorded narration.

What voice and speed settings sound most like a real audiobook?

Pick a warm mid-range neural voice and run it at 0.95x to 1.05x speed. Add 1.5 seconds of silence between paragraphs and three seconds at chapter breaks. Most amateur TTS audiobooks sound robotic because defaults are too fast and the audio runs paragraphs together with no breathing room.

How to Create Your Own Audiobook with Text to Speech

If you want to turn your own writing (a novel, a memoir, a stack of long blog posts) into a text-to-speech audiobook, the workflow has never been simpler. With a free reader, a short project can be listenable in an afternoon, no studio required.

Making your own audiobook used to mean a microphone, a quiet room, twenty hours of recording, and the willingness to listen to your own voice say "chapter one" forty-three times. That bar is gone. With a decent text-to-speech tool you can turn a manuscript, a novel, or a chunky blog archive into a listenable audiobook in about a working day, for little more than the cost of the electricity.

This guide is the practical version. Not the dream of building the next Audible, but the actual workflow: which tools to use, how to chunk a long manuscript, how to handle chapter breaks and dialogue, what voice settings make the result sound like a book and not a robot reading a phonebook.

What you actually need before you start

Three things, none of them expensive:

The text of the book in a clean digital format — a .docx, plain .txt, or pasteable PDF works. Scanned PDFs need OCR first.
A TTS tool with neural voices, speed control, and audio download. Free options exist; paid ones get you longer batch processing and more voices.
Audacity or any free audio editor for stitching chapters and adding silences.

That is it. No microphone, no soundproofing, no ACX paperwork unless you actually plan to sell on Audible (which has its own rules — more on that below).

Pick the right voice for your book

This is where most DIY audiobooks go wrong. People pick the first voice they hear and stick with it for 80,000 words. Three sessions in, the listener wants to throw their phone in a lake.

Match voice to genre. A few starting points that hold up across different TTS engines:

Literary fiction or memoir: a warm mid-range neutral voice. On Read Aloud Reader, Echo and Onyx work well — they slow down naturally on longer sentences. On Amazon Polly's Generative engine, Ruth and Stephen do the same job.
Thriller or pacey commercial fiction: a slightly brighter voice with crisp articulation. Nova or Shimmer do this on OpenAI's TTS.
Non-fiction and how-to books: a clear, lecture-style voice — Alloy or Echo at 1.05x feels professional without sounding corporate.
Children's books: a brighter, more expressive voice. Some authors record one parent voice and use TTS for the secondary character voices.

Test the first 500 words with three different voices before you commit. Listen back at the speed you intend to publish at. The voice that sounds fine in a 30-second sample is sometimes exhausting at chapter length. For more on the current generation of voices, see our best AI voices roundup.

The free audiobook workflow that scales to a novel

"How to make an audiobook for free" gets searched a lot, and the honest answer is: yes, fully free, with a few caveats around batch size and download limits. The bigger choice is workflow shape: how you chunk a manuscript so the engine doesn't choke and the finished audio sounds clean. For a text-to-speech audiobook that doesn't sound like a wall of speech, the structure of the input matters more than the voice you pick.

How to chunk a manuscript so it sounds right

TTS engines have character limits. OpenAI's TTS API caps input at 4,096 characters per request. ElevenLabs is more generous per request but slows down on long inputs. Free tiers usually cap lower.

The fix is simple: chunk by chapter, not by character count. Save each chapter as its own text file. Run each through the TTS tool separately, download the audio, and stitch them at the end. This gives you three benefits at once:

Natural pauses between chapters when you stitch — no awkward mid-sentence cutoffs.
Cheap fixes: if the voice glitches, you re-render one chapter instead of the whole book.
Clean track-list metadata for podcast or audiobook players.
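
If the manuscript is one long text file, the chapter split can be scripted. The sketch below is one possible approach, not a standard tool: it assumes every chapter begins with its own line starting with the word "Chapter", and the function names and the chapter_01.txt naming scheme are made up for illustration. The heading is returned separately so it can be rendered as its own title clip.

```python
import re
from pathlib import Path

# Assumption: every chapter starts with its own line beginning "Chapter ...".
# A body line that happens to start with "Chapter" would be misread as a heading.
CHAPTER_HEADING = re.compile(r"^chapter[ \t]+\w+.*$", re.IGNORECASE | re.MULTILINE)


def split_into_chapters(manuscript: str) -> list[tuple[str, str]]:
    """Return (heading, body) pairs. Text before the first heading is ignored."""
    matches = list(CHAPTER_HEADING.finditer(manuscript))
    chapters = []
    for i, match in enumerate(matches):
        body_start = match.end()
        body_end = matches[i + 1].start() if i + 1 < len(matches) else len(manuscript)
        chapters.append((match.group().strip(), manuscript[body_start:body_end].strip()))
    return chapters


def write_chapter_files(chapters: list[tuple[str, str]], out_dir: str) -> list[Path]:
    """Write each chapter body to chapter_01.txt, chapter_02.txt, ... and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, (_heading, body) in enumerate(chapters, start=1):
        path = out / f"chapter_{number:02d}.txt"
        path.write_text(body + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Zero-padded file names keep the chapters in order when an audio editor or player sorts them alphabetically.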

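Within a chapter, paste in chunks of 800 to 1,200 words if your tool has a tight character limit. Check the character count as well: at typical English word lengths, 800 words usually runs past 4,096 characters, so a cap that tight needs smaller chunks. End each chunk at a sentence boundary, with a period after the last word, so the engine finishes the clip with sentence-final intonation and the join doesn't sound cut off.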
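
Chunking within a chapter can be scripted too. This is a minimal sketch under stated assumptions: the sentence splitter is deliberately naive (it breaks after ., ! or ? followed by whitespace, so abbreviations like "Dr." will cause extra breaks), and the function name and parameters are hypothetical. It breaks only between sentences and makes sure every chunk ends with terminal punctuation.

```python
import re
from typing import Optional

# Characters that count as a proper ending for a chunk.
TERMINAL = (".", "!", "?", '"', "'", "”", "’")


def chunk_chapter(text: str, max_words: int = 1000,
                  max_chars: Optional[int] = None) -> list[str]:
    """Split text into chunks within the limits, breaking only between sentences.

    A single sentence longer than a limit still becomes its own oversized chunk.
    Paragraph breaks are flattened to spaces.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        too_long = len(candidate.split()) > max_words or (
            max_chars is not None and len(candidate) > max_chars
        )
        if current and too_long:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Add a closing period where a chunk lacks terminal punctuation.
    return [c if c.endswith(TERMINAL) else c + "." for c in chunks]
```

For an engine with a 4,096-character cap, something like chunk_chapter(text, max_words=1200, max_chars=4000) leaves a little headroom.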

Cleaning the text first — the step nobody mentions

The single biggest difference between an amateur TTS audiobook and a listenable one is text cleanup before rendering. Three things to do:

Spell out unusual words and acronyms. NASA reads fine. NDA usually does not — it becomes "nudda." Replace with "N D A" with spaces. Same for product names and made-up fantasy terms.
Replace hyphens used as pauses. A lone hyphen between words can be read as "minus" by some engines. Use a comma or a proper em-dash (—), or rephrase.
Strip footnote numbers, page references, and Markdown. "Smith argued³ that..." reads as "Smith argued three that." Remove or rewrite inline.

Twenty minutes of cleanup per chapter saves hours of re-recording.
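
Much of that cleanup can be scripted as a first pass before a manual read-through. The sketch below is a rough starting point rather than a complete cleaner: the regular expressions cover only the cases named above, the SPELL_OUT table is a hypothetical per-book list you would build from your own test renders, and the Markdown handling is limited to asterisk emphasis, backticks, and # headings.

```python
import re
from typing import Mapping, Optional

# Hypothetical per-book list: acronyms your chosen voice mispronounces.
SPELL_OUT = {"NDA": "N D A"}

SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"


def clean_for_tts(text: str, spell_out: Optional[Mapping[str, str]] = None) -> str:
    """Apply a first-pass cleanup; always proofread the result by ear."""
    if spell_out is None:
        spell_out = SPELL_OUT
    # Drop superscript footnote markers such as "argued³".
    text = re.sub(f"[{SUPERSCRIPT_DIGITS}]+", "", text)
    # Drop bracketed footnote markers such as "[12]".
    text = re.sub(r"\[\d+\]", "", text)
    # Strip simple Markdown emphasis and inline code markers, keeping the words.
    text = re.sub(r"(\*\*|\*|`)(.+?)\1", r"\2", text)
    # Strip Markdown heading markers at the start of a line.
    text = re.sub(r"^#{1,6}[ \t]+", "", text, flags=re.MULTILINE)
    # A spaced hyphen between words (used as a pause) becomes a comma.
    text = re.sub(r"(?<=\S)[ \t]+-{1,2}[ \t]+", ", ", text)
    # Spell out listed acronyms, matching whole words only.
    for acronym, spoken in spell_out.items():
        text = re.sub(rf"\b{re.escape(acronym)}\b", spoken, text)
    return text
```

The spaced-hyphen rule will also rewrite genuine subtraction such as "5 - 3", so technical non-fiction needs a closer look than a novel does.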

Speed, pauses, and the breathing problem

Default TTS playback often sounds rushed for long-form listening. Audiobook narrators average about 150–160 words per minute, and most TTS voices at 1.0x run faster than that.
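
To check a voice against that range, measure a rendered sample. The sketch below is plain arithmetic with made-up function names. It assumes the tool's speed setting scales speaking rate roughly linearly, which may not hold for every engine, and it uses 155 words per minute as a midpoint target.

```python
def words_per_minute(word_count: int, clip_seconds: float) -> float:
    """Speaking rate of a rendered sample."""
    return word_count * 60 / clip_seconds


def suggested_speed(measured_wpm: float, target_wpm: float = 155) -> float:
    """Speed multiplier that would bring the sample to the target rate.

    Assumes the engine's speed setting scales the rate linearly.
    """
    return target_wpm / measured_wpm


def listening_hours(word_count: int, wpm: float) -> float:
    """Running time of the finished audio at a given narration rate."""
    return word_count / wpm / 60


# Example: a 500-word sample that renders to 2 minutes 56 seconds.
rate = words_per_minute(500, 176)
print(f"{rate:.0f} wpm, try about {suggested_speed(rate):.2f}x")
print(f"60,000 words at 155 wpm: about {listening_hours(60_000, 155):.1f} hours")
```

Treat the suggested multiplier as a starting point and confirm it by ear.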

Settings that hold up over hours of listening:

Speed: 0.95x to 1.05x. Slower if your voice has a brighter tone, slightly faster if it's a deeper voice.
Pauses: add a 1.5-second silence between paragraphs in your audio editor. Most TTS engines run paragraphs together, which is fine for productivity audio and exhausting for fiction.
Chapter breaks: 3 seconds of silence, then the chapter title spoken in the same voice as a separate clip, then 1 second of silence, then the chapter content.

This is the difference between an audiobook and a wall of speech. Read Aloud Reader makes the basics easy — paste, pick a voice, set speed, download the MP3 — and you handle the chapter stitching in Audacity. Our text-to-MP3 conversion guide walks through the export side.
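
If you would rather script the silence padding than place it by hand, the pattern above can be sketched with Python's standard wave module. Several assumptions apply: the clips have been converted from MP3 to 16-bit PCM WAV with identical sample rate and channel count (the wave module does not read MP3), the body is rendered as one clip per paragraph, and the function names are hypothetical. The gap lengths default to the values given above.

```python
import wave


def read_frames(path: str, expected: tuple) -> bytes:
    """Read all audio frames, refusing clips whose format differs."""
    with wave.open(path, "rb") as clip:
        fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
        if fmt != expected:
            raise ValueError(f"{path}: format {fmt} does not match {expected}")
        return clip.readframes(clip.getnframes())


def silence(seconds: float, fmt: tuple) -> bytes:
    """Zero-valued frames: silence for 16-bit PCM (not for 8-bit WAV)."""
    channels, sample_width, rate = fmt
    return b"\x00" * (int(seconds * rate) * channels * sample_width)


def stitch_chapter(title_wav: str, body_wavs: list, out_wav: str,
                   chapter_gap: float = 3.0, title_gap: float = 1.0,
                   paragraph_gap: float = 1.5) -> None:
    """Write: chapter gap, title clip, title gap, then body clips separated by gaps."""
    with wave.open(title_wav, "rb") as first:
        fmt = (first.getnchannels(), first.getsampwidth(), first.getframerate())
    with wave.open(out_wav, "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        out.writeframes(silence(chapter_gap, fmt))
        out.writeframes(read_frames(title_wav, fmt))
        out.writeframes(silence(title_gap, fmt))
        for index, path in enumerate(body_wavs):
            if index > 0:
                out.writeframes(silence(paragraph_gap, fmt))
            out.writeframes(read_frames(path, fmt))
```

If you rendered a chapter as a single clip, pass it as a one-item list and only the chapter and title gaps are added.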

Dialogue, multiple characters, and when to stop trying

One thing TTS still does badly: distinct character voices in dialogue-heavy fiction. You can switch voices manually for major characters, but it gets tedious past two or three.

Practical compromises:

For first-person novels with one narrator voice, just render the whole thing in one voice — readers expect that anyway.
For dual-POV novels, render each POV in a different voice and split by chapter.
For dense dialogue with ten characters, accept that TTS audiobooks won't fully replace human narration here. Consider a hybrid where you record dialogue lines and TTS handles narration.

Can you sell a TTS audiobook?

This part has rules. Audible's ACX program technically does not accept fully AI-narrated audiobooks for distribution as of writing — they require human narration. Other platforms are more flexible: Findaway Voices, Google Play Books, Spotify Open Audiobooks, and direct-on-your-website distribution allow TTS-narrated audiobooks, sometimes with disclosure required.

If you're producing for personal use, a podcast feed of your own writing, language-learning material, or a niche distribution channel, none of this matters. If you want Audible distribution, you need a human narrator.

A realistic timeline for a 60,000-word book

For a novel-length manuscript, expect:

2 hours of text cleanup and chapter splitting
3 to 4 hours of TTS rendering (mostly waiting on API calls or browser playback if you're using a free tool)
2 hours of stitching, silence padding, and chapter title clips
1 hour of QA listening to spot-check chapters

Total: about a working day. Compare to 30+ hours for a human-narrated equivalent and you can see why DIY TTS audiobooks have quietly exploded among indie authors and self-publishing nonfiction writers.

From book to audio without losing a weekend

Going from book to audio used to be a months-long project. Today's free tools handle the rendering; the real work is text cleanup, voice choice, and chapter pacing. If you want text-to-speech audiobook versions of your own writing for a podcast feed, a language-study companion, or just personal listening, try Read Aloud Reader on a single chapter first.

The smallest first project worth doing

If you've never done this, don't start with your novel. Take a single long blog post or a 5,000-word essay and run it end-to-end: clean the text, render with two voice choices, stitch with chapter pauses, listen back at audiobook speed. You'll learn the workflow in 90 minutes and your second project will be twice as good.