Piper is an open-source neural text-to-speech engine built on VITS (Variational Inference Text-to-Speech). It generates natural, human-like speech directly in your browser without cloud processing.

How Piper VITS and WebAssembly power offline neural TTS in your browser

Your text shouldn't have to leave your device to be read aloud. Most browser TTS options either sound robotic or send every word you highlight to a remote server. Voice Reader takes a different approach: it runs Piper VITS through WebAssembly directly in Chrome, so speech synthesis happens on your CPU, not someone else's.

What Piper is

Piper is an open-source neural TTS engine built on VITS (Variational Inference Text-to-Speech), maintained by the Rhasspy project under an MIT license. VITS generates speech end-to-end from text in a single pass, rather than splicing together pre-recorded phonemes the way older engines do. The result is natural intonation and rhythm that concatenative systems can't match.

Each Piper voice ships as an ONNX model file with a small JSON config alongside it. Medium-quality English voices weigh around 63 MB. That's small enough to cache in a browser but large enough to produce speech that doesn't sound like a 2005 GPS. The code is auditable, and there is no hidden data collection anywhere in the pipeline, which is why it fits well in a free offline TTS extension with no account required.

WebAssembly: how Piper runs in a browser

Piper's inference engine is C++. To run it in Chrome without a native binary, we compile it to WebAssembly and run the model through the WASM build of ONNX Runtime. The browser executes this bytecode at near-native speed, fast enough for real-time synthesis on any modern laptop or desktop.

Three practical things follow from this. First, there are no network calls during synthesis, so your text never reaches a server. Second, the extension works on planes, in basements, anywhere without signal, once the model is cached. Third, the same WASM binary runs on Chrome, Edge, Brave, Opera, and Firefox without any platform-specific changes.
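
As a rough illustration of the kind of check that could run before choosing Piper, the sketch below tests whether the runtime accepts a minimal WebAssembly module. The helper name `supportsWasm` is hypothetical and is not taken from Voice Reader's source.

```typescript
// Hypothetical helper (not from Voice Reader's source): returns true when the
// runtime exposes WebAssembly and accepts a minimal, empty module.
function supportsWasm(env: any = globalThis): boolean {
  const wasm = env.WebAssembly;
  if (typeof wasm !== "object" || wasm === null || typeof wasm.validate !== "function") {
    return false;
  }
  // "\0asm" magic number followed by binary format version 1.
  const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
  try {
    return wasm.validate(emptyModule) === true;
  } catch (_err) {
    // Treat any failure (for example, a restrictive security policy) as "unsupported".
    return false;
  }
}
```

Passing the environment as a parameter keeps the check easy to exercise with a stub object that has no `WebAssembly` property.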

Voice Reader runs the Piper WASM engine inside a Manifest V3 offscreen document, which keeps synthesis off the main thread so the browser stays responsive while audio is generated.
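
A minimal sketch of how a Manifest V3 service worker might make sure the offscreen document exists before sending it text. It relies on `chrome.offscreen.hasDocument()` and `chrome.offscreen.createDocument()`; the helper name, the `offscreen.html` file name, and the justification string are placeholders, not Voice Reader's actual values.

```typescript
// Minimal shape of the chrome.offscreen calls used below, declared locally so
// the sketch type-checks without @types/chrome.
interface OffscreenApi {
  hasDocument(): Promise<boolean>;
  createDocument(options: { url: string; reasons: string[]; justification: string }): Promise<void>;
}

// Hypothetical helper: create the offscreen page once and reuse it afterwards.
// "offscreen.html" is a placeholder file name, not Voice Reader's actual path.
async function ensureOffscreenDocument(api: OffscreenApi, url = "offscreen.html"): Promise<boolean> {
  if (await api.hasDocument()) {
    return false; // already running, nothing to create
  }
  await api.createDocument({
    url,
    reasons: ["AUDIO_PLAYBACK"],
    justification: "Run on-device speech synthesis and play the resulting audio.",
  });
  return true; // a new document was created
}
```

In an extension this would be called as `ensureOffscreenDocument(chrome.offscreen)`, with the `offscreen` permission declared in the manifest.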

How Voice Reader uses Piper + WASM

Select text on any webpage and right-click → Read.
Choose an engine mode: Auto (Piper with fallback), Neural (Piper only), or System (Web Speech API).
On first use, the selected Piper voice model downloads from HuggingFace and gets stored in your browser's Origin Private File System (OPFS), a sandboxed local storage area the extension can access without any cloud sync.
Piper's WASM engine synthesizes speech from your text on-device.
Audio plays through your speakers. The floating bar lets you pause, skip, or adjust speed.
Every read after that first download runs fully offline. No requests leave the browser.

First-run download and caching

The initial download of a medium-quality Piper voice takes 30 seconds to 2 minutes depending on connection speed. The ~63 MB ONNX model file goes into OPFS and stays there until you clear extension data. No HuggingFace account is needed. Once cached, the voice loads from local storage in under a second.
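
The download window can be sanity-checked with simple arithmetic. Assuming 1 MB is 8 megabits and ignoring protocol overhead, a 63 MB file needs about 16.8 Mbit/s to arrive in 30 seconds and about 4.2 Mbit/s to arrive in 2 minutes. The helper names below are illustrative only.

```typescript
// Throughput in Mbit/s needed to move `sizeMB` megabytes in `seconds`
// (1 byte = 8 bits, protocol overhead ignored).
function requiredMbps(sizeMB: number, seconds: number): number {
  if (sizeMB < 0 || seconds <= 0) throw new RangeError("sizeMB must be >= 0 and seconds > 0");
  return (sizeMB * 8) / seconds;
}

// Transfer time in seconds for `sizeMB` megabytes at `mbps` megabits per second.
function downloadSeconds(sizeMB: number, mbps: number): number {
  if (sizeMB < 0 || mbps <= 0) throw new RangeError("sizeMB must be >= 0 and mbps > 0");
  return (sizeMB * 8) / mbps;
}

const fastLink = requiredMbps(63, 30);   // 16.8 Mbit/s for the 30-second case
const slowLink = requiredMbps(63, 120);  // 4.2 Mbit/s for the 2-minute case
const at10Mbps = downloadSeconds(63, 10); // 50.4 seconds on a 10 Mbit/s link
```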

If you switch to a different voice ID, that voice downloads once on first use, then caches the same way. Browser storage quotas scale with disk size rather than being a fixed number (Chromium-based browsers allow an origin up to roughly 60% of total disk space), so storing several voices isn't a problem in practice.
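
The download-once-then-cache behavior can be sketched as a read-through cache on top of OPFS. This is an illustration under stated assumptions, not Voice Reader's code: the interfaces are trimmed-down stand-ins for the browser's `FileSystemDirectoryHandle` and `FileSystemFileHandle`, and `loadVoiceModel` is a hypothetical name.

```typescript
// Trimmed-down stand-ins for the OPFS handle types, declared locally so the
// sketch can be exercised with an in-memory fake.
interface WritableLike {
  write(data: ArrayBuffer): Promise<void>;
  close(): Promise<void>;
}
interface FileHandleLike {
  getFile(): Promise<{ arrayBuffer(): Promise<ArrayBuffer> }>;
  createWritable(): Promise<WritableLike>;
}
interface DirHandleLike {
  getFileHandle(name: string, options?: { create?: boolean }): Promise<FileHandleLike>;
}

// Hypothetical helper: return the cached model if present, otherwise download
// it once and store it for later reads.
async function loadVoiceModel(
  root: DirHandleLike,
  fileName: string,
  download: () => Promise<ArrayBuffer>,
): Promise<{ bytes: ArrayBuffer; fromCache: boolean }> {
  try {
    // Without { create: true } this rejects when the file does not exist.
    const cached = await root.getFileHandle(fileName);
    const file = await cached.getFile();
    return { bytes: await file.arrayBuffer(), fromCache: true };
  } catch (_err) {
    // Simplification: any read failure is treated as a cache miss.
  }
  const bytes = await download();
  const handle = await root.getFileHandle(fileName, { create: true });
  const writable = await handle.createWritable();
  await writable.write(bytes);
  await writable.close(); // data is committed to the file on close
  return { bytes, fromCache: false };
}
```

In a browser, `root` would come from `navigator.storage.getDirectory()` and `download` would wrap a `fetch` of the model URL followed by `arrayBuffer()`.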

Engine modes and fallback behavior

Auto (default)

Tries Piper first. If the ONNX model fails to load or WebAssembly isn't supported, falls back to the system's Web Speech API voices. Good for most users who want the best available voice without thinking about it.

Neural (Piper only)

Always uses Piper. If the model can't load, you see an error rather than a silent fallback. Use this mode when you need consistent voice output and don't want the system voices substituting in.

System (Web Speech API)

Uses your browser's built-in voices. No model download, no WASM overhead, instant playback. The tradeoff is voice quality and the fact that some system voices do phone home depending on your OS. Details in our guide on how to make Chrome read text out loud.
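
The three modes reduce to a small decision function. The sketch below is one way to express the rules described above; the type and function names are hypothetical, and `piperReady` stands for "the model loaded and WebAssembly is supported".

```typescript
type EngineMode = "auto" | "neural" | "system";
type Engine = "piper" | "system";

// Hypothetical helper mirroring the mode descriptions above.
function chooseEngine(mode: EngineMode, piperReady: boolean): Engine {
  switch (mode) {
    case "system":
      return "system"; // never touches Piper
    case "neural":
      if (!piperReady) {
        // Neural mode surfaces an error instead of silently substituting a system voice.
        throw new Error("Piper could not be loaded and Neural mode has no fallback");
      }
      return "piper";
    case "auto":
      return piperReady ? "piper" : "system"; // Piper first, system voices as fallback
  }
}
```

Keeping the decision in one pure function makes the difference between Auto and Neural explicit: the only divergence is what happens when Piper is unavailable.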

Switching voices and models

Voice Reader ships with dozens of Piper voices across English (20+ IDs covering different accents and genders), Spanish, French, German, Italian, Portuguese, and more. Quality tiers run from 16 MB low to 63-75 MB medium to 120+ MB high. To switch:

Click the Voice Reader icon → Options.
Set Engine to Neural (Piper).
Pick a voice ID from the Neural voice dropdown.
Save. The new voice downloads on next use.

Comparison: Piper vs. cloud TTS vs. system voices

Aspect	Piper (WASM)	Cloud TTS	System voices
Quality	High (neural VITS)	Very high (premium)	Medium (pre-recorded)
Offline	Yes (after download)	No	Yes
Cost	Free	$0.003-0.02 per 1K chars	Free
Privacy	Local only	Text sent to server	Local only (usually)
Voice variety	100+ voices	500+ voices	Depends on OS
Setup	Install extension, download model once	API key + payment method	Install extension, instant

Why this matters for accessibility

Users with dyslexia, low vision, or reading fatigue need reliable read-aloud support, not a tool that breaks when there's no wifi or stops working after a free-tier limit. Piper running offline through WebAssembly means the voice is always available, sounds natural, and never processes your text on a third-party server. That is the case for read-aloud support for dyslexia and reading difficulties that also respects privacy.

Performance and browser support

Chrome, Edge, Brave, and Opera all fully support WebAssembly and OPFS. Firefox works. Safari has partial OPFS support from Safari 17.4 onward. On the hardware side, expect the WASM engine to synthesize roughly 1-2 seconds of audio per second of wall-clock time on a modern multi-core CPU (real time, or up to twice as fast), with about 200-300 MB of RAM in use while a model is loaded.
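
To put the throughput figure in concrete terms: at 1-2 seconds of audio generated per second, a passage that produces 30 seconds of speech takes roughly 15 to 30 seconds to synthesize. The helper below is an illustrative calculation only, and its name is hypothetical.

```typescript
// Wall-clock seconds needed to synthesize `audioSeconds` of speech when the
// engine produces `audioPerSecond` seconds of audio per second of compute.
function synthesisSeconds(audioSeconds: number, audioPerSecond: number): number {
  if (audioSeconds < 0 || audioPerSecond <= 0) {
    throw new RangeError("audioSeconds must be >= 0 and audioPerSecond > 0");
  }
  return audioSeconds / audioPerSecond;
}

const slowCase = synthesisSeconds(30, 1); // 30 seconds at real-time speed
const fastCase = synthesisSeconds(30, 2); // 15 seconds at twice real-time speed
```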

Known limitations

Piper requires the full text block before it starts generating; streaming synthesis sentence-by-sentence is not yet implemented.
Prosody control (adjusting emphasis or emotional tone) is limited compared to cloud providers; the System mode gives you finer pitch and rate control through Web Speech API.
The 63 MB model size is manageable on desktop but can feel heavy on lower-storage devices.

Getting started

Install Voice Reader as an unpacked extension.
Open extension options, set Engine to Neural (Piper), pick a voice.
Select text on any page and right-click → Read.
Wait 30-120 seconds for the first voice download. Every read after that is instant and offline.

No account. No subscription. No data collection. The text stays on your device from the moment you highlight it to the moment you hear it.

FAQ

What is Piper TTS?

Piper is an open-source neural TTS engine built on VITS. It generates speech on-device without cloud processing. The Rhasspy project maintains it under an MIT license.

Does Piper TTS work offline?

Yes. After one download (30 seconds to 2 minutes), it runs entirely through WebAssembly on your device. No internet needed for any subsequent read.

How much data does Piper TTS send to the cloud?

Zero. Text and audio never leave the browser. The only network call is the one-time model download from HuggingFace.

How do I switch Piper voices?

Open Voice Reader options, select Neural (Piper) as the engine, choose a voice ID, and save. The new voice downloads on next use.

Can I use Piper on mobile?

Voice Reader is a Chrome extension for desktop (Windows, Mac, Linux). Chrome extensions don't run on mobile Chrome.

Is Piper better than my browser's built-in voices?

For most system voices, yes: Piper's VITS model produces better prosody and more natural rhythm. If you want zero setup time and don't care about voice quality, System mode is faster to start. Auto mode gives you Piper whenever the model loads and falls back to system voices when it can't.