Lightning-Fast, On-Device Multilingual TTS

Running natively via ONNX. No cloud. No API calls. Complete privacy.

Blazing Fast | 31 Languages | Edge Ready | Privacy First

8,725 Stars on GitHub
31 Languages Supported
99M Parameters
12+ SDKs Available

Powerful Features

Blazing Fast

Low-latency, real-time synthesis across desktop, browser, mobile, and edge. Fast enough to turn an entire webpage into audio in under a second.

  • Real-time processing
  • Sub-second latency
  • Optimized for performance

31-Language Multilingual

Synthesize directly from text across 31 languages, or pass lang="na" for language-agnostic processing.

  • No language adapters needed
  • Automatic language detection
  • Native language support

Compact & Efficient

A 99M-parameter open-weight model, a fraction of the size of larger TTS systems, for faster downloads and a lower memory footprint.

  • Smaller model size
  • Faster cold starts
  • Lower memory usage

Edge-Device Ready

Runs locally on desktop, mobile, browsers, and resource-constrained hardware like Raspberry Pi or e-readers with zero network dependency.

  • No GPU required
  • Complete privacy
  • Offline capable

Studio-Quality Audio

Outputs studio-grade 44.1kHz 16-bit WAV directly, ready for production playback without any external upsampler.

  • 44.1kHz output
  • 16-bit precision
  • Production ready
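
To make the output format concrete, here is a minimal sketch that writes and inspects a 44.1kHz 16-bit WAV using only Python's standard `wave` module. It does not call Supertonic; the helper names are hypothetical, and mono output is an assumption, since the channel count is not stated above.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100  # Hz, as stated above
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit
CHANNELS = 1          # assumption: mono (not stated in the text above)


def write_test_tone(target, seconds=1.0, freq=440.0):
    """Write a sine tone as 16-bit PCM WAV to a path or file-like object."""
    n_frames = int(SAMPLE_RATE * seconds)
    frames = bytearray()
    for i in range(n_frames):
        value = 0.3 * 32767 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
        frames += struct.pack("<h", int(value))  # little-endian signed 16-bit
    with wave.open(target, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(bytes(frames))


def describe_wav(source):
    """Read back the header fields that define the audio format."""
    with wave.open(source, "rb") as wf:
        return {
            "sample_rate": wf.getframerate(),
            "bits": wf.getsampwidth() * 8,
            "channels": wf.getnchannels(),
            "seconds": wf.getnframes() / wf.getframerate(),
        }
```

Calling `describe_wav("output.wav")` on a generated file is a quick way to confirm the sample rate and bit depth before shipping audio to a player.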

Expression Tags

Ten inline tags (e.g. <laugh>, <breath>, <sigh>) add natural human nuance to generated speech.

  • No prompt engineering
  • Reference audio free
  • Natural expression
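
As an illustration of working with inline tags, the sketch below finds tags in a string and flags any that are not in a known set. Only the three tags named above are listed, because the full set of ten is not given here. The `<name>` lowercase syntax is inferred from those examples, and the helper names are hypothetical.

```python
import re

# Only the three tags named above; the remaining tags are not listed in this text.
KNOWN_TAGS = {"laugh", "breath", "sigh"}

# Assumption: tags look like <name> with lowercase letters or underscores.
TAG_PATTERN = re.compile(r"<([a-z_]+)>")


def find_tags(text):
    """Return tag names in order of appearance."""
    return TAG_PATTERN.findall(text)


def unknown_tags(text, known=KNOWN_TAGS):
    """Return tags that are not in the known set, e.g. to catch typos."""
    return [tag for tag in find_tags(text) if tag not in known]


def strip_tags(text):
    """Remove tags and collapse whitespace, e.g. for on-screen captions."""
    return re.sub(r"\s+", " ", TAG_PATTERN.sub(" ", text)).strip()
```

A check like `unknown_tags(text)` before synthesis can catch a mistyped tag that might otherwise be read aloud as literal text.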

Superior Performance

Reading Accuracy

Evaluated on the Minimax-MLS-test benchmark, Supertonic 3 stays within a competitive WER/CER range against much larger open TTS models while preserving a lightweight on-device deployment path.

Average WER across all languages: 2.5%

[Figure: performance chart]
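
For readers unfamiliar with the WER and CER metrics cited above, here is a minimal sketch of how they are commonly computed: the edit distance between a reference text and a transcript, divided by the reference length. For TTS, the transcript typically comes from running a speech recognizer on the synthesized audio. The benchmark's exact text normalization is not described here, so this sketch applies none.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, dropping one word from a six-word reference gives a WER of 1/6, or about 16.7%.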

Runtime Efficiency

Supertonic 3 runs fast on CPU, even compared with larger baselines measured on an A100 GPU, and uses substantially less memory.

  • CPU inference speed: 3-5x faster than GPU baselines
  • Memory usage: ~2GB RAM
  • Model size: ~500MB ONNX
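
One common way to quantify synthesis speed on your own hardware is the real-time factor (RTF): synthesis time divided by the duration of the audio produced. The sketch below is a generic measurement helper, not necessarily the methodology behind the figures above, and the function names are hypothetical.

```python
import time


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF below 1.0 means audio is generated faster than it plays back."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Wrapping a synthesis call in `timed(...)` and passing the elapsed time plus the returned audio duration to `real_time_factor` gives a number that can be compared across devices.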

Model Size Comparison

At about 99M parameters across the public ONNX assets, Supertonic 3 is much smaller than 0.7B to 2B class open TTS systems.

  • Supertonic 3: 99M parameters
  • Open TTS systems: 700M-2B+ parameters
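
To see what these parameter counts mean for download size, the back-of-the-envelope sketch below multiplies parameter count by bytes per parameter. It assumes 32-bit floats by default, which is not confirmed above, and it ignores graph metadata and auxiliary assets, so it will not exactly match the ~500MB ONNX figure quoted earlier.

```python
def weights_megabytes(n_params, bytes_per_param=4):
    """Approximate raw weight size in MB (assumes fp32 unless overridden)."""
    return n_params * bytes_per_param / 1e6


for label, n_params in [
    ("Supertonic 3 (99M)", 99_000_000),
    ("0.7B-class model", 700_000_000),
    ("2B-class model", 2_000_000_000),
]:
    print(f"{label}: ~{weights_megabytes(n_params):,.0f} MB at fp32")
```

Under the same assumption, a 2B-parameter model needs roughly 20 times the storage of a 99M-parameter one.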

31 Languages Supported

Not sure which language your text is in? Pass lang="na" and Supertonic will handle it automatically.

Arabic, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese
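
The `lang="na"` fallback can be wrapped in a small helper, sketched below. It uses only the `get_voice_style` and `synthesize` calls and arguments shown in the Python example on this page. The helper names are hypothetical, and only the `"en"` code is listed because the other language codes are not given here.

```python
# Assumption: only "en" is listed, because it is the only code shown on this page.
KNOWN_CODES = frozenset({"en"})


def pick_lang(code, known_codes=KNOWN_CODES):
    """Return the code if it is known, otherwise fall back to "na"."""
    if code and code.lower() in known_codes:
        return code.lower()
    return "na"


def synthesize_any(tts, text, voice_name="M1", lang_code=None):
    """Synthesize text, using lang="na" when the language is unknown.

    `tts` is expected to be a supertonic TTS instance (or any object
    exposing the same two methods).
    """
    style = tts.get_voice_style(voice_name=voice_name)
    return tts.synthesize(
        text=text,
        lang=pick_lang(lang_code),
        voice_style=style,
        total_steps=8,
        speed=1.05,
    )
```

With a real instance, `synthesize_any(TTS(auto_download=True), "Bonjour")` would pass `lang="na"` because no language code was supplied.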

Quick Start Guide

Install the Python SDK

On the first run, Supertonic downloads the model assets automatically.

pip install supertonic

Python Example

from supertonic import TTS

# First run downloads the model automatically
tts = TTS(auto_download=True)

style = tts.get_voice_style(voice_name="M1")

text = "Supertonic is a lightning fast, on-device TTS system."

wav, duration = tts.synthesize(
    text=text,
    lang="en",                      # Language code
    voice_style=style,              # Voice style
    total_steps=8,                  # Quality: 5-12
    speed=1.05,                     # Speed: 0.7-2.0
)

tts.save_audio(wav, "output.wav")
print(f"Generated {duration[0]:.2f}s of audio")
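
The comments in the example above give ranges for `total_steps` (5-12) and `speed` (0.7-2.0). The sketch below clamps user-supplied values into those ranges before they reach `synthesize`. Treating the ranges as inclusive is an assumption, as is whether the library enforces them itself; the helper names are hypothetical.

```python
TOTAL_STEPS_RANGE = (5, 12)  # from the "Quality: 5-12" comment above
SPEED_RANGE = (0.7, 2.0)     # from the "Speed: 0.7-2.0" comment above


def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def safe_params(total_steps=8, speed=1.05):
    """Return keyword arguments clamped to the documented ranges."""
    return {
        "total_steps": int(clamp(total_steps, *TOTAL_STEPS_RANGE)),
        "speed": float(clamp(speed, *SPEED_RANGE)),
    }
```

The result can be unpacked into the call, for example `tts.synthesize(text=text, lang="en", voice_style=style, **safe_params(20, 0.5))`.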

Local HTTP Server

Run Supertonic as a local HTTP service for integration with other tools.

pip install 'supertonic[serve]'
supertonic serve --host 127.0.0.1 --port 7788

Available Endpoints:

  • POST /v1/tts (Native)
  • POST /v1/audio/speech (OpenAI compatible)
  • GET /docs (API Documentation)
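
A client for the OpenAI-compatible endpoint might look like the sketch below, which uses only Python's standard library. The `model`, `input`, and `voice` fields follow OpenAI's speech API; the model value `"supertonic"`, the choice of `"M1"` as a voice value, and the `.wav` output name are assumptions. Check `GET /docs` on the running server for the fields it actually accepts.

```python
import json
import urllib.request


def build_speech_request(text, voice="M1", model="supertonic",
                         host="127.0.0.1", port=7788):
    """Build a POST request for /v1/audio/speech without sending it."""
    # Assumption: the server accepts OpenAI-style field names.
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_speech(text, out_path="speech.wav", **kwargs):
    """Send the request to a running server and save the returned bytes."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With `supertonic serve` running on the host and port shown above, `fetch_speech("Hello from the local server.")` would write the response body to `speech.wav`.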

Live Demos & Use Cases

Raspberry Pi Demo

Real-time text-to-speech on Raspberry Pi, demonstrating on-device performance.

E-Reader Integration

Experience Supertonic on an Onyx Boox e-reader in airplane mode with zero network dependency.

Chrome Extension

Turn any webpage into audio in under one second with complete privacy.

Interactive Demo

Try Supertonic directly in your browser with our interactive demo. Experience real-time synthesis with different languages and voice styles.

Multi-Runtime SDKs

Ready-to-use examples through ONNX Runtime across multiple platforms.

Python

ONNX Runtime inference with pip installation

Node.js

Server-side JavaScript implementation

Browser

WebGPU/WASM inference in the browser

Java

Cross-platform JVM implementation

C++

High-performance C++ implementation

C#

.NET ecosystem implementation

Go

Go implementation with ONNX Runtime

Swift

macOS and iOS applications

Complete SDK List

Additional SDKs available for Rust, iOS, and Flutter platforms.

Rust

Memory-safe implementation

iOS

Native iOS apps

Flutter

Cross-platform apps

Voice Cloning Made Simple

Voice Builder

Turn your voice into a deployable, edge-native TTS with permanent ownership. Create custom voice profiles for both Supertonic 2 and Supertonic 3.

  • Permanent custom voice profile
  • Version-specific JSON files
  • Complete ownership
  • Easy integration

For Commercial Use

Need more voices or enterprise features? Check out our commercial offerings:

Built With Supertonic

TLDRL

Free, on-device TTS extension for reading any webpage

Read Aloud

Open-source TTS browser extension

PageEcho

E-Book reader app for iOS

VoiceChat

On-device voice-to-voice LLM chatbot in the browser

OmniAvatar

Talking avatar video generator from photo + speech

CopiloTTS

Kotlin Multiplatform TTS SDK via ONNX Runtime

Choose Your Version

Supertonic 3 (Latest)

  • 31 Languages
  • Expression Tags
  • Improved Accuracy
  • 99M Parameters

Supertonic 2 (Stable)

  • 5 Languages
  • Production Ready
  • 66M Parameters
  • Backward Compatible

Supertonic 1 (Legacy)

  • 1 Language (EN)
  • Basic Features
  • 66M Parameters
  • Deprecated

Ready to Experience Lightning-Fast TTS?

Join thousands of developers who trust Supertonic for their on-device text-to-speech needs.