WhisperLive

Near‑real‑time speech transcription with flexible AI backends

WhisperLive streams audio to text in near real time, supporting microphone, file, and RTSP/HLS inputs, multiple inference backends, Docker deployment, and optional translation.

Overview

WhisperLive streams audio to text with near‑real‑time latency. It works with live microphone input, local audio files, and network streams (RTSP, HLS), detects the spoken language automatically, and offers optional translation.

Flexible Deployment

The server supports three inference backends—faster_whisper for CPU, TensorRT for NVIDIA GPUs, and OpenVINO for Intel CPUs/GPUs—allowing you to choose the best performance for your hardware. Docker images simplify GPU setup, while native execution is possible with the appropriate drivers. Client configuration lets you control model size, VAD, recording, and translation features.
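For orientation, here is a minimal client sketch. It assumes the `whisper_live.client.TranscriptionClient` entry point from the project's Python package; the parameter names mirror the options described above and should be verified against the installed release.

```python
# Minimal client sketch; parameter names follow the options described
# above and should be checked against the installed release.
from whisper_live.client import TranscriptionClient

client = TranscriptionClient(
    "localhost",                 # server host
    9090,                        # server port
    lang="en",                   # transcription language
    model="small",               # Whisper model size
    use_vad=True,                # enable voice activity detection
    save_output_recording=True,  # keep a copy of the captured audio
    output_recording_filename="./output_recording.wav",
)

client()  # stream from the default microphone
```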

Scalability and Customization

You can limit concurrent users with --max_clients and set connection timeouts. The server can instantiate a separate model per client or share a single model to reduce RAM usage. Environment variables like OMP_NUM_THREADS let you fine‑tune CPU threading.
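As a sketch of these knobs, the launch below assumes the repository's `run_server.py` entry point; `--max_connection_time` is an assumed name for the timeout flag mentioned above.

```python
# Launch sketch: cap CPU threads via OMP_NUM_THREADS and limit clients.
# --max_connection_time is an assumed name for the timeout option.
import os
import subprocess

env = dict(os.environ, OMP_NUM_THREADS="4")  # restrict CPU threading

subprocess.run(
    [
        "python3", "run_server.py",
        "--port", "9090",
        "--backend", "faster_whisper",
        "--max_clients", "4",            # concurrent-user cap
        "--max_connection_time", "600",  # seconds (assumed flag name)
    ],
    env=env,
    check=True,
)
```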

Highlights

Multi‑backend inference (faster_whisper, TensorRT, OpenVINO) for CPU/GPU optimization
Supports microphone, file, RTSP and HLS streams with optional VAD
Dockerized deployment with automatic GPU support
Built‑in language detection, translation, and customizable client options

Pros

  • Near real‑time transcription latency
  • Flexible backend selection for varied hardware
  • Extensible Python client API
  • Docker support simplifies GPU environment setup

Considerations

  • Optimal performance requires proper GPU driver and backend installation
  • Per‑client model loading can increase RAM usage if not using single‑model mode
  • Limited to Whisper model capabilities
  • No graphical UI; integration requires custom client development

Managed products teams compare with

When teams consider WhisperLive, these hosted platforms usually appear on the same shortlist.

Otter.ai

AI meeting assistant for transcription and automated note-taking

SuperWhisper

Real-time transcription and translation API

Willow

Voice AI and speech recognition technology

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Developers building programmable live transcription services
  • Teams needing captions for webinars, meetings, or streams
  • Researchers comparing inference backends on different hardware
  • Multilingual content creators requiring on‑the‑fly translation

Not ideal when

  • Non‑technical users looking for a ready‑made UI
  • Environments without compatible GPU or Intel drivers
  • Projects demanding ultra‑low latency beyond near‑real‑time
  • Use cases requiring extensive post‑processing beyond raw text

How teams use it

Live meeting captioning

Generate real‑time subtitles for virtual conferences and team calls

Streaming broadcast transcription

Provide captions for RTSP/HLS video streams in live broadcasts

Multilingual podcast translation

Transcribe and translate episodes into English or other target languages

Offline audio archiving

Batch process recorded audio files into searchable text transcripts

Tech snapshot

Python 72%
JavaScript 15%
Swift 7%
HTML 4%
Shell 2%
CSS 1%

Tags

text-to-speech, openvino, tensorrt, voice-recognition, openvino-intel, dictation, whisper, tensorrt-llm, whisper-tensorrt, obs, translation, openai

Frequently asked questions

Which backend gives the best performance?

TensorRT usually offers the highest throughput on NVIDIA GPUs, OpenVINO is optimized for Intel CPUs/GPUs, and faster_whisper works well on CPUs.

Do I need a GPU?

A GPU is not required; CPU inference works via faster_whisper, but GPU backends provide faster results.

How do I enable translation?

Set `enable_translation=True` and specify `target_language` in the client; the server will run a translation thread.
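For example (a sketch reusing the client class shown earlier; only `enable_translation` and `target_language` come from this answer, the remaining arguments are illustrative):

```python
# Translation sketch: enable_translation / target_language are the
# options named above; host, port, and lang are illustrative.
from whisper_live.client import TranscriptionClient

client = TranscriptionClient(
    "localhost",
    9090,
    lang="es",                # source language of the audio
    enable_translation=True,  # server starts a translation thread
    target_language="en",     # language to translate into
)
client()
```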

Can I run WhisperLive without Docker?

Yes, you can run natively after installing the required drivers and runtimes for the chosen backend.

What audio sources are supported?

Microphone input, local audio files, RTSP streams, and HLS streams are all supported.
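A sketch of the four call forms, reusing the client class from earlier; the `rtsp_url` and `hls_url` keyword names are assumptions to be checked against your version:

```python
# One client object can consume any of the four supported sources.
from whisper_live.client import TranscriptionClient

client = TranscriptionClient("localhost", 9090, lang="en", model="small")

client()                                         # live microphone
client("meeting_recording.wav")                  # local audio file
client(rtsp_url="rtsp://camera.local/stream")    # RTSP stream (assumed kwarg)
client(hls_url="https://example.com/live.m3u8")  # HLS stream (assumed kwarg)
```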

Project at a glance

Status: Active
Stars: 3,755
Watchers: 3,755
Forks: 512
License: MIT
Repo age: 2 years
Last commit: last week
Primary language: Python
