WhisperLive

Near‑real‑time speech transcription with flexible AI backends

WhisperLive streams audio to text in near real time, supporting microphone, file, and RTSP/HLS inputs, multiple inference backends, Docker deployment, and optional translation.

Overview

WhisperLive streams audio to text with near‑real‑time latency. It works with live microphone input, local audio files, and network streams (RTSP, HLS), detects the spoken language automatically, and offers optional translation.

Flexible Deployment

The server supports three inference backends—faster_whisper for CPU, TensorRT for NVIDIA GPUs, and OpenVINO for Intel CPUs/GPUs—allowing you to choose the best performance for your hardware. Docker images simplify GPU setup, while native execution is possible with the appropriate drivers. Client configuration lets you control model size, VAD, recording, and translation features.
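For orientation, here is a minimal client sketch. It assumes the `whisper_live.client.TranscriptionClient` entry point from the project's Python package; the parameter names mirror the options described above and should be verified against the installed release.

```python
# Minimal client sketch; parameter names follow the options described
# above and should be checked against the installed release.
from whisper_live.client import TranscriptionClient

client = TranscriptionClient(
    "localhost",                 # server host
    9090,                        # server port
    lang="en",                   # transcription language
    model="small",               # Whisper model size
    use_vad=True,                # enable voice activity detection
    save_output_recording=True,  # keep a copy of the captured audio
    output_recording_filename="./output_recording.wav",
)

client()  # stream from the default microphone
```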

Scalability and Customization

You can limit concurrent users with --max_clients and set connection timeouts. The server can instantiate a separate model per client or share a single model to reduce RAM usage. Environment variables like OMP_NUM_THREADS let you fine‑tune CPU threading.
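As a sketch of these knobs, the launch below assumes the repository's `run_server.py` entry point; `--max_connection_time` is an assumed name for the timeout flag mentioned above.

```python
# Launch sketch: cap CPU threads via OMP_NUM_THREADS and limit clients.
# --max_connection_time is an assumed name for the timeout option.
import os
import subprocess

env = dict(os.environ, OMP_NUM_THREADS="4")  # restrict CPU threading

subprocess.run(
    [
        "python3", "run_server.py",
        "--port", "9090",
        "--backend", "faster_whisper",
        "--max_clients", "4",            # concurrent-user cap
        "--max_connection_time", "600",  # seconds (assumed flag name)
    ],
    env=env,
    check=True,
)
```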

Highlights

Multi‑backend inference (faster_whisper, TensorRT, OpenVINO) for CPU/GPU optimization
Supports microphone, file, RTSP and HLS streams with optional VAD
Dockerized deployment with automatic GPU support
Built‑in language detection, translation, and customizable client options

Pros

  • Near real‑time transcription latency
  • Flexible backend selection for varied hardware
  • Extensible Python client API
  • Docker support simplifies GPU environment setup

Considerations

  • Optimal performance requires proper GPU driver and backend installation
  • Per‑client model loading can increase RAM usage if not using single‑model mode
  • Limited to Whisper model capabilities
  • No graphical UI; integration requires custom client development

Managed products teams compare with

When teams consider WhisperLive, these hosted platforms usually appear on the same shortlist.

Otter.ai

AI meeting assistant for transcription and automated note-taking

SuperWhisper

Real-time transcription and translation API

Willow

Voice AI and speech recognition technology

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Developers building programmable live transcription services
  • Teams needing captions for webinars, meetings, or streams
  • Researchers comparing inference backends on different hardware
  • Multilingual content creators requiring on‑the‑fly translation

Not ideal when

  • Non‑technical users looking for a ready‑made UI
  • Environments without compatible GPU or Intel drivers
  • Projects demanding ultra‑low latency beyond near‑real‑time
  • Use cases requiring extensive post‑processing beyond raw text

How teams use it

Live meeting captioning

Generate real‑time subtitles for virtual conferences and team calls

Streaming broadcast transcription

Provide captions for RTSP/HLS video streams in live broadcasts

Multilingual podcast translation

Transcribe and translate episodes into English or other target languages

Offline audio archiving

Batch process recorded audio files into searchable text transcripts

Tech snapshot

Python 72%
JavaScript 15%
Swift 7%
HTML 4%
Shell 2%
CSS 1%

Tags

text-to-speech, openvino, tensorrt, voice-recognition, openvino-intel, dictation, whisper, tensorrt-llm, whisper-tensorrt, obs, translation, openai

Frequently asked questions

Which backend gives the best performance?

TensorRT usually offers the highest throughput on NVIDIA GPUs, OpenVINO is optimized for Intel CPUs/GPUs, and faster_whisper works well on CPUs.

Do I need a GPU?

A GPU is not required; CPU inference works via faster_whisper, but GPU backends provide faster results.

How do I enable translation?

Set `enable_translation=True` and specify `target_language` in the client; the server will run a translation thread.
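For example (a sketch reusing the client class shown earlier; only `enable_translation` and `target_language` come from this answer, the remaining arguments are illustrative):

```python
# Translation sketch: enable_translation / target_language are the
# options named above; host, port, and lang are illustrative.
from whisper_live.client import TranscriptionClient

client = TranscriptionClient(
    "localhost",
    9090,
    lang="es",                # source language of the audio
    enable_translation=True,  # server starts a translation thread
    target_language="en",     # language to translate into
)
client()
```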

Can I run WhisperLive without Docker?

Yes, you can run natively after installing the required drivers and runtimes for the chosen backend.

What audio sources are supported?

Microphone input, local audio files, RTSP streams, and HLS streams are all supported.
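A sketch of the four call forms, reusing the client class from earlier; the `rtsp_url` and `hls_url` keyword names are assumptions to be checked against your version:

```python
# One client object can consume any of the four supported sources.
from whisper_live.client import TranscriptionClient

client = TranscriptionClient("localhost", 9090, lang="en", model="small")

client()                                         # live microphone
client("meeting_recording.wav")                  # local audio file
client(rtsp_url="rtsp://camera.local/stream")    # RTSP stream (assumed kwarg)
client(hls_url="https://example.com/live.m3u8")  # HLS stream (assumed kwarg)
```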

Project at a glance

Status: Active
Stars: 3,755
Watchers: 3,755
Forks: 512
License: MIT
Repo age: 2 years
Last commit: last week
Primary language: Python
