
Otter.ai
AI meeting assistant for transcription and automated note-taking

Fast, word-level ASR with speaker diarization and 70× realtime speed
WhisperX delivers rapid automatic speech recognition with precise word-level timestamps, speaker diarization, and GPU-efficient batching, supporting large-v2 models on modest hardware.
Looking for a hosted option? These are the services engineering teams usually benchmark WhisperX against before choosing open source.
Live meeting transcription
Generate near‑real‑time captions with speaker names for Zoom, Teams, or Google Meet recordings.
Podcast post‑production
Create word‑accurate transcripts and speaker labels to streamline editing and searchable archives.
Academic lecture indexing
Produce timestamped transcripts for automatic subtitle generation and topic navigation.
AI research data preparation
Batch‑process large audio corpora with precise alignment for training downstream models.
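The batch-processing workflow above leans on the same idea that gives WhisperX its speed: long audio is cut into fixed-length windows and transcribed in parallel batches rather than sequentially. A minimal sketch of that chunking step (illustrative only; the function names and 30-second window here are assumptions, not the library's API):

```python
# Illustrative sketch of fixed-length chunking for batched inference.
# 30 s matches Whisper's input window; names are hypothetical.

def chunk_audio(samples, sample_rate=16000, chunk_seconds=30):
    """Split a 1-D sample sequence into fixed-length chunks (last may be short)."""
    size = sample_rate * chunk_seconds
    return [samples[i:i + size] for i in range(0, len(samples), size)]

def batched(chunks, batch_size=8):
    """Group chunks so each batch can be transcribed in one forward pass."""
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]

# Example: 95 seconds of silent audio at 16 kHz -> 4 chunks -> 1 batch
audio = [0.0] * (16000 * 95)
chunks = chunk_audio(audio)
print(len(chunks))                       # 4 chunks (30 + 30 + 30 + 5 s)
print(len(batched(chunks, batch_size=8)))  # 1 batch
```

Batching many chunks per forward pass is what lets a GPU stay saturated on long corpora instead of idling between sequential 30-second segments.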
What hardware does WhisperX need?
A GPU with CUDA 12.8 and at least 8 GB of memory is recommended; CPU‑only mode works but is much slower.
How does WhisperX produce word‑level timestamps?
It performs forced phoneme alignment with a wav2vec2.0 model after the initial Whisper transcription.
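The alignment step can be pictured as a dynamic program: given per-frame scores for each phoneme in the known transcript, find the best monotonic frame-to-phoneme assignment, then read word boundaries off that path. The following is a toy Viterbi-style illustration of that idea, not WhisperX's actual implementation:

```python
# Toy forced alignment: frames are assigned to phonemes of a known sequence,
# in order, each phoneme covering a contiguous run of frames.

def forced_align(logprobs):
    """logprobs[t][j]: score of frame t emitting phoneme j of the sequence.
    Returns the phoneme index assigned to each frame (monotonic path)."""
    T, J = len(logprobs), len(logprobs[0])
    NEG = float("-inf")
    dp = [[NEG] * J for _ in range(T)]
    back = [[0] * J for _ in range(T)]
    dp[0][0] = logprobs[0][0]          # alignment must start at phoneme 0
    for t in range(1, T):
        for j in range(J):
            stay = dp[t - 1][j]                         # keep same phoneme
            move = dp[t - 1][j - 1] if j > 0 else NEG   # advance to next one
            if move > stay:
                dp[t][j], back[t][j] = move + logprobs[t][j], j - 1
            else:
                dp[t][j], back[t][j] = stay + logprobs[t][j], j
    # Backtrack from the final phoneme at the final frame.
    path, j = [J - 1], J - 1
    for t in range(T - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    return path[::-1]

# 5 frames, 3 phonemes; a high score marks the "true" phoneme per frame.
scores = [[0, -9, -9], [0, -9, -9], [-9, 0, -9], [-9, -9, 0], [-9, -9, 0]]
print(forced_align(scores))            # [0, 0, 1, 2, 2]
```

Because the transcript is already known from Whisper, the aligner only has to place boundaries, which is why the resulting word timestamps are much more precise than Whisper's own segment-level times.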
Does WhisperX support speaker diarization?
Yes, when you provide a Hugging Face token for the required pyannote‑audio models; the system then assigns speaker IDs to the transcript.
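Conceptually, merging diarization with the transcript means labeling each word with the speaker whose turn overlaps it the most. A hedged sketch of that merge (data shapes and function name are illustrative assumptions, not the library's API):

```python
# Illustrative merge of diarization turns with word timestamps:
# each word gets the speaker whose turn overlaps it longest.

def assign_speakers(words, turns):
    """words: [{'word', 'start', 'end'}]; turns: [{'speaker', 'start', 'end'}]."""
    out = []
    for w in words:
        best, best_overlap = None, 0.0
        for t in turns:
            overlap = min(w["end"], t["end"]) - max(w["start"], t["start"])
            if overlap > best_overlap:
                best, best_overlap = t["speaker"], overlap
        out.append({**w, "speaker": best})
    return out

words = [{"word": "hello", "start": 0.0, "end": 0.4},
         {"word": "there", "start": 2.1, "end": 2.5}]
turns = [{"speaker": "SPEAKER_00", "start": 0.0, "end": 1.8},
         {"speaker": "SPEAKER_01", "start": 1.9, "end": 3.0}]
print([w["speaker"] for w in assign_speakers(words, turns)])
# ['SPEAKER_00', 'SPEAKER_01']
```

This is why word-level timestamps matter for diarization: with segment-level times only, a turn change in the middle of a sentence cannot be attributed correctly.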
Can WhisperX run without a GPU?
Yes, but you lose the 70× speed advantage and processing times grow substantially.
Which Whisper models are supported?
WhisperX works with the standard Whisper models (base, small, large, large‑v2); larger models improve accuracy but require more GPU memory.
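That accuracy/memory trade-off can be made concrete with a small helper that picks the largest model expected to fit in available VRAM. The figures below are rough approximations based on OpenAI's published per-model memory requirements; actual usage varies with batch size and precision:

```python
# Approximate float16 VRAM needs per model (rough figures; verify for your setup).
APPROX_VRAM_GB = {"base": 1, "small": 2, "large-v2": 10}

def pick_model(available_vram_gb):
    """Return the largest listed Whisper model expected to fit in VRAM."""
    fitting = [m for m, need in APPROX_VRAM_GB.items() if need <= available_vram_gb]
    # dict preserves insertion order: entries are listed smallest to largest
    return fitting[-1] if fitting else None

print(pick_model(8))     # 'small'
print(pick_model(24))    # 'large-v2'
```

On the 8 GB card mentioned above, quantized or batched inference may still allow larger models than this naive table suggests, which is part of WhisperX's appeal on modest hardware.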
Project at a glance
Status: Active · Last synced 4 days ago