

Flexible, multi-type data labeling platform for modern ML pipelines.
Label Studio lets teams annotate images, audio, text, video, and time-series through an intuitive UI, supporting cloud storage, multi-user projects, and seamless ML model integration.

Label Studio is a flexible annotation platform that supports images, audio, text, video, HTML and time-series data. It provides a clean web UI, configurable label formats, and built-in templates, allowing data scientists, ML engineers and annotation teams to create high-quality training sets or refine existing ones. Projects can be organized per team, with multi-user sign-in and role-based access, while imports from local files or cloud storage (AWS S3, Google Cloud) streamline data ingestion. Integration with the Machine Learning SDK lets you connect any model for pre-labeling and prediction comparison, and a REST API enables embedding the service into automated pipelines.
Label Studio can be run locally via Docker, Docker Compose (with optional Nginx, PostgreSQL, and MinIO for S3-compatible storage), pip, Poetry, or Anaconda, and it offers one-click cloud deployments on Heroku, Azure, or GCP. A free Starter Cloud trial is available for teams that prefer a managed environment. Because the platform is open source, you can also host it on-premises for full data control while drawing on community-driven extensions and documentation.
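As an illustration of the configurable label formats mentioned above, here is a minimal sketch that builds an image-classification labeling config with Python's standard library. The tag names (`View`, `Image`, `Choices`, `Choice`) follow Label Studio's XML-style configuration; the control names `image` and `choice`, the label values, and the helper function itself are assumptions chosen for this example.

```python
import xml.etree.ElementTree as ET


def build_image_classification_config(labels):
    """Return a labeling config XML string for single-image classification.

    Hypothetical helper for illustration; not part of Label Studio itself.
    """
    view = ET.Element("View")
    # "$image" tells Label Studio to read the image URL from each task's "image" field.
    ET.SubElement(view, "Image", {"name": "image", "value": "$image"})
    # The Choices control is attached to the Image object through toName.
    choices = ET.SubElement(view, "Choices", {"name": "choice", "toName": "image"})
    for label in labels:
        ET.SubElement(choices, "Choice", {"value": label})
    return ET.tostring(view, encoding="unicode")


config_xml = build_image_classification_config(["Cat", "Dog"])
print(config_xml)
```

The resulting string can be pasted into a project's labeling setup, or adapted from one of the built-in templates instead of being written from scratch.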
When teams consider Label Studio, these hosted platforms usually appear on the same shortlist.

Datasaur
NLP data labeling platform with AI-assisted automation, quality workflows, and private LLM options

AI data labeling & evaluation platform for images, video, text, audio, and more

Computer vision labeling platform for images, video, LiDAR, and medical with AI-assisted tools
Image classification dataset creation
Annotators label thousands of images through the web UI and export the results in COCO format to train a vision model.
Audio transcription for speech recognition
A team tags speech segments, aligns timestamps, and feeds the labeled data back to improve ASR accuracy.
Pre-labeling with an existing model
Connect a model via the ML SDK, auto-populate predictions, and have humans correct them, which accelerates dataset refinement.
Integrating labeling into a CI pipeline
Data ingestion scripts call the REST API to create labeling jobs, enabling automated data quality loops.
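The pipeline use case above can be sketched as a small script that prepares a task-import request. This is a minimal sketch, not a definitive client: it assumes a local instance at `http://localhost:8080`, a project with ID 1, the `/api/projects/<id>/import` endpoint, and an `Authorization: Token ...` header. Check the API reference for your Label Studio version, since token schemes can differ between releases. The function name and the placeholder token are hypothetical.

```python
import json
import urllib.request


def build_import_request(base_url, project_id, token, tasks):
    """Build (but do not send) a POST request that imports tasks into a project.

    Assumed endpoint and auth header; verify against your instance's API docs.
    """
    url = f"{base_url.rstrip('/')}/api/projects/{project_id}/import"
    body = json.dumps(tasks).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
    )


# Each task wraps its payload in a "data" object; "image" is an assumed field name
# that must match the labeling config of the target project.
tasks = [{"data": {"image": "https://example.com/images/0001.jpg"}}]
request = build_import_request("http://localhost:8080", 1, "YOUR_API_TOKEN", tasks)

# To send it against a running instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it keeps the example runnable without a server and makes the payload easy to inspect in a test or a dry run.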
You can install via Docker, Docker Compose, pip, Poetry, or Anaconda, or run a one-click cloud deployment on Heroku, Azure, or GCP.
Label Studio accepts images, audio, video, text, HTML, and time-series data, and can import from local files or cloud storage (AWS S3, Google Cloud) as JSON, CSV, or TSV files or as RAR or ZIP archives.
The Machine Learning SDK lets you connect any model for pre-labeling, prediction comparison, and active learning workflows.
A free Starter Cloud trial is offered, and you can also deploy the open-source version to cloud providers like Heroku, Azure, or GCP.
Users sign up and log in; annotations are tied to their accounts, and you can organize work into multiple projects with role-based permissions.
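For the pre-labeling workflow mentioned earlier, model output is typically attached to a task as a prediction that annotators then accept or correct. The sketch below builds one such pre-annotated task as plain JSON-compatible data. The `from_name` and `to_name` values (`choice`, `image`) are assumptions that must match the control and object names in the project's labeling config, and the helper name, model version string, and score are made up for illustration.

```python
def make_preannotated_task(image_url, label, score, model_version="demo-model-v1"):
    """Return a task dict carrying one classification prediction.

    Hypothetical helper; the field layout follows Label Studio's
    pre-annotation JSON format as an assumption to verify for your version.
    """
    return {
        "data": {"image": image_url},
        "predictions": [
            {
                "model_version": model_version,
                "score": score,
                "result": [
                    {
                        # Must match the Choices control and Image object names
                        # defined in the labeling config.
                        "from_name": "choice",
                        "to_name": "image",
                        "type": "choices",
                        "value": {"choices": [label]},
                    }
                ],
            }
        ],
    }


task = make_preannotated_task("https://example.com/images/0001.jpg", "Cat", 0.92)
```

A list of such tasks can be imported like any other JSON file, so annotators start from the model's suggestion instead of an empty form.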