

Collaborative platform for building high-quality AI datasets
Argilla lets AI engineers and domain experts collaboratively create, annotate, and manage high‑quality datasets for NLP, LLM, and multimodal projects, with easy deployment on Hugging Face Spaces.

Argilla is a collaboration tool designed for AI teams and domain experts who need to build and maintain high‑quality datasets. It supports a range of annotation tasks, from classic text classification and NER to LLM preference tuning and multimodal labeling, so users can iterate quickly on data and models.
The platform offers a programmatic workflow via a Python SDK, enabling automated dataset creation, record logging, and annotation retrieval. Users can launch a ready‑to‑use Argilla server with a single click on Hugging Face Spaces, or self‑host using Docker for full control. Integrated features such as filters, AI‑generated feedback suggestions, and semantic search make data curation efficient and engaging.
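The three SDK steps mentioned above (dataset creation, record logging, annotation retrieval) might look roughly like the sketch below. It assumes the Argilla 2.x Python SDK (`pip install argilla`) and a reachable server. The dataset name, field name, label set, and both helper functions are hypothetical and chosen only for illustration.

```python
def build_records(texts):
    """Turn raw strings into record dicts keyed by the (assumed) field name 'text'.

    Empty or whitespace-only entries are dropped.
    """
    return [{"text": t.strip()} for t in texts if t and t.strip()]


def run_workflow(api_url, api_key, texts, dataset_name="support_messages"):
    """Create a dataset, log records, and read back annotations.

    Needs a running Argilla server; names and labels are illustrative assumptions.
    """
    # Imported here so build_records can be used without the SDK installed.
    import argilla as rg

    client = rg.Argilla(api_url=api_url, api_key=api_key)

    # One text field to display and one single-label question to answer.
    settings = rg.Settings(
        fields=[rg.TextField(name="text")],
        questions=[rg.LabelQuestion(name="label", labels=["urgent", "routine"])],
    )
    dataset = rg.Dataset(name=dataset_name, settings=settings, client=client)
    dataset.create()

    # Log the records so annotators can see them in the UI.
    dataset.records.log(build_records(texts))

    # Later: pull records together with any submitted responses.
    return dataset.records(with_responses=True).to_list(flatten=True)
```

Calling `run_workflow("https://<your-space>.hf.space", "<api-key>", ["Need shelter tonight"])` would exercise all three steps against your own deployment; the URL and key are placeholders to replace.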
Argilla is community‑driven, with active Discord channels, bi‑weekly meetups, and contributions from organizations like the Red Cross and Prolific. The core codebase is stable and continues to receive bug fixes, but no new features are planned, which makes it a reliable foundation for teams that can extend it themselves.
When teams consider Argilla, these hosted platforms usually appear on the same shortlist.

Datasaur: NLP data labeling platform with AI-assisted automation, quality workflows, and private LLM options

AI data labeling & evaluation platform for images, video, text, audio, and more

Computer vision labeling platform for images, video, LiDAR, and medical with AI-assisted tools
Refugee request triage for humanitarian aid
Domain experts classified incoming messages, enabling the Red Cross to route assistance faster and improve response accuracy.
Customer support multi‑label classification
Loris.ai generated labeled samples via contrastive learning, reducing annotation time and boosting classifier performance.
Research data collection for academic studies
Prolific distributed annotation tasks to a workforce, gathering high‑quality labeled data at scale.
Fine‑tuning LLMs with curated feedback
Teams built preference‑tuned datasets (e.g., UltraFeedback) leading to higher benchmark scores for models like Notus.
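As a rough illustration of what preference‑tuning data can look like, the sketch below turns rated responses into chosen/rejected pairs. The record layout (`prompt`, `chosen`, `rejected`), the numeric rating scale, and the function name are assumptions for illustration, not the actual schema used by Argilla or UltraFeedback.

```python
from itertools import combinations


def to_preference_pairs(prompt, rated_responses):
    """Build chosen/rejected pairs from (response_text, rating) tuples.

    Higher ratings are treated as better. Tied ratings are skipped
    because they carry no preference signal.
    """
    pairs = []
    for (text_a, score_a), (text_b, score_b) in combinations(rated_responses, 2):
        if score_a == score_b:
            continue
        if score_a > score_b:
            chosen, rejected = text_a, text_b
        else:
            chosen, rejected = text_b, text_a
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


example = to_preference_pairs(
    "Summarize the ticket.",
    [
        ("Short, accurate summary.", 5),
        ("Off-topic reply.", 1),
        ("Accurate but verbose.", 5),
    ],
)
```

Here the two responses rated 5 each beat the response rated 1, and the tie between them is dropped, so `example` holds two pairs.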
You can launch the Argilla server with a single click on Hugging Face Spaces or self‑host using the provided Docker image.
The core code is stable. Maintainers will continue to issue bug fixes and security patches, but no new features are planned.
Text classification, named‑entity recognition, RAG, preference tuning, and multimodal tasks such as text‑to‑image labeling.
The Python SDK lets you create datasets, log records, and retrieve annotations programmatically.
Join the Argilla Discord, attend bi‑weekly meetups, or follow the project on Twitter and LinkedIn for support.