

Collaborative web‑based text annotation for fast ML dataset creation
doccano lets teams label text for classification, NER, and summarization via an intuitive UI, supporting multiple languages, mobile access, and a REST API, deployable with pip, Docker, or Compose.
doccano is a web‑based annotation platform designed for machine‑learning practitioners and data‑labeling teams. It supports text classification, sequence labeling (NER), and sequence‑to‑sequence tasks such as summarization, allowing users to build high‑quality datasets in hours.
The tool offers collaborative annotation, multi‑language and emoji support, a mobile‑friendly interface with dark theme, and a comprehensive RESTful API for automation. Installation is flexible: a simple `pip install doccano`, a ready‑made Docker image, or a full Docker Compose stack. For quick cloud setups, one‑click deployment options are available on AWS and Heroku.
Whether you are a researcher prototyping a new NLP model, an enterprise needing on‑premise data labeling, or a small team creating custom datasets, doccano provides the core features and extensibility to fit your workflow.
When teams consider doccano, these hosted platforms usually appear on the same shortlist.

Datasaur: NLP data labeling platform with AI-assisted automation, quality workflows, and private LLM options

AI data labeling & evaluation platform for images, video, text, audio, and more

Computer vision labeling platform for images, video, LiDAR, and medical with AI-assisted tools
Sentiment analysis dataset creation
Rapidly label thousands of tweets with positive, neutral, or negative tags using the classification UI.
Named entity recognition for medical records
Annotators mark disease, medication, and procedure entities across multilingual clinical notes.
Text summarization training data
Create source‑summary pairs for seq2seq models by annotating long articles and their concise abstracts.
Customer support ticket categorization
Teams collaboratively tag tickets so that supervised models can route queries automatically.
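The use cases above all start from text that has to be loaded into a project. As a rough sketch, the Python below prepares JSON Lines data for a classification task and a sequence-labeling task. The record layout (`text` plus a `label` list, with entity spans as `[start, end, tag]` character offsets) is an assumption about doccano's JSONL format, and the helper names and sample sentences are made up for illustration; check the format shown in your instance's import dialog before relying on it.

```python
import json


def classification_record(text, label):
    """One JSONL row for a text-classification project (assumed schema)."""
    return {"text": text, "label": [label]}


def span_record(text, entities):
    """One JSONL row for sequence labeling (assumed schema).

    `entities` maps a surface string to its tag. Each string is located with
    str.find, so only its first occurrence in `text` is labeled.
    """
    spans = []
    for surface, tag in entities.items():
        start = text.find(surface)
        if start == -1:
            raise ValueError(f"{surface!r} not found in text")
        # Spans are [start, end, tag] with an exclusive end offset.
        spans.append([start, start + len(surface), tag])
    return {"text": text, "label": sorted(spans)}


def to_jsonl(records):
    """Serialize one JSON object per line, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


tweets = to_jsonl([
    classification_record("Love the new update!", "positive"),
    classification_record("App keeps crashing.", "negative"),
])
note = span_record(
    "Patient with asthma was prescribed albuterol.",
    {"asthma": "DISEASE", "albuterol": "MEDICATION"},
)

print(tweets)
print(json.dumps(note))
```

Writing `tweets` to a `.jsonl` file gives one document per line. Character offsets rather than token indices keep the span format independent of any tokenizer.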
You can install via pip (`pip install doccano`), pull the Docker image, or use Docker Compose as described in the documentation.
SQLite is used by default; PostgreSQL can be enabled by installing `doccano[postgresql]` and setting the `DATABASE_URL` environment variable.
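A connection URL is easy to get wrong when the password contains characters such as `@` or `/`. The sketch below assembles a value for `DATABASE_URL` with the credentials percent-encoded. The `postgres://user:password@host:port/db?sslmode=...` layout and the `sslmode` parameter are assumptions based on the common PostgreSQL URL convention, and `build_database_url` is a hypothetical helper, not part of doccano.

```python
from urllib.parse import quote


def build_database_url(user, password, host, port, database, sslmode="disable"):
    """Assemble a postgres:// URL, percent-encoding the credentials."""
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}?sslmode={sslmode}"
    )


# Placeholder credentials for illustration only.
url = build_database_url("doccano", "p@ss/word", "localhost", 5432, "doccano")
print(url)
```

The printed string can then be exported as `DATABASE_URL` in the shell or service definition that starts doccano.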
The web UI is responsive and works in mobile browsers; mobile use is explicitly supported.
doccano provides a full RESTful API for project, document, and annotation management.
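As an illustration of scripting against the API, the sketch below builds (but does not send) a login request and an authenticated request to list projects, using only the Python standard library. The base URL, the `/v1/auth/login/` and `/v1/projects` paths, and the `Authorization: Token ...` header are assumptions about a typical deployment; confirm them against the API documentation of your own instance.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # hypothetical local instance


def login_request(username, password):
    """Build a POST request that submits credentials as JSON (assumed path)."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return Request(
        f"{BASE_URL}/v1/auth/login/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_projects_request(token):
    """Build a GET request for the project list using token auth (assumed scheme)."""
    return Request(
        f"{BASE_URL}/v1/projects",
        headers={"Authorization": f"Token {token}"},
        method="GET",
    )


req = login_request("admin", "password")
print(req.get_method(), req.full_url)
```

With a server running, each request can be passed to `urllib.request.urlopen` and the response body decoded with `json.loads`. Building the requests separately from sending them makes the calls easy to inspect and test offline.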
doccano uses built‑in user accounts; you create a superuser with `doccano createuser` and can manage users per project.
Project at a glance
Stable. Last synced 4 days ago.