doccano

Collaborative web‑based text annotation for fast ML dataset creation

doccano lets teams label text for classification, named entity recognition (NER), and summarization through an intuitive web UI. It supports multiple languages, mobile browsers, and a REST API, and can be deployed with pip, Docker, or Docker Compose.

Overview

doccano is a web‑based annotation platform designed for machine‑learning practitioners and data‑labeling teams. It supports text classification, sequence labeling (NER), and sequence‑to‑sequence tasks such as summarization, allowing users to build high‑quality datasets in hours.

Capabilities & Deployment

The tool offers collaborative annotation, multi‑language and emoji support, a mobile‑friendly interface with a dark theme, and a comprehensive RESTful API for automation. Installation is flexible: `pip install doccano`, a ready‑made Docker image, or a full Docker Compose stack. For quick cloud setups, one‑click deployment options are available for AWS and Heroku.

Who Benefits

Whether you are a researcher prototyping a new NLP model, an enterprise needing on‑premise data labeling, or a small team creating custom datasets, doccano provides the core features and extensibility to fit your workflow.

Highlights

  • Collaborative annotation
  • Multi‑language and emoji support
  • Mobile‑friendly interface with a dark theme
  • Comprehensive RESTful API for automation

Pros

  • Easy to install via pip or Docker
  • Supports classification, NER, and summarization tasks
  • Built‑in user management and role control
  • Extensible through open‑source contributions and the REST API

Considerations

  • Requires Python 3.8+ for pip installation
  • Self‑hosting needed for full control
  • Limited out‑of‑the‑box analytics
  • No built‑in active learning loop

Managed products teams compare with

When teams consider doccano, these hosted platforms usually appear on the same shortlist.

Datasaur

NLP data labeling platform with AI-assisted automation, quality workflows, and private LLM options

SuperAnnotate

AI data labeling & evaluation platform for images, video, text, audio, and more

Supervisely

Computer vision labeling platform for images, video, LiDAR, and medical with AI-assisted tools

Fit guide

Great for

  • Small to medium teams building custom NLP datasets
  • Researchers prototyping annotation workflows
  • Enterprises needing on‑premise data labeling
  • Projects requiring multi‑language or emoji annotations

Not ideal when

  • Very large-scale labeling that requires a built‑in crowdsourcing platform
  • Teams that need native mobile apps beyond browser access
  • Users seeking automatic labeling or active learning features
  • Organizations requiring a fully managed SaaS solution

How teams use it

Sentiment analysis dataset creation

Rapidly label thousands of tweets with positive, neutral, or negative tags using the classification UI.
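A minimal sketch of preparing tweets for import, assuming doccano's JSONL upload format with the default `text` and `label` columns (check the import dialog of your version). The `to_classification_jsonl` helper and the sample tweets are hypothetical; rows without a label are written as unlabeled documents for annotators to tag in the UI.

```python
import json

# Label set from the use case above; adjust to your project's labels.
ALLOWED_LABELS = {"positive", "neutral", "negative"}


def to_classification_jsonl(rows):
    """Serialize (text, label) rows as JSONL, one document per line.

    Assumes doccano's default "text" and "label" keys. A label of None
    produces an unlabeled document; otherwise the label is wrapped in a
    list, since a classification document can carry several categories.
    """
    lines = []
    for text, label in rows:
        record = {"text": text}
        if label is not None:
            if label not in ALLOWED_LABELS:
                raise ValueError(f"unknown label: {label!r}")
            record["label"] = [label]
        # ensure_ascii=False keeps emoji and non-Latin scripts readable.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample tweets.
    sample = [
        ("Loving the new update! 😄", "positive"),
        ("It works, I guess.", None),
    ]
    print(to_classification_jsonl(sample), end="")
```

Validating labels before upload catches typos (for example "postive") that would otherwise surface as stray categories in the project.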

Named entity recognition for medical records

Annotators mark disease, medication, and procedure entities across multilingual clinical notes.
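Pre-annotations for sequence labeling are usually supplied as character offsets. A minimal sketch, assuming doccano's JSONL format of `[start, end, label]` triples with an exclusive end offset; `spans_for` and the sample note are hypothetical. Offsets here are Python string indices (Unicode code points), so verify alignment in your doccano version for text containing emoji or other characters outside the Basic Multilingual Plane.

```python
import json


def spans_for(text, entities):
    """Return [start, end, label] triples for (surface, label) pairs.

    Offsets are 0-based character indices with an exclusive end.
    Repeated mentions of the same surface string map to successive
    occurrences, searched left to right.
    """
    spans = []
    next_start = {}  # surface string -> index to resume searching from
    for surface, label in entities:
        start = text.find(surface, next_start.get(surface, 0))
        if start == -1:
            raise ValueError(f"{surface!r} not found in text")
        end = start + len(surface)
        next_start[surface] = end
        spans.append([start, end, label])
    return spans


if __name__ == "__main__":
    note = "Patient given metformin for type 2 diabetes."
    spans = spans_for(note, [("metformin", "MEDICATION"),
                             ("type 2 diabetes", "DISEASE")])
    print(json.dumps({"text": note, "label": spans}))
```

Searching for surface strings is a convenience for short notes; if an upstream extractor already emits offsets, pass those through directly instead.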

Text summarization training data

Write a concise abstract for each long article to create source‑summary pairs for training seq2seq models.
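If reference summaries already exist, they can be loaded as starting annotations. A minimal sketch, assuming doccano's sequence-to-sequence JSONL format in which `label` holds a list of output strings; `to_seq2seq_record` is a hypothetical helper.

```python
import json


def to_seq2seq_record(source, summaries):
    """Build one JSONL line pairing a source text with its summaries.

    Blank summaries are dropped; an empty list yields an unlabeled
    document that annotators summarize in the UI.
    """
    cleaned = [s.strip() for s in summaries if s.strip()]
    record = {"text": source.strip()}
    if cleaned:
        record["label"] = cleaned
    return json.dumps(record, ensure_ascii=False)


if __name__ == "__main__":
    print(to_seq2seq_record(
        "Long article text goes here.",
        ["doccano speeds up dataset creation.", "  "],
    ))
```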

Customer support ticket categorization

Teams collaboratively tag incoming tickets, producing training data for supervised models that route queries automatically.

Tech snapshot

Python 54%
Vue 33%
TypeScript 8%
JavaScript 4%
Shell 1%
Dockerfile 1%

Tags

nuxt, text-annotation, vue, machine-learning, dataset, python, nuxtjs, natural-language-processing, datasets, vuejs, data-labeling, annotation-tool

Frequently asked questions

How do I install doccano?

You can install via pip (`pip install doccano`), pull the Docker image, or use Docker Compose as described in the documentation.
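For the Docker route, the project README (at the time of writing) pulls the `doccano/doccano` image and creates a container whose admin account is set through the `ADMIN_USERNAME`, `ADMIN_EMAIL`, and `ADMIN_PASSWORD` environment variables, with a `doccano-db` volume mounted at `/data` and port 8000 published. The sketch below only assembles and prints those commands; `docker_commands` is a hypothetical helper, and the variable names should be checked against the current documentation.

```python
import shlex


def docker_commands(admin_user, admin_email, admin_password,
                    name="doccano", volume="doccano-db", host_port=8000):
    """Return argv lists for pulling, creating, and starting doccano."""
    image = "doccano/doccano"
    create = [
        "docker", "container", "create", "--name", name,
        "-e", f"ADMIN_USERNAME={admin_user}",
        "-e", f"ADMIN_EMAIL={admin_email}",
        "-e", f"ADMIN_PASSWORD={admin_password}",
        "-v", f"{volume}:/data",    # named volume keeps the database across restarts
        "-p", f"{host_port}:8000",  # map a host port to container port 8000
        image,
    ]
    return [
        ["docker", "pull", image],
        create,
        ["docker", "container", "start", name],
    ]


if __name__ == "__main__":
    for argv in docker_commands("admin", "admin@example.com", "change-me"):
        # shlex.join quotes arguments safely for pasting into a shell.
        print(shlex.join(argv))
```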

Which database backends are supported?

SQLite is used by default; PostgreSQL can be enabled by installing `doccano[postgresql]` and setting the `DATABASE_URL` environment variable.
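The doccano documentation shows a PostgreSQL URL of the form `postgres://user:password@host:port/dbname?sslmode=disable` (verify against your version). A minimal sketch for assembling it, assuming credentials that contain characters such as `@` or `:` must be percent-encoded; the helper name and sample values are hypothetical.

```python
from urllib.parse import quote


def postgres_database_url(user, password, host, port, dbname, sslmode="disable"):
    """Assemble a DATABASE_URL value for a PostgreSQL backend.

    Credentials are percent-encoded so reserved characters such as
    '@', ':' or '/' in a password do not break URL parsing.
    """
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{dbname}?sslmode={sslmode}"
    )


if __name__ == "__main__":
    # Hypothetical values; export the result as DATABASE_URL in the
    # environment of the process that launches doccano.
    print(postgres_database_url("doccano", "s3cret", "localhost", 5432, "doccano"))
```

`sslmode=disable` suits a local database; for a remote server, PostgreSQL's `require` (or stricter) mode is the safer choice.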

Can I use doccano on mobile devices?

Yes. The web UI is responsive and works in mobile browsers, although there is no native mobile app.

Is there an API for programmatic access?

doccano provides a full RESTful API for project, document, and annotation management.
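A minimal, dependency-free sketch of calling the API from Python. Two details are assumptions rather than facts stated on this page: the `/v1/projects` path and token authentication sent as `Authorization: Token <key>` (the Django REST framework convention). Check the routes exposed by your doccano version; `build_api_request` and `send` are hypothetical helpers.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def build_api_request(base_url, path, token=None, payload=None):
    """Create a GET request, or a POST when a JSON payload is given."""
    # Normalize slashes so "http://host:8000" + "/v1/projects" joins cleanly
    # and a base URL with a sub-path (e.g. behind a proxy) keeps it.
    url = urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        headers["Authorization"] = f"Token {token}"  # assumed auth scheme
    method = "POST" if data is not None else "GET"
    return Request(url, data=data, headers=headers, method=method)


def send(request, timeout=10):
    """Send the request and decode the JSON response body."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_api_request("http://localhost:8000", "/v1/projects",
                            token="YOUR_TOKEN")
    print(req.get_method(), req.full_url)
    # send(req) performs the call against a running instance.
```

Separating request construction from sending keeps the URL and header logic testable without a live server.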

What authentication methods are available?

doccano uses built‑in user accounts: create a superuser with `doccano createuser`, then add members to each project and assign their roles.
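For a pip installation, the README's quick start (at the time of writing) runs `doccano init`, then `doccano createuser` with `--username` and `--password`, then `doccano webserver --port 8000`, plus `doccano task` in a second terminal for background jobs such as dataset import and export. The sketch below only prints that sequence; `pip_setup_commands` is a hypothetical helper.

```python
import shlex


def pip_setup_commands(username, password, port=8000):
    """Return the doccano CLI steps for a pip-based install, in order."""
    return [
        ["doccano", "init"],  # initialize the database
        ["doccano", "createuser", "--username", username, "--password", password],
        ["doccano", "webserver", "--port", str(port)],
        ["doccano", "task"],  # run in a separate terminal for background jobs
    ]


if __name__ == "__main__":
    for argv in pip_setup_commands("admin", "change-me"):
        print(shlex.join(argv))
```

Passing a password as a command-line argument can leave it in shell history, so scripted setups are better off reading it from a secret store.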

Project at a glance

Active
Stars: 10,497
Watchers: 10,497
Forks: 1,828
License: MIT
Repo age: 7 years old
Last commit: 5 days ago
Primary language: Python

Last synced 3 hours ago