auto-sklearn

Hands-free AutoML that plugs directly into scikit-learn

auto-sklearn provides automated model selection and hyperparameter optimization as a drop-in scikit-learn estimator, leveraging Bayesian optimization and meta-learning for fast, robust pipelines.

Overview

auto-sklearn is a Python toolkit that turns the tedious process of model selection, hyperparameter tuning, and ensemble construction into a single, scikit-learn‑compatible estimator. By simply importing AutoSklearnClassifier or AutoSklearnRegressor, users can fit powerful models with just a few lines of code.
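
The "few lines of code" above might look like the following minimal sketch. It assumes auto-sklearn and scikit-learn are installed; the dataset, the budget values in SETTINGS, and the helper name run_quickstart are illustrative choices, not project recommendations.

```python
"""Quick-start sketch for auto-sklearn (assumes auto-sklearn and scikit-learn are installed)."""

# Constructor arguments for AutoSklearnClassifier.
# The values are illustrative assumptions, not tuned recommendations.
SETTINGS = {
    "time_left_for_this_task": 120,  # total search budget, in seconds
    "per_run_time_limit": 30,        # cap for one candidate pipeline, in seconds
    "seed": 1,                       # fixed seed for repeatable runs
}


def run_quickstart(settings=None):
    """Fit an AutoSklearnClassifier on a toy dataset and return held-out accuracy."""
    # Imported lazily so this file loads even without the optional packages.
    from autosklearn.classification import AutoSklearnClassifier
    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    automl = AutoSklearnClassifier(**(settings or SETTINGS))
    automl.fit(X_train, y_train)   # runs the search and builds the ensemble
    print(automl.leaderboard())    # summary of the models that were evaluated
    return accuracy_score(y_test, automl.predict(X_test))
```

Calling run_quickstart() starts a search that lasts roughly as long as the configured budget. Swapping in AutoSklearnRegressor from autosklearn.regression follows the same fit/predict pattern for regression targets.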

Capabilities

The library combines Bayesian optimization (via SMAC) with meta‑learning from prior tasks to warm‑start the search, which can substantially cut the time needed to find high‑performing configurations. It then builds an ensemble from the best models found, which tends to improve robustness and generalization. All of this sits behind the familiar fit/predict API, so it integrates easily into existing pipelines, notebooks, or production code.
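
To make the ensemble step more concrete, here is a toy sketch of greedy ensemble selection with replacement, a common technique for combining the models a search has already evaluated. It is plain Python written for illustration and is not auto-sklearn's implementation; the model names, probabilities, and labels are made up.

```python
def accuracy(probs, labels):
    """Share of samples where thresholding the probability at 0.5 matches the label."""
    hits = sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)


def greedy_ensemble(val_preds, labels, rounds=3):
    """Greedy ensemble selection with replacement.

    val_preds maps a model name to its positive-class probabilities on a
    validation set. Each round adds the model whose inclusion gives the best
    accuracy for the averaged prediction. A model may be picked repeatedly,
    which acts as a weight.
    """
    picked = []
    totals = [0.0] * len(labels)  # running sum of the picked models' probabilities
    for _ in range(rounds):
        best_name, best_score = None, -1.0
        for name, preds in val_preds.items():
            averaged = [(t + p) / (len(picked) + 1) for t, p in zip(totals, preds)]
            score = accuracy(averaged, labels)
            if score > best_score:
                best_name, best_score = name, score
        picked.append(best_name)
        totals = [t + p for t, p in zip(totals, val_preds[best_name])]
    return picked


# Made-up validation data: two decent models with different mistakes, one poor model.
labels = [1, 0, 1, 0]
val_preds = {
    "a": [0.9, 0.2, 0.4, 0.1],
    "b": [0.8, 0.6, 0.9, 0.2],
    "c": [0.3, 0.7, 0.2, 0.8],
}
picked = greedy_ensemble(val_preds, labels)
ensemble_probs = [
    sum(val_preds[name][i] for name in picked) / len(picked)
    for i in range(len(labels))
]
```

In this toy data, models "a" and "b" each misclassify a different sample, so averaging them corrects both errors. That complementarity is the reason an ensemble can generalize better than its best single member.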

Deployment

auto-sklearn runs in standard Python environments and needs only a CPU; a GPU helps only if an underlying estimator can use one. It is released under the BSD‑3‑Clause license and comes with extensive documentation, examples, and a growing community on GitHub. It is suitable for research, rapid prototyping, and production‑grade AutoML workflows.

Highlights

Bayesian optimization with SMAC for efficient hyperparameter search

Meta‑learning from previous datasets to warm‑start searches

Drop‑in scikit‑learn API (fit, predict)

Automatic ensemble construction for robust predictions

Pros

Easy integration with existing scikit-learn code
State‑of‑the‑art AutoML performance
Supports both classification and regression out of the box
Extensible via custom components

Considerations

Higher computational cost than manual tuning
Limited to algorithms supported by scikit-learn
Requires sufficient time and resources for search
Less transparent model selection process

Managed products teams compare with

When teams consider auto-sklearn, these hosted platforms usually appear on the same shortlist.

Azure Machine Learning

Cloud service for accelerating and managing the machine learning project lifecycle, including training and deployment of models

Vertex AI

Unified ML platform for training, tuning, and deploying models

H2O Driverless AI

Automated machine learning platform for building AI models without coding

Fit guide

Great for

Data scientists needing quick baseline models
Researchers wanting reproducible AutoML pipelines
Teams with limited ML engineering resources
Projects where ensemble robustness is critical

Not ideal when

Inference must be real-time with strict latency constraints
Datasets are very small, so an extensive search mostly adds noise
The environment lacks sufficient compute resources for the search
Users require full control over every algorithmic step

How teams use it

Rapid prototyping of classification models

Generate a performant classifier in minutes without manual hyperparameter tuning

Automated preprocessing for tabular data

Leverage built‑in feature engineering pipelines to produce ready‑to‑use models

Benchmarking multiple algorithms across datasets

Obtain comparative performance reports automatically

Deploying robust ensembles in production

Create ensembles that improve generalization and reduce variance

Tech snapshot

Python: 100%
Shell: 1%
Makefile: 1%
Dockerfile: 1%

Frequently asked questions

How does auto-sklearn differ from manual scikit-learn usage?

It automates model selection, hyperparameter optimization, and ensemble building while exposing the same fit/predict interface.

What types of problems are supported?

Classification and regression on tabular data, searched over auto-sklearn's built-in set of scikit-learn estimators and preprocessing steps.

Do I need a GPU to run auto-sklearn?

No, it runs on CPU; GPU can accelerate underlying models if they support it.

How does meta‑learning improve the search?

It uses prior runs on similar datasets to suggest promising configurations, reducing search time.
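
As a rough illustration of that idea, the sketch below picks starting configurations from the prior datasets whose meta-features lie closest to those of a new dataset. Everything in it is hypothetical: the meta-features, the distance measure, the stored runs, and the function names are assumptions for the example and do not reproduce auto-sklearn's internals.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two meta-feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def warm_start_configs(new_meta, prior_runs, k=2):
    """Return the best-known configurations of the k most similar prior datasets."""
    ranked = sorted(prior_runs, key=lambda run: l1_distance(new_meta, run["meta"]))
    return [run["best_config"] for run in ranked[:k]]


# Hypothetical, already-normalized meta-features:
# (dataset size, feature count, class balance), each scaled to [0, 1].
PRIOR_RUNS = [
    {"meta": (0.2, 0.1, 0.5), "best_config": {"model": "random_forest", "max_depth": 8}},
    {"meta": (0.9, 0.8, 0.1), "best_config": {"model": "linear_svm", "C": 0.5}},
    {"meta": (0.3, 0.2, 0.4), "best_config": {"model": "gradient_boosting", "learning_rate": 0.1}},
]

# Configurations to evaluate first on a new dataset before the optimizer takes over.
initial_configs = warm_start_configs((0.25, 0.15, 0.5), PRIOR_RUNS, k=2)
```

The search then begins from configurations that worked on similar data instead of from arbitrary points, which is where the time saving comes from.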

Is the library actively maintained?

Yes. The project sees regular releases and documentation updates, and its source is available on GitHub under the BSD‑3‑Clause license.

Project at a glance

Active

Stars: 8,063
Watchers: 8,063
Forks: 1,316

License: BSD-3-Clause
Repo age: 10 years old
Last commit: 2 months ago
Primary language: Python

