auto-sklearn

Hands-free AutoML that plugs directly into scikit-learn

auto-sklearn provides automated model selection and hyperparameter optimization as a drop-in scikit-learn estimator, leveraging Bayesian optimization and meta-learning for fast, robust pipelines.

Overview

auto-sklearn is a Python toolkit that turns the tedious process of model selection, hyperparameter tuning, and ensemble construction into a single, scikit-learn‑compatible estimator. By simply importing AutoSklearnClassifier or AutoSklearnRegressor, users can fit powerful models with just a few lines of code.
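In the library itself, this means constructing `autosklearn.classification.AutoSklearnClassifier` (optionally with a search budget such as `time_left_for_this_task`) and then calling `fit` and `predict`. To show what such an estimator does behind that interface without requiring auto-sklearn, here is a minimal standard-library sketch. The class `TinyAutoClassifier` is hypothetical: it only searches the `k` of a k-nearest-neighbour model on a holdout split, a drastic simplification of auto-sklearn's search.

```python
import math
import random
from collections import Counter


class TinyAutoClassifier:
    """Hypothetical toy AutoML estimator (not auto-sklearn's implementation).

    fit() picks the best k for a k-nearest-neighbour classifier on a holdout
    split; predict() then uses all training rows with that k.
    """

    def __init__(self, ks=(1, 3, 5), holdout=0.3, seed=0):
        self.ks = ks            # candidate values of the hyperparameter k
        self.holdout = holdout  # fraction of rows reserved for validation
        self.seed = seed

    @staticmethod
    def _knn_predict(X_ref, y_ref, X, k):
        k = min(k, len(X_ref))
        preds = []
        for x in X:
            # Majority vote among the k reference points closest to x
            nearest = sorted(range(len(X_ref)), key=lambda i: math.dist(x, X_ref[i]))[:k]
            preds.append(Counter(y_ref[i] for i in nearest).most_common(1)[0][0])
        return preds

    def fit(self, X, y):
        idx = list(range(len(X)))
        random.Random(self.seed).shuffle(idx)
        n_val = max(1, int(len(X) * self.holdout))
        val, train = idx[:n_val], idx[n_val:]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_val, y_val = [X[i] for i in val], [y[i] for i in val]

        # Model selection: keep the k with the best holdout accuracy
        self.validation_accuracy_ = -1.0
        for k in self.ks:
            preds = self._knn_predict(X_tr, y_tr, X_val, k)
            acc = sum(p == t for p, t in zip(preds, y_val)) / len(y_val)
            if acc > self.validation_accuracy_:
                self.validation_accuracy_, self.best_k_ = acc, k
        # "Refit" the winner: k-NN simply keeps every training row as reference data
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        return self._knn_predict(self.X_, self.y_, X, self.best_k_)


# Usage on two well-separated clusters of 8 points each
X = [(i % 3, i // 3) for i in range(8)] + [(10 + i % 3, 10 + i // 3) for i in range(8)]
y = [0] * 8 + [1] * 8
clf = TinyAutoClassifier().fit(X, y)
print(clf.best_k_, clf.predict([(0.5, 0.5), (10.5, 10.5)]))
```

The point of the design is that all the search happens inside `fit`, so calling code sees only an ordinary estimator.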

Capabilities

The library combines Bayesian optimization (via SMAC) with meta‑learning from prior tasks to warm‑start searches, which can substantially reduce the time needed to find high‑performing configurations. From the models evaluated during the search, it automatically builds an ensemble, which typically improves robustness and generalization. All of this works within the familiar fit/predict API, making it easy to integrate into existing pipelines, notebooks, or production code.
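auto-sklearn's ensemble builder is based on ensemble selection (Caruana et al., 2004): models evaluated during the search are greedily added, with replacement, to whichever average most improves a validation metric. The sketch below is a simplified standard-library version for binary probabilities; the function names, the choice of log loss as the metric, and the toy predictions are assumptions for illustration, not the library's code.

```python
import math
from collections import Counter


def log_loss(probs, y_true, eps=1e-12):
    """Mean binary cross-entropy of predicted P(y=1) against 0/1 labels."""
    total = 0.0
    for p, t in zip(probs, y_true):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def ensemble_selection(val_probs, y_val, n_iterations=10):
    """Greedy forward selection with replacement over held-out predictions.

    val_probs maps a model name to its predicted P(y=1) on the validation rows.
    Returns each selected model's weight (its selection count / n_iterations).
    """
    counts = Counter()
    running_sum = [0.0] * len(y_val)
    for step in range(1, n_iterations + 1):
        best_name, best_loss = None, math.inf
        for name, probs in val_probs.items():
            # Loss of the averaged ensemble if this model were added once more
            candidate = [(s + p) / step for s, p in zip(running_sum, probs)]
            loss = log_loss(candidate, y_val)
            if loss < best_loss:
                best_name, best_loss = name, loss
        counts[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, val_probs[best_name])]
    return {name: c / n_iterations for name, c in counts.items()}


y_val = [1, 1, 0, 0]
val_probs = {
    "a": [0.9, 0.6, 0.4, 0.4],    # confident on row 0 only
    "b": [0.6, 0.9, 0.4, 0.4],    # confident on row 1 only
    "bad": [0.2, 0.2, 0.8, 0.8],  # wrong on every row
}
weights = ensemble_selection(val_probs, y_val)
print(weights)
```

Because selection is with replacement, a model can be picked several times, and its weight grows accordingly; models that never help (here `bad`) receive no weight.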

Deployment

auto-sklearn runs in standard Python environments on Linux and needs only CPUs; its scikit-learn-based components do not use a GPU. It is released under the BSD‑3‑Clause license, with extensive documentation, examples, and a growing community on GitHub. It is suitable for research, rapid prototyping, and production‑grade AutoML workflows.

Highlights

Bayesian optimization with SMAC for efficient hyperparameter search
Meta‑learning from previous datasets to warm‑start searches
Drop‑in scikit‑learn API (fit, predict, predict_proba)
Automatic ensemble construction for robust predictions
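SMAC fits a random-forest surrogate to the configurations evaluated so far, uses an acquisition function (such as expected improvement) to choose the next one, and interleaves randomly sampled configurations for exploration. The following standard-library sketch keeps that loop structure but swaps in a much simpler surrogate (the mean loss of the k nearest evaluated points) and a greedy acquisition. `smbo` and `toy_objective` are hypothetical names, and the objective merely stands in for a model's validation loss over two hyperparameters.

```python
import math
import random


def smbo(objective, n_init=5, n_iter=30, n_candidates=200, k=3, seed=0):
    """Toy sequential model-based optimization over the unit square.

    Surrogate: mean loss of the k nearest evaluated configurations (a stand-in
    for SMAC's random forest). Acquisition: lowest predicted loss among random
    candidates, alternating with purely random configurations for exploration.
    """
    rng = random.Random(seed)

    def sample():
        return (rng.random(), rng.random())

    # Initial design: a few random configurations
    history = []
    for _ in range(n_init):
        config = sample()
        history.append((config, objective(config)))

    def predicted_loss(config):
        nearest = sorted(history, key=lambda h: math.dist(config, h[0]))[:k]
        return sum(loss for _, loss in nearest) / len(nearest)

    for i in range(n_iter):
        if i % 2 == 0:
            # Exploit: evaluate the candidate the surrogate likes best
            candidates = [sample() for _ in range(n_candidates)]
            config = min(candidates, key=predicted_loss)
        else:
            # Explore: evaluate a random configuration
            config = sample()
        history.append((config, objective(config)))

    best = min(history, key=lambda h: h[1])
    return best, history


def toy_objective(config):
    # Hypothetical "validation loss", minimal at (0.3, 0.7)
    x, y = config
    return (x - 0.3) ** 2 + (y - 0.7) ** 2


(best_config, best_loss), history = smbo(toy_objective)
print(best_config, best_loss, len(history))
```

The surrogate is cheap to query, so many candidates can be screened per step while only one is actually evaluated; in real AutoML each evaluation means training a model, which is why this trade-off matters.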

Pros

  • Easy integration with existing scikit-learn code
  • Strong performance on tabular AutoML benchmarks
  • Supports both classification and regression out of the box
  • Extensible via custom components

Considerations

  • Higher computational cost than manual tuning
  • Limited to algorithms supported by scikit-learn
  • Requires sufficient time and resources for search
  • Less transparent model selection process

Managed products teams compare it with

When teams consider auto-sklearn, these hosted platforms usually appear on the same shortlist.

Azure Machine Learning

Cloud service for accelerating and managing the machine learning project lifecycle, including training and deployment of models

H2O Driverless AI

Automated machine learning platform for building AI models without coding

Vertex AI

Unified ML platform for training, tuning, and deploying models

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Data scientists needing quick baseline models
  • Researchers wanting reproducible AutoML pipelines
  • Teams with limited ML engineering resources
  • Projects where ensemble robustness is critical

Not ideal when

  • Real-time inference with strict latency constraints
  • Very small datasets, where extensive search can overfit the validation data
  • Environments lacking sufficient CPU time and memory
  • Users requiring full control over every algorithmic step
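The small-dataset caveat can be illustrated with a toy simulation (a standard-library sketch, not auto-sklearn code; `select_best_random_model` and its parameters are hypothetical). If many candidates with no real skill are scored on a 20-row validation set, the best of them looks clearly better than chance, yet the same model falls back to about 50% accuracy on fresh data.

```python
import random


def select_best_random_model(n_models=200, n_val=20, n_test=2000, seed=0):
    """Pick the best of n_models skill-less models on a small validation set.

    Every "model" guesses 0/1 labels uniformly at random, so its true accuracy
    is 0.5. Returns (best validation accuracy, that model's test accuracy).
    """
    rng = random.Random(seed)
    y_val = [rng.getrandbits(1) for _ in range(n_val)]
    y_test = [rng.getrandbits(1) for _ in range(n_test)]

    best_val_acc, best_test_preds = -1.0, None
    for _ in range(n_models):
        # Each model's guesses for the validation rows and the test rows
        val_preds = [rng.getrandbits(1) for _ in range(n_val)]
        test_preds = [rng.getrandbits(1) for _ in range(n_test)]
        val_acc = sum(p == t for p, t in zip(val_preds, y_val)) / n_val
        if val_acc > best_val_acc:
            best_val_acc, best_test_preds = val_acc, test_preds
    test_acc = sum(p == t for p, t in zip(best_test_preds, y_test)) / n_test
    return best_val_acc, test_acc


val_acc, test_acc = select_best_random_model()
print(f"validation accuracy of selected model: {val_acc:.2f}")
print(f"test accuracy of the same model: {test_acc:.2f}")
```

The gap between the two numbers is pure selection bias; it shrinks as the validation set grows or as fewer candidates are compared.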

How teams use it

Rapid prototyping of classification models

Generate a performant classifier in minutes without manual hyperparameter tuning

Automated preprocessing for tabular data

Leverage built‑in feature engineering pipelines to produce ready‑to‑use models
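auto-sklearn's pipelines include data preprocessing steps, such as imputation of missing values and rescaling, whose choices are searched alongside the model's hyperparameters. As a rough illustration, here is a standard-library sketch of two such choices for a single numeric column. `fit_column_preprocessor` and its option names are hypothetical; real pipelines also handle categorical features and offer many more preprocessors.

```python
import statistics

# Hypothetical imputation choices, keyed by name
IMPUTERS = {"mean": statistics.fmean, "median": statistics.median}


def fit_column_preprocessor(train_values, imputation="median", rescaling="minmax"):
    """Learn imputation and rescaling for one numeric column; return a transform.

    None marks a missing value. Statistics come from the training column only,
    so the returned function can be applied unchanged to new data.
    """
    observed = [v for v in train_values if v is not None]
    fill = IMPUTERS[imputation](observed)
    filled = [fill if v is None else v for v in train_values]

    if rescaling == "minmax":
        shift, scale = min(filled), (max(filled) - min(filled)) or 1.0
    elif rescaling == "standardize":
        shift, scale = statistics.fmean(filled), statistics.pstdev(filled) or 1.0
    elif rescaling == "none":
        shift, scale = 0.0, 1.0
    else:
        raise ValueError(f"unknown rescaling: {rescaling}")

    def transform(values):
        return [((fill if v is None else v) - shift) / scale for v in values]

    return transform


# Each (imputation, rescaling) pair is one point in a hypothetical search space
to_unit = fit_column_preprocessor([1.0, None, 3.0, 5.0])
print(to_unit([1.0, None, 5.0]))  # [0.0, 0.5, 1.0]
```

Fitting on training data and returning a separate transform mirrors the fit/transform split of scikit-learn preprocessors, which keeps test-set statistics from leaking into the pipeline.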

Benchmarking multiple algorithms across datasets

Obtain comparative performance reports automatically

Deploying robust ensembles in production

Create ensembles that improve generalization and reduce variance

Tech snapshot

Python 100%
Shell 1%
Makefile 1%
Dockerfile 1%

Tags

meta-learning, automl, hyperparameter-optimization, scikit-learn, hyperparameter-search, hyperparameter-tuning, bayesian-optimization, automated-machine-learning, metalearning, smac

Frequently asked questions

How does auto-sklearn differ from manual scikit-learn usage?

It automates model selection, hyperparameter optimization, and ensemble building while exposing the same fit/predict interface.

What types of problems are supported?

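Classification (binary, multiclass, and multilabel) and regression (including multi-output) on tabular data, searching over a predefined space of scikit-learn-based estimators and preprocessors.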

Do I need a GPU to run auto-sklearn?

No. auto-sklearn runs on CPUs, and its scikit-learn-based components do not use a GPU.

How does meta‑learning improve the search?

It uses prior runs on similar datasets to suggest promising configurations, reducing search time.
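In auto-sklearn's original meta-learning approach, a new dataset is described by meta-features (for example, its size and number of features), and the search is seeded with configurations that worked well on the most similar previously seen datasets. The sketch below shows that nearest-neighbour idea with the standard library; the meta-features, the L1 distance on min-max scaled values, and the stored configurations are illustrative assumptions, not the library's actual data or code.

```python
import math


def warm_start_configs(new_meta, past_runs, n_neighbors=2):
    """Return the best-known configurations of the most similar past datasets.

    new_meta: meta-feature name -> value for the new dataset.
    past_runs: list of (meta_features, best_config) pairs from earlier tasks.
    Similarity is the L1 distance between min-max scaled meta-feature vectors.
    """
    names = sorted(new_meta)
    rows = [new_meta] + [meta for meta, _ in past_runs]
    lo = {n: min(r[n] for r in rows) for n in names}
    hi = {n: max(r[n] for r in rows) for n in names}

    def scaled(meta):
        # Rescale so that features with large ranges do not dominate the distance
        return [(meta[n] - lo[n]) / ((hi[n] - lo[n]) or 1.0) for n in names]

    target = scaled(new_meta)

    def distance(run):
        return sum(abs(a - b) for a, b in zip(target, scaled(run[0])))

    return [config for _, config in sorted(past_runs, key=distance)[:n_neighbors]]


# Hypothetical records from earlier tasks
past_runs = [
    ({"log_rows": math.log(1100), "n_features": 21}, {"model": "random_forest"}),
    ({"log_rows": math.log(100_000), "n_features": 300}, {"model": "sgd"}),
    ({"log_rows": math.log(900), "n_features": 18}, {"model": "gradient_boosting"}),
]
new_meta = {"log_rows": math.log(1000), "n_features": 20}
print(warm_start_configs(new_meta, past_runs))
```

The returned configurations would be evaluated first, and the Bayesian optimizer then continues from those results instead of from purely random starting points.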

Is the library actively maintained?

Yes, with regular releases, documentation, and a BSD‑3‑Clause license on GitHub.

Project at a glance

Status: Active
Stars: 8,039
Watchers: 8,039
Forks: 1,316
License: BSD-3-Clause
Repo age: 10 years old
Last commit: yesterday
Primary language: Python

Last synced yesterday