MLJAR

Automated, transparent machine learning for tabular data in minutes

mljar-supervised automates preprocessing, model selection, hyper‑parameter tuning, and reporting for tabular datasets, delivering transparent pipelines and visual explanations in minutes.

Overview

mljar-supervised is a Python package that streamlines the end‑to‑end workflow for tabular machine‑learning projects. It targets data scientists, analysts, and developers who need fast baselines, thorough model comparisons, and clear documentation without writing extensive boilerplate code.

Capabilities

The library offers four built-in modes (Explain, Perform, Compete, and Optuna), tuned respectively for data exploration, production-ready pipelines, competition-level performance, and exhaustive hyper-parameter search. It automatically handles missing values, categorical encoding, and advanced feature engineering (e.g., golden features, text and time transforms). A wide algorithm suite (Linear, Decision Tree, Random Forest, LightGBM, XGBoost, CatBoost, Neural Networks, etc.) is combined with greedy ensembling and optional stacking. Every run generates a detailed Markdown report with learning curves, feature importance, SHAP visualizations, and model metrics, which supports reproducibility and auditability. The optional web app provides a code-free GUI for secure local execution.
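A minimal sketch of how a mode might be selected in code. The import path `supervised.automl.AutoML` and the `mode` and `results_path` arguments reflect the package's documented interface as best understood here, so check them against the installed version. The helper names `automl_settings` and `train` and the folder name `AutoML_demo` are hypothetical.

```python
from typing import Any, Dict

# Mode names as listed in the text above.
MODES = ("Explain", "Perform", "Compete", "Optuna")


def automl_settings(mode: str, results_path: str = "AutoML_demo") -> Dict[str, Any]:
    """Build keyword arguments for AutoML (hypothetical helper).

    results_path is an illustrative folder name where reports would be written.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    return {"mode": mode, "results_path": results_path}


def train(X, y, mode: str = "Explain"):
    """Fit AutoML on features X and target y and return the fitted object."""
    # Imported lazily so this module loads even where mljar-supervised is absent.
    from supervised.automl import AutoML

    automl = AutoML(**automl_settings(mode))
    automl.fit(X, y)
    return automl
```

Calling `train(X, y, mode="Compete")` with a pandas DataFrame `X` and a target `y` would then run the competition-oriented search; predictions would come from `automl.predict(X_new)` on the returned object.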

Highlights

Four purpose‑driven modes (Explain, Perform, Compete, Optuna)

Automatic preprocessing, feature engineering, and hyper‑parameter tuning

Greedy ensembling and optional stacking for top performance

Comprehensive Markdown reports with visual explanations
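To make "greedy ensembling" concrete, here is a small pure-Python sketch of the general technique: repeatedly add (with replacement) whichever model most reduces validation error of the averaged prediction, and stop when nothing helps. This illustrates the idea only and is not taken from mljar-supervised's source; the function names and the toy data are assumptions.

```python
from typing import Dict, List, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def greedy_ensemble(
    preds: Dict[str, List[float]], target: List[float], max_steps: int = 10
) -> List[str]:
    """Return model names chosen greedily; a name may appear more than once."""
    chosen: List[str] = []
    total = [0.0] * len(target)  # running sum of the chosen models' predictions
    best_loss = float("inf")
    for _ in range(max_steps):
        best_name = None
        k = len(chosen) + 1  # ensemble size if one more model is added
        for name, p in preds.items():
            candidate = [(s + v) / k for s, v in zip(total, p)]
            loss = mse(candidate, target)
            if loss < best_loss:  # strict improvement required
                best_loss, best_name = loss, name
        if best_name is None:  # no candidate improved the ensemble
            break
        chosen.append(best_name)
        total = [s + v for s, v in zip(total, preds[best_name])]
    return chosen


# Toy validation predictions: "a" overshoots, "b" undershoots, "c" is uninformative.
target = [1.0, 0.0, 1.0, 0.0]
preds = {
    "a": [1.25, 0.25, 1.25, 0.25],
    "b": [0.75, -0.25, 0.75, -0.25],
    "c": [0.5, 0.5, 0.5, 0.5],
}
print(greedy_ensemble(preds, target))  # → ['a', 'b']
```

In the toy data the errors of "a" and "b" cancel when averaged, so the pair beats either model alone, which is the effect ensembling relies on.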

Pros

Speeds up model development for tabular data
Provides transparent pipelines and detailed documentation
Built‑in explainability with SHAP and decision‑tree visualizations
Supports a broad range of algorithms and ensembling

Considerations

Focused on tabular data; not suitable for image or audio tasks
Optuna mode can be computationally intensive for large datasets
Requires a Python environment and compatible libraries (e.g., LightGBM)
Advanced feature engineering may increase runtime on very large data

Managed products teams compare it with

When teams consider MLJAR, these hosted platforms usually appear on the same shortlist.

Azure Machine Learning

Cloud service for accelerating and managing the machine learning project lifecycle, including training and deployment of models

Vertex AI

Unified ML platform for training, tuning, and deploying models

H2O Driverless AI

Automated machine learning platform for building AI models without coding

Fit guide

Great for

Rapid prototyping of predictive models on structured data
Regulatory or business audits that need model explainability
Machine‑learning competition participants seeking stacked ensembles
Teams that want automated, reproducible reporting for recurring analyses

Not ideal for

Computer‑vision or natural‑language processing projects
Real‑time streaming inference with strict latency constraints
Datasets that exceed available memory without out‑of‑core support
Users requiring custom deep‑learning architectures beyond provided models

How teams use it

Quick baseline generation

Produces a set of candidate models with performance metrics and a ready‑to‑use report within minutes.

Explainability audit for compliance

Delivers SHAP plots, decision‑tree visualizations, and feature‑importance charts to satisfy regulatory review.

ML competition entry

Creates a stacked ensemble with cross‑validated scores, maximizing leaderboard performance.

Automated monthly analysis

Generates reproducible Markdown reports for each run, enabling consistent documentation across cycles.

Tech snapshot

Python: 100%

Frequently asked questions

What types of data does mljar-supervised support?

It works with tabular datasets containing numeric, categorical, text, and time‑series features.

How do I install the package?

Run `pip install mljar-supervised` in your Python environment.
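After installing, a quick check like the following can confirm the package is visible to the interpreter. Note that the import name is `supervised`, which differs from the pip name; the helper name `mljar_available` is hypothetical.

```python
import importlib.util


def mljar_available() -> bool:
    """Return True when the `supervised` import package can be found."""
    return importlib.util.find_spec("supervised") is not None


print("mljar-supervised importable:", mljar_available())
```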

Can I use the library without an internet connection?

Yes, all training and reporting can be performed locally; the optional web UI runs on your machine.

What are the available AutoML modes?

Explain, Perform, Compete, and Optuna, optimized respectively for exploration, production, competition, and exhaustive tuning.

How are models evaluated?

Depending on the mode, the library uses train/test splits or k‑fold cross‑validation and reports metrics such as accuracy, F1, ROC‑AUC, and more.
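The validation scheme can also be set explicitly instead of relying on the mode's default. The sketch below assumes the `validation_strategy` and `eval_metric` arguments and the dictionary keys shown (`validation_type`, `train_ratio`, `k_folds`, `shuffle`, `stratify`) match the installed version of the library; verify them against its documentation. The helper function names are hypothetical.

```python
from typing import Any, Dict


def split_validation(train_ratio: float = 0.75) -> Dict[str, Any]:
    """Settings for a single train/test split (keys assumed from the library docs)."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    return {
        "validation_type": "split",
        "train_ratio": train_ratio,
        "shuffle": True,
        "stratify": True,
    }


def kfold_validation(k_folds: int = 5) -> Dict[str, Any]:
    """Settings for k-fold cross-validation (keys assumed from the library docs)."""
    if k_folds < 2:
        raise ValueError("k_folds must be at least 2")
    return {
        "validation_type": "kfold",
        "k_folds": k_folds,
        "shuffle": True,
        "stratify": True,
    }


def build_automl(strategy: Dict[str, Any], eval_metric: str = "auc"):
    """Create an AutoML object with an explicit validation strategy and metric."""
    # Lazy import: the helpers above work without mljar-supervised installed.
    from supervised.automl import AutoML

    return AutoML(validation_strategy=strategy, eval_metric=eval_metric)
```

For example, `build_automl(kfold_validation(10))` would request 10-fold cross-validation scored by ROC-AUC, which suits binary classification; other tasks would need a different metric.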

Project at a glance

Stable

Stars: 3,245
Watchers: 3,245
Forks: 430

License: MIT

Repo age: 7 years

Last commit: 8 months ago

Primary language: Python

Last synced 2 days ago
