MLJAR

Automated, transparent machine learning for tabular data in minutes

mljar-supervised automates preprocessing, model selection, hyper‑parameter tuning, and reporting for tabular datasets, delivering transparent pipelines and visual explanations in minutes.

Overview

mljar-supervised is a Python package that streamlines the end‑to‑end workflow for tabular machine‑learning projects. It targets data scientists, analysts, and developers who need fast baselines, thorough model comparisons, and clear documentation without writing extensive boilerplate code.

Capabilities

The library offers four built‑in modes (Explain, Perform, Compete, and Optuna), tuned respectively for data exploration, production‑ready pipelines, competition‑level performance, and exhaustive hyper‑parameter search. It automatically handles missing values, categorical encoding, and advanced feature engineering (e.g., golden features, text and time transforms). A wide algorithm suite (Linear, Decision Tree, Random Forest, LightGBM, XGBoost, CatBoost, Neural Networks, etc.) is combined with greedy ensembling and optional stacking. Every run generates a detailed Markdown report with learning curves, feature importance, SHAP visualizations, and model metrics, which supports reproducibility and auditability. The optional web app provides a code‑free GUI that runs securely on the local machine.
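As a rough illustration of how these capabilities map onto configuration, the sketch below collects a mode, an algorithm subset, golden features, ensembling, and stacking into one set of constructor options. The helper names `automl_options` and `build_automl`, the `results_path` value, and the 600‑second budget are hypothetical choices for this example; the keyword names and algorithm strings follow the project's README as best recalled and should be checked against the installed version.

```python
def automl_options(results_path):
    """Return keyword arguments for an AutoML run (illustrative values only)."""
    return {
        "results_path": results_path,   # directory where models and reports are written
        "mode": "Compete",              # one of Explain, Perform, Compete, Optuna
        "algorithms": ["Linear", "Random Forest", "LightGBM", "Xgboost", "CatBoost"],
        "golden_features": True,        # enable the golden-features engineering step
        "train_ensemble": True,         # greedy ensembling of the trained models
        "stack_models": True,           # optional stacking
        "total_time_limit": 600,        # overall training budget in seconds (assumed value)
    }


def build_automl(results_path="AutoML_demo"):
    """Create an AutoML object from the options above."""
    # Imported lazily so this module still loads where mljar-supervised is absent.
    from supervised.automl import AutoML

    return AutoML(**automl_options(results_path))


# Usage (requires `pip install mljar-supervised`):
#   automl = build_automl("AutoML_demo")
#   automl.fit(X_train, y_train)
```

Keeping the options in a plain dictionary makes it easy to log or version the exact configuration alongside the generated report.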

Highlights

Four purpose‑driven modes (Explain, Perform, Compete, Optuna)
Automatic preprocessing, feature engineering, and hyper‑parameter tuning
Greedy ensembling and optional stacking for top performance
Comprehensive Markdown reports with visual explanations

Pros

  • Speeds up model development for tabular data
  • Provides transparent pipelines and detailed documentation
  • Built‑in explainability with SHAP and decision‑tree visualizations
  • Supports a broad range of algorithms and ensembling

Considerations

  • Focused on tabular data; not suitable for image or audio tasks
  • Optuna mode can be computationally intensive for large datasets
  • Requires Python environment and compatible libraries (e.g., LightGBM)
  • Advanced feature engineering may increase runtime on very large data

Managed products teams compare it with

When teams consider MLJAR, these hosted platforms usually appear on the same shortlist.

Azure Machine Learning

Cloud service for accelerating and managing the machine learning project lifecycle, including training and deployment of models

H2O Driverless AI

Automated machine learning platform for building AI models without coding

Vertex AI

Unified ML platform for training, tuning, and deploying models

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Rapid prototyping of predictive models on structured data
  • Regulatory or business audits that need model explainability
  • Machine‑learning competition participants seeking stacked ensembles
  • Teams that want automated, reproducible reporting for recurring analyses

Not ideal for

  • Computer‑vision or natural‑language processing projects
  • Real‑time streaming inference with strict latency constraints
  • Datasets larger than available memory, where out‑of‑core training would be required
  • Custom deep‑learning architectures beyond the provided models

How teams use it

Quick baseline generation

Produces a set of candidate models with performance metrics and a ready‑to‑use report within minutes.
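A minimal sketch of such a baseline run follows. It assumes the `AutoML` class with `fit`, `predict`, and `get_leaderboard` methods as documented in the project's README; the function name `quick_baseline`, the `automl` injection parameter, and the 300‑second default budget are hypothetical and exist only for this illustration.

```python
def quick_baseline(X_train, y_train, X_test, automl=None, time_limit=300):
    """Fit an AutoML run and return (predictions, leaderboard).

    `automl` may be any object exposing fit/predict/get_leaderboard; when it is
    omitted, an AutoML instance in Explain mode is created.
    """
    if automl is None:
        # Lazy import: only needed when no pre-built object is supplied.
        from supervised.automl import AutoML

        automl = AutoML(mode="Explain", total_time_limit=time_limit)

    automl.fit(X_train, y_train)            # trains the candidate models
    predictions = automl.predict(X_test)    # predictions from the selected model
    leaderboard = automl.get_leaderboard()  # per-model metrics
    return predictions, leaderboard
```

Passing a pre‑configured object through `automl` lets the same helper be reused with a different mode or time budget.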

Explainability audit for compliance

Delivers SHAP plots, decision‑tree visualizations, and feature‑importance charts to satisfy regulatory review.

ML competition entry

Creates a stacked ensemble with cross‑validated scores, maximizing leaderboard performance.

Automated monthly analysis

Generates reproducible Markdown reports for each run, enabling consistent documentation across cycles.

Tech snapshot

Python 100%

Tags

random-forest, automl, ensemble, hyper-parameters, lightgbm, machine-learning, hyperparameter-optimization, scikit-learn, catboost, automl-python, automl-api, neural-network, decision-tree, xgboost, automated-machine-learning, data-science, mljar, feature-engineering

Frequently asked questions

What types of data does mljar-supervised support?

It works with tabular datasets containing numeric, categorical, text, and date/time features.

How do I install the package?

Run `pip install mljar-supervised` in your Python environment.

Can I use the library without an internet connection?

Yes, all training and reporting can be performed locally; the optional web UI runs on your machine.

What are the available AutoML modes?

Explain, Perform, Compete, and Optuna, each optimized for exploration, production, competition, or exhaustive tuning.

How are models evaluated?

Depending on the mode, the library uses train/test splits or k‑fold cross‑validation and reports metrics such as accuracy, F1, ROC‑AUC, and more.
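The validation scheme and metric can also be set explicitly instead of relying on the mode's defaults. The sketch below builds the two kinds of `validation_strategy` dictionaries; the dictionary keys and the `eval_metric` argument follow the project's README as best recalled, while the helper name `validation_strategy` and its default values are assumptions made for this example.

```python
def validation_strategy(kind="kfold", k_folds=5, train_ratio=0.75, seed=123):
    """Build a validation_strategy dictionary for AutoML (illustrative defaults)."""
    if kind == "kfold":
        return {
            "validation_type": "kfold",
            "k_folds": k_folds,
            "shuffle": True,
            "stratify": True,      # stratification applies to classification tasks
            "random_seed": seed,
        }
    if kind == "split":
        return {
            "validation_type": "split",
            "train_ratio": train_ratio,  # fraction of rows used for training
            "shuffle": True,
            "stratify": True,
        }
    raise ValueError(f"unknown validation kind: {kind!r}")


# Usage (requires `pip install mljar-supervised`):
#   from supervised.automl import AutoML
#   automl = AutoML(eval_metric="auc", validation_strategy=validation_strategy("kfold"))
```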

Project at a glance

Stable
Stars: 3,234
Watchers: 3,234
Forks: 430
License: MIT
Repo age: 7 years old
Last commit: 7 months ago
Primary language: Python

Last synced 12 hours ago