
Hopsworks

Real-time AI Lakehouse with Python-centric Feature Store

Hopsworks delivers a real-time AI lakehouse, offering a Python-centric feature store, MLOps tools, multi-tenant projects, and flexible deployment on cloud, serverless, or on-premises.


Overview

Hopsworks is designed for data science and engineering teams that need a unified platform to build, govern, and serve machine‑learning assets. It combines a Python‑centric feature store with full MLOps capabilities, enabling collaborative development across projects while maintaining strict data governance.

Core Capabilities

The platform provides project‑based multi‑tenancy, versioned feature groups, lineage tracking, and integrated tools such as Jupyter notebooks, Conda environments, Airflow pipelines, and GPU‑accelerated training. Users can run Spark, Flink, or streaming jobs and serve models through built‑in APIs or external services like Databricks, SageMaker, and Kubeflow.
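The versioning, lineage, and multi-tenancy concepts above can be sketched in plain Python. This is a conceptual toy, not the Hopsworks client API: the class names, the append-only version bump, and the `parents` lineage field are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Conceptual sketch only: a toy registry illustrating project-scoped,
# versioned feature groups with lineage. NOT the Hopsworks client API.

@dataclass
class FeatureGroup:
    name: str
    version: int
    features: list
    parents: list = field(default_factory=list)  # lineage: upstream groups

class Project:
    """Each project is an isolated namespace (multi-tenancy)."""
    def __init__(self, name):
        self.name = name
        self._groups = {}  # (name, version) -> FeatureGroup

    def create_feature_group(self, name, features, parents=None):
        # Versions are append-only: bump rather than mutate in place.
        version = 1 + max(
            (v for (n, v) in self._groups if n == name), default=0)
        fg = FeatureGroup(name, version, features, parents or [])
        self._groups[(name, version)] = fg
        return fg

    def get_feature_group(self, name, version):
        return self._groups[(name, version)]

project = Project("fraud_detection")
raw = project.create_feature_group("transactions", ["amount", "merchant"])
derived = project.create_feature_group(
    "transactions", ["amount", "merchant", "amount_zscore"], parents=[raw])
print(derived.version)  # 2
```

The real platform adds much more on top (online/offline storage, permissions, provenance), but the core ideas map the same way: projects isolate tenants, and every change to a feature group produces a new traceable version.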

Deployment Flexibility

Hopsworks can be consumed as a managed service on AWS, Azure, or GCP, as a serverless app at app.hopsworks.ai, or installed on‑premises using a simple installer on CentOS/RHEL 8.x or Ubuntu 22.04. The on‑premises option supports air‑gapped environments and offers full control over hardware and security policies.

Highlights

Python‑centric Feature Store with versioning, lineage, and governance
Project‑based multi‑tenant environment for team collaboration
Integrated MLOps stack: Airflow, Jupyter, GPU training, model serving
Flexible deployment: managed cloud, serverless app, or on‑premises installer

Pros

  • Unified platform reduces tool fragmentation
  • Strong data governance with fine‑grained access controls
  • Scalable compute supporting Spark, Flink, and GPUs
  • Runs on cloud, serverless, or on‑premises, including air‑gapped sites

Considerations

  • On‑premises install requires a minimum of 32 GB RAM and 8 CPUs
  • Learning curve for the full feature set
  • Limited to Linux (CentOS/RHEL, Ubuntu) environments
  • Serverless offering is currently in beta

Managed products teams compare with

When teams consider Hopsworks, these hosted platforms usually appear on the same shortlist.


Amazon SageMaker Feature Store

Fully managed repository to create, store, share, and serve ML features


Databricks Feature Store

Feature registry with governance, lineage, and MLflow integration


Tecton Feature Store

Central hub to manage, govern, and serve ML features across batch, streaming, and real time

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Data science teams needing a collaborative feature store
  • Enterprises requiring governed ML assets across projects
  • Organizations that must run ML workloads on‑premises for compliance
  • Companies integrating with AWS, Azure, or GCP

Not ideal when

  • Small hobby projects with minimal resource needs
  • Teams that only use Windows environments
  • Users seeking a fully managed solution without any setup effort
  • Projects unable to meet the minimum hardware specifications

How teams use it

Fraud detection with real-time scoring

Features are ingested and served instantly, enabling online models to flag fraudulent transactions as they occur.
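The ingest-then-score loop can be sketched with an in-memory stand-in for an online store. Everything here is illustrative: the rolling features, the `5x` average threshold, and the store layout are assumptions for the sketch, not Hopsworks internals.

```python
# Hedged sketch: an in-memory stand-in for an online feature store that
# serves fresh features to a fraud rule at transaction time.

online_store = {}  # card_id -> latest feature vector

def ingest(card_id, amount):
    """Update rolling features as each transaction streams in."""
    feats = online_store.setdefault(
        card_id, {"txn_count": 0, "total_amount": 0.0})
    feats["txn_count"] += 1
    feats["total_amount"] += amount
    feats["avg_amount"] = feats["total_amount"] / feats["txn_count"]

def score(card_id, amount):
    """Flag a transaction if it dwarfs the card's historical average."""
    feats = online_store.get(card_id)
    if feats is None:
        return False  # no history yet, nothing to compare against
    return amount > 5 * feats["avg_amount"]

ingest("card-1", 20.0)
ingest("card-1", 30.0)
print(score("card-1", 500.0))  # True: 500 far exceeds 5 * 25
print(score("card-1", 40.0))   # False
```

In production the rule would be a trained model and the store a low-latency database, but the shape is the same: features stay fresh because ingestion and serving share one store.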

Customer churn prediction batch pipeline

Batch feature pipelines generate nightly datasets, train models, and store predictions for downstream analytics.
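A nightly batch job of this kind can be sketched as a pure aggregation step. The event schema, field names, and login-based features below are assumptions for the sketch; a real pipeline would run as a scheduled Spark or Python job writing into a feature group.

```python
from datetime import date

# Illustrative batch pipeline: aggregate raw activity events into one
# feature row per user, ready to join into a churn training dataset.

events = [
    {"user": "u1", "day": date(2024, 5, 1), "logins": 3},
    {"user": "u1", "day": date(2024, 5, 2), "logins": 0},
    {"user": "u2", "day": date(2024, 5, 1), "logins": 1},
]

def build_features(events):
    """Roll up per-user activity into nightly feature rows."""
    rows = {}
    for e in events:
        row = rows.setdefault(
            e["user"], {"user": e["user"], "total_logins": 0, "days": 0})
        row["total_logins"] += e["logins"]
        row["days"] += 1
    for row in rows.values():
        row["avg_logins"] = row["total_logins"] / row["days"]
    return list(rows.values())

dataset = build_features(events)
print(sorted(r["user"] for r in dataset))  # ['u1', 'u2']
```

The output rows would then be inserted into a versioned feature group, so each nightly run is reproducible and traceable.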

Cross-team model governance

Projects isolate development, staging, and production, providing versioned models and audit trails for regulatory compliance.

Hybrid cloud training

Data engineers run Spark jobs on on-premises clusters while scaling GPU training in the cloud via managed Hopsworks.

Tech snapshot

Java 81%
Ruby 18%
Jupyter Notebook 1%
Python 1%
Shell 1%

Tags

ml, mlops, model-serving, aws, hopsworks, machine-learning, pyspark, python, feature-management, gcp, feature-store, governance, azure, kserve, serverless, data-science, feature-engineering

Frequently asked questions

What operating systems are supported for on-premises installation?

CentOS/RHEL 8.x and Ubuntu 22.04 are officially supported.

Can I use Hopsworks without a cloud provider?

Yes, the installer lets you run Hopsworks on any compatible Linux VM or bare-metal server.

How does the feature store ensure data governance?

It provides project-based multi-tenancy, fine-grained permissions, versioning, lineage, and provenance for all ML assets.
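The role-based side of that governance model can be sketched as a membership table gating actions per project. The role names and the allowed-action sets here are illustrative assumptions, not the actual Hopsworks permission model.

```python
# Sketch of project-based access control: a user's role within a
# project determines which actions on ML assets are permitted.
# Roles and rules are illustrative, not Hopsworks's real model.

MEMBERS = {
    ("fraud", "alice"): "data_owner",
    ("fraud", "bob"): "data_scientist",
}

ALLOWED = {
    "data_owner": {"read", "write", "share"},
    "data_scientist": {"read", "write"},
}

def can(user, action, project):
    """Deny by default: non-members and unknown roles get nothing."""
    role = MEMBERS.get((project, user))
    return role is not None and action in ALLOWED.get(role, set())

print(can("bob", "share", "fraud"))    # False: only owners may share
print(can("alice", "share", "fraud"))  # True
print(can("eve", "read", "fraud"))     # False: not a project member
```

Combined with versioning and lineage, this kind of deny-by-default membership check is what keeps assets in one project invisible to another.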

Is there a free tier for the serverless app?

The serverless app at app.hopsworks.ai is publicly accessible and can be used for tutorials and small experiments.

What integrations are available for model serving?

Hopsworks integrates with Databricks, SageMaker, and Kubeflow, and offers its own MLOps APIs for model registry and deployment.

Project at a glance

Status: Stable
Stars: 1,280
Watchers: 1,280
Forks: 153
License: AGPL-3.0
Repo age: 7 years old
Last commit: 11 months ago
Self-hosting: Supported
Primary language: Java

Last synced 3 hours ago