
Hopsworks

Real-time AI Lakehouse with Python-centric Feature Store

Hopsworks delivers a real-time AI lakehouse, offering a Python-centric feature store, MLOps tools, multi-tenant projects, and flexible deployment on cloud, serverless, or on-premises.


Overview

Hopsworks is designed for data science and engineering teams that need a unified platform to build, govern, and serve machine‑learning assets. It combines a Python‑centric feature store with full MLOps capabilities, enabling collaborative development across projects while maintaining strict data governance.

Core Capabilities

The platform provides project‑based multi‑tenancy, versioned feature groups, lineage tracking, and integrated tools such as Jupyter notebooks, Conda environments, Airflow pipelines, and GPU‑accelerated training. Users can run Spark, Flink, or streaming jobs and serve models through built‑in APIs or external services like Databricks, SageMaker, and Kubeflow.
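The versioning, lineage, and multi-tenancy concepts above can be sketched in plain Python. This is a conceptual toy, not the Hopsworks client API: the class names, the append-only version bump, and the `parents` lineage field are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Conceptual sketch only: a toy registry illustrating project-scoped,
# versioned feature groups with lineage. NOT the Hopsworks client API.

@dataclass
class FeatureGroup:
    name: str
    version: int
    features: list
    parents: list = field(default_factory=list)  # lineage: upstream groups

class Project:
    """Each project is an isolated namespace (multi-tenancy)."""
    def __init__(self, name):
        self.name = name
        self._groups = {}  # (name, version) -> FeatureGroup

    def create_feature_group(self, name, features, parents=None):
        # Versions are append-only: bump rather than mutate in place.
        version = 1 + max(
            (v for (n, v) in self._groups if n == name), default=0)
        fg = FeatureGroup(name, version, features, parents or [])
        self._groups[(name, version)] = fg
        return fg

    def get_feature_group(self, name, version):
        return self._groups[(name, version)]

project = Project("fraud_detection")
raw = project.create_feature_group("transactions", ["amount", "merchant"])
derived = project.create_feature_group(
    "transactions", ["amount", "merchant", "amount_zscore"], parents=[raw])
print(derived.version)  # 2
```

The real platform adds much more on top (online/offline storage, permissions, provenance), but the core ideas map the same way: projects isolate tenants, and every change to a feature group produces a new traceable version.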

Deployment Flexibility

Hopsworks can be consumed as a managed service on AWS, Azure, or GCP, as a serverless app at app.hopsworks.ai, or installed on‑premises using a simple installer on CentOS/RHEL 8.x or Ubuntu 22.04. The on‑premises option supports air‑gapped environments and offers full control over hardware and security policies.

Highlights

Python‑centric Feature Store with versioning, lineage, and governance
Project‑based multi‑tenant environment for team collaboration
Integrated MLOps stack: Airflow, Jupyter, GPU training, model serving
Flexible deployment: managed cloud, serverless app, or on‑premises installer

Pros

  • Unified platform reduces tool fragmentation
  • Strong data governance with fine‑grained access controls
  • Scalable compute supporting Spark, Flink, and GPUs
  • Runs on cloud, serverless, or on‑premises, including air‑gapped sites

Considerations

  • On‑premises install requires a minimum of 32 GB RAM and 8 CPUs
  • Learning curve for the full feature set
  • Limited to Linux (CentOS/RHEL, Ubuntu) environments
  • Serverless offering is currently in beta

Managed products teams compare with

When teams consider Hopsworks, these hosted platforms usually appear on the same shortlist.


Amazon SageMaker Feature Store

Fully managed repository to create, store, share, and serve ML features


Databricks Feature Store

Feature registry with governance, lineage, and MLflow integration


Tecton Feature Store

Central hub to manage, govern, and serve ML features across batch, streaming, and real time

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Data science teams needing a collaborative feature store
  • Enterprises requiring governed ML assets across projects
  • Organizations that must run ML workloads on‑premises for compliance
  • Companies integrating with AWS, Azure, or GCP

Not ideal when

  • Small hobby projects with minimal resource needs
  • Teams that only use Windows environments
  • Users seeking a fully managed solution without any setup effort
  • Projects unable to meet the minimum hardware specifications

How teams use it

Fraud detection with real-time scoring

Features are ingested and served instantly, enabling online models to flag fraudulent transactions as they occur.
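The ingest-then-score loop can be sketched with an in-memory stand-in for an online store. Everything here is illustrative: the rolling features, the `5x` average threshold, and the store layout are assumptions for the sketch, not Hopsworks internals.

```python
# Hedged sketch: an in-memory stand-in for an online feature store that
# serves fresh features to a fraud rule at transaction time.

online_store = {}  # card_id -> latest feature vector

def ingest(card_id, amount):
    """Update rolling features as each transaction streams in."""
    feats = online_store.setdefault(
        card_id, {"txn_count": 0, "total_amount": 0.0})
    feats["txn_count"] += 1
    feats["total_amount"] += amount
    feats["avg_amount"] = feats["total_amount"] / feats["txn_count"]

def score(card_id, amount):
    """Flag a transaction if it dwarfs the card's historical average."""
    feats = online_store.get(card_id)
    if feats is None:
        return False  # no history yet, nothing to compare against
    return amount > 5 * feats["avg_amount"]

ingest("card-1", 20.0)
ingest("card-1", 30.0)
print(score("card-1", 500.0))  # True: 500 far exceeds 5 * 25
print(score("card-1", 40.0))   # False
```

In production the rule would be a trained model and the store a low-latency database, but the shape is the same: features stay fresh because ingestion and serving share one store.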

Customer churn prediction batch pipeline

Batch feature pipelines generate nightly datasets, train models, and store predictions for downstream analytics.
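A nightly batch job of this kind can be sketched as a pure aggregation step. The event schema, field names, and login-based features below are assumptions for the sketch; a real pipeline would run as a scheduled Spark or Python job writing into a feature group.

```python
from datetime import date

# Illustrative batch pipeline: aggregate raw activity events into one
# feature row per user, ready to join into a churn training dataset.

events = [
    {"user": "u1", "day": date(2024, 5, 1), "logins": 3},
    {"user": "u1", "day": date(2024, 5, 2), "logins": 0},
    {"user": "u2", "day": date(2024, 5, 1), "logins": 1},
]

def build_features(events):
    """Roll up per-user activity into nightly feature rows."""
    rows = {}
    for e in events:
        row = rows.setdefault(
            e["user"], {"user": e["user"], "total_logins": 0, "days": 0})
        row["total_logins"] += e["logins"]
        row["days"] += 1
    for row in rows.values():
        row["avg_logins"] = row["total_logins"] / row["days"]
    return list(rows.values())

dataset = build_features(events)
print(sorted(r["user"] for r in dataset))  # ['u1', 'u2']
```

The output rows would then be inserted into a versioned feature group, so each nightly run is reproducible and traceable.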

Cross-team model governance

Projects isolate development, staging, and production, providing versioned models and audit trails for regulatory compliance.

Hybrid cloud training

Data engineers run Spark jobs on on-premises clusters while scaling GPU training in the cloud via managed Hopsworks.

Tech snapshot

Java 81%
Ruby 18%
Jupyter Notebook 1%
Python 1%
Shell 1%

Tags

ml, mlops, model-serving, aws, hopsworks, machine-learning, pyspark, python, feature-management, gcp, feature-store, governance, azure, kserve, serverless, data-science, feature-engineering

Frequently asked questions

What operating systems are supported for on-premises installation?

CentOS/RHEL 8.x and Ubuntu 22.04 are officially supported.

Can I use Hopsworks without a cloud provider?

Yes, the installer lets you run Hopsworks on any compatible Linux VM or bare-metal server.

How does the feature store ensure data governance?

It provides project-based multi-tenancy, fine-grained permissions, versioning, lineage, and provenance for all ML assets.
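The role-based side of that governance model can be sketched as a membership table gating actions per project. The role names and the allowed-action sets here are illustrative assumptions, not the actual Hopsworks permission model.

```python
# Sketch of project-based access control: a user's role within a
# project determines which actions on ML assets are permitted.
# Roles and rules are illustrative, not Hopsworks's real model.

MEMBERS = {
    ("fraud", "alice"): "data_owner",
    ("fraud", "bob"): "data_scientist",
}

ALLOWED = {
    "data_owner": {"read", "write", "share"},
    "data_scientist": {"read", "write"},
}

def can(user, action, project):
    """Deny by default: non-members and unknown roles get nothing."""
    role = MEMBERS.get((project, user))
    return role is not None and action in ALLOWED.get(role, set())

print(can("bob", "share", "fraud"))    # False: only owners may share
print(can("alice", "share", "fraud"))  # True
print(can("eve", "read", "fraud"))     # False: not a project member
```

Combined with versioning and lineage, this kind of deny-by-default membership check is what keeps assets in one project invisible to another.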

Is there a free tier for the serverless app?

The serverless app at app.hopsworks.ai is publicly accessible and can be used for tutorials and small experiments.

What integrations are available for model serving?

Hopsworks integrates with Databricks, SageMaker, and Kubeflow, and offers its own MLOps APIs for model registry and deployment.

Project at a glance

Status: Stable
Stars: 1,280
Watchers: 1,280
Forks: 153
License: AGPL-3.0
Repo age: 7 years old
Last commit: 11 months ago
Self-hosting: Supported
Primary language: Java

Last synced 3 hours ago