Feathr

Scalable feature store for unified data and AI engineering

Feathr provides Pythonic APIs to define, register, and share feature transformations across batch, streaming, and online environments, with point-in-time correctness and native cloud integrations for enterprise AI pipelines.

Overview

Feathr is a data and AI engineering platform used in production at LinkedIn for over six years and now available as an open‑source project under the LF AI & Data Foundation. It lets data scientists define feature transformations with Pythonic APIs, register them by name, and reuse them across teams, ensuring consistent, point‑in‑time‑correct data for model training and online serving.
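A minimal sketch of defining and registering a feature with Feathr's Python client, loosely modeled on the project's public quickstart; exact class names and signatures can vary between versions, and the config path, source location, and column names below are illustrative assumptions.

# Sketch only: config path, source path, and column names are hypothetical.
from feathr import (
    FeathrClient, FeatureAnchor, Feature, TypedKey, HdfsSource, ValueType, FLOAT,
)

client = FeathrClient(config_path="feathr_config.yaml")

trip_source = HdfsSource(
    name="trip_data",
    path="abfss://container@account.dfs.core.windows.net/trips/",
    event_timestamp_column="dropoff_datetime",
    timestamp_format="yyyy-MM-dd HH:mm:ss",
)

location_key = TypedKey(
    key_column="DOLocationID",
    key_column_type=ValueType.INT32,
    description="drop-off location id",
)

f_trip_distance = Feature(
    name="f_trip_distance",
    key=location_key,
    feature_type=FLOAT,
    transform="cast_float(trip_distance)",  # Spark SQL expression
)

anchor = FeatureAnchor(
    name="trip_features",
    source=trip_source,
    features=[f_trip_distance],
)

# Build the definitions locally, then publish them to the registry
# so other teams can discover and reuse them by name.
client.build_features(anchor_list=[anchor])
client.register_features()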

Capabilities & Deployment

Feathr supports batch, streaming, and online workloads with built‑in optimizations that can handle billions of rows and petabyte‑scale datasets. Its native integrations with Databricks and Azure Synapse, along with ARM templates and CLI guides, simplify cloud deployment. Users can start quickly with the Feathr Sandbox Docker container, which includes a UI and Jupyter notebooks for hands‑on experimentation, or install the client via pip for local development.
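A hedged sketch of materializing a registered feature to an online store (Redis) and reading it back at serving time, following the pattern used in the quickstart notebooks; the table name, feature name, key value, and timeout are illustrative assumptions.

# Sketch only: table name, feature name, and key value are hypothetical.
from feathr import FeathrClient, MaterializationSettings, RedisSink

client = FeathrClient(config_path="feathr_config.yaml")

settings = MaterializationSettings(
    name="trip_features_job",
    sinks=[RedisSink(table_name="tripFeatures")],
    feature_names=["f_trip_distance"],
)
client.materialize_features(settings)
client.wait_job_to_finish(timeout_sec=900)

# Low-latency lookup of the latest feature value for a given key.
values = client.get_online_features(
    feature_table="tripFeatures",
    key="247",
    feature_names=["f_trip_distance"],
)
print(values)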

Ecosystem

A built‑in registry and intuitive UI provide feature discovery, lineage tracking, and access control. Rich transformation primitives—including time‑based aggregations, sliding windows, and custom UDFs with PySpark or Spark SQL—enable flexible engineering of complex AI features.
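As an example of these transformation primitives, a sliding-window aggregation can be declared directly on a feature definition. This sketch reuses the location_key from the earlier example; the aggregation expression, function, and 90-day window are illustrative assumptions.

# Sketch only: aggregation expression and window length are illustrative.
from feathr import Feature, WindowAggTransformation, FLOAT

f_location_avg_fare = Feature(
    name="f_location_avg_fare",
    key=location_key,  # TypedKey from the earlier sketch
    feature_type=FLOAT,
    transform=WindowAggTransformation(
        agg_expr="cast_float(fare_amount)",  # Spark SQL expression over the source
        agg_func="AVG",                      # other aggregations such as SUM, COUNT, MAX
        window="90d",                        # sliding window length
    ),
)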

Highlights

  • Pythonic APIs with native PySpark and Spark SQL UDF support
  • Point-in-time correct feature computation for training and online serving
  • Scalable architecture handling billions of rows and petabyte-scale data
  • Built-in registry and UI for feature discovery, lineage, and access control

Pros

  • Proven in production at LinkedIn for over 6 years
  • Unified API works across batch, streaming, and online use cases
  • Native integrations with Databricks and Azure Synapse simplify cloud deployment
  • Extensible with custom UDFs and rich transformation primitives

Considerations

  • Requires a Spark environment, limiting use with non‑Spark stacks
  • Learning curve for advanced point‑in‑time semantics
  • Enterprise‑grade scaling may need substantial cloud resources
  • Documentation can assume familiarity with LinkedIn‑style data pipelines

Managed products teams compare with

When teams consider Feathr, these hosted platforms usually appear on the same shortlist.

Amazon SageMaker Feature Store

Fully managed repository to create, store, share, and serve ML features

Databricks Feature Store

Feature registry with governance, lineage, and MLflow integration

Tecton Feature Store

Central hub to manage, govern, and serve ML features across batch, streaming, and real time

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Data science teams building large‑scale feature pipelines for ML models
  • Enterprises needing reusable, versioned feature definitions across projects
  • Organizations leveraging Azure or Databricks for AI workloads
  • Teams requiring strict data leakage prevention via point‑in‑time joins

Not ideal when

  • Small projects without Spark infrastructure
  • Use cases focused solely on simple data ETL without ML features
  • Environments where a lightweight, non‑distributed feature store is preferred
  • Teams lacking Python or Spark expertise

How teams use it

NYC Taxi fare prediction

Rapidly define, materialize, and serve fare prediction features with point‑in‑time correctness

Fraud detection pipeline

Combine user account and transaction streams into real‑time fraud risk features

Product recommendation system

Generate and serve user‑item interaction features for personalized ranking

Feature embedding for NLP

Create embedding features using transformer models and serve them in online inference

Tech snapshot

Scala 47%
Java 30%
Python 19%
TypeScript 3%
Shell 1%
Dockerfile 1%

Tags

mlops, feature-metadata, feature-marketplace, apache-spark, machine-learning, feature-governance, artificial-intelligence, feature-platform, feature-management, data-quality, feature-store, data-engineering, azure, data-science, feature-engineering

Frequently asked questions

How do I try Feathr locally?

Run the Feathr Sandbox Docker container, which includes UI, Jupyter, and core services, and follow the quickstart notebook.

What languages are supported?

Feathr’s APIs are Pythonic; transformations can be expressed with native PySpark or Spark SQL.

Can Feathr run on cloud platforms?

Yes, it has native integrations with Databricks and Azure Synapse, with deployment guides and ARM templates.

How does Feathr prevent data leakage?

It computes features using point‑in‑time‑correct semantics, ensuring training data only sees information available up to the event timestamp.
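A hedged sketch of the point-in-time join described above, loosely following the quickstart: each observation row is joined only with feature values computed from data at or before its event timestamp. Paths, column names, and feature names are illustrative assumptions.

# Sketch only: paths, timestamp column, and feature names are hypothetical.
from feathr import FeathrClient, FeatureQuery, ObservationSettings

client = FeathrClient(config_path="feathr_config.yaml")

query = FeatureQuery(
    feature_list=["f_trip_distance", "f_location_avg_fare"],
    key=location_key,  # the TypedKey used when the features were defined
)
settings = ObservationSettings(
    observation_path="abfss://container@account.dfs.core.windows.net/observations/",
    event_timestamp_column="dropoff_datetime",
    timestamp_format="yyyy-MM-dd HH:mm:ss",
)
client.get_offline_features(
    observation_settings=settings,
    feature_query=query,
    output_path="abfss://container@account.dfs.core.windows.net/training_data/",
)
client.wait_job_to_finish(timeout_sec=900)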

Is there a UI for feature management?

Feathr includes a web UI for searching, exploring lineage, and managing access to registered features.

Project at a glance

Dormant
Stars
1,926
Watchers
1,926
Forks
243
License Apache-2.0
Repo age 3 years old
Last commit 2 years ago
Primary language Scala

Last synced 2 days ago