TensorZero

Unified, high-performance gateway for industrial-grade LLM applications

TensorZero provides a fast, extensible stack for LLM applications: a gateway plus observability, optimization, evaluation, and experimentation, with support for dozens of providers, streaming, multimodal inputs, and high-throughput workloads.

Overview

TensorZero is a modular stack that lets developers access every major LLM provider through a single, high-performance gateway. Built in Rust, the gateway adds less than 1 ms of p99 latency overhead and sustains over 10,000 queries per second, while supporting streaming, tool use, batch inference, embeddings, multimodal inputs, and caching.
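
For illustration, a single chat inference through the gateway's HTTP API might look like the following Python sketch (the gateway URL, the /inference path, and the "my_chat" function name are assumptions for illustration; adjust to your deployment and configuration):

```python
# Minimal sketch of one chat inference through the gateway, assuming the
# gateway runs on localhost:3000 and a function named "my_chat" is defined
# in your TensorZero configuration (both are assumptions).
import requests

GATEWAY = "http://localhost:3000"

resp = requests.post(
    f"{GATEWAY}/inference",
    json={
        "function_name": "my_chat",  # hypothetical function from your config
        "input": {
            "messages": [
                {"role": "user", "content": "Summarize this release note."},
            ]
        },
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # includes an inference id you can reference in feedback later
```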

Observability & Optimization

All inferences and optional feedback are stored in a user‑provided database (e.g., ClickHouse) and can be inspected via the TensorZero UI or programmatically. The platform automatically builds datasets, replays historic calls with new prompts or models, and exports OpenTelemetry traces. Integrated metrics and human‑feedback loops enable prompt, model, and strategy optimization.
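
As a hedged sketch, recording feedback against an earlier inference could look like this (the /feedback path, the field names, and the "thumbs_up" metric are assumptions; metrics are defined in your own configuration):

```python
# Hedged sketch: attach a boolean feedback value to a prior inference.
# "thumbs_up" is a hypothetical metric from your config; the inference_id
# is the id returned by the earlier /inference call.
import requests

requests.post(
    "http://localhost:3000/feedback",
    json={
        "metric_name": "thumbs_up",
        "inference_id": "00000000-0000-0000-0000-000000000000",  # replace with a real id
        "value": True,
    },
    timeout=10,
).raise_for_status()
```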

Experimentation & Deployment

TensorZero includes out‑of‑the‑box A/B testing, routing, retries, fallbacks, and granular rate limiting. It deploys with Docker and can be reached from a Python client, a patched OpenAI SDK, or any HTTP client, making it language‑agnostic. Teams can adopt individual components incrementally and combine them with existing tooling.

Highlights

Unified API accesses 30+ LLM providers with a single client
Sub‑millisecond overhead enables >10k QPS at scale
Built‑in observability stores inferences and feedback with UI and OpenTelemetry export
Experimentation layer offers A/B testing, routing, retries, and fallback strategies out of the box

Pros

  • High performance and low latency, suitable for production workloads
  • Broad provider support including major cloud and self‑hosted models
  • Language‑agnostic access via HTTP, Python, and OpenAI SDK patches
  • Integrated observability and experimentation tools reduce third‑party dependencies

Considerations

  • Self‑hosting required; users must manage Docker and a database like ClickHouse
  • Configuration can be complex for small or quick‑start projects
  • Advanced features such as spend tracking are not yet implemented
  • Custom OpenAI‑compatible integrations may need additional setup

Managed products teams compare with

When teams consider TensorZero, these hosted platforms usually appear on the same shortlist.

Comet

Experiment tracking, model registry & production monitoring for ML teams

DagsHub

Git/DVC-based platform with MLflow experiment tracking and model registry.

Neptune

Experiment tracking and model registry to log, compare, and manage ML runs.

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Teams building production LLM services needing consistent multi‑provider access
  • Applications with strict latency or throughput requirements
  • Organizations that want full control over data, logging, and feedback loops
  • Developers who prefer a single stack for inference, monitoring, and experimentation

Not ideal when

  • Hobby projects that only need a single provider and minimal setup
  • Environments where managed SaaS gateways are preferred over self‑hosting
  • Teams lacking ops resources to maintain Docker/ClickHouse deployments
  • Use cases that require built‑in spend tracking or billing features (not yet available)

How teams use it

Real‑time chat assistant with multi‑model fallback

Route requests between OpenAI and Anthropic with sub‑millisecond gateway overhead and automatic retries on failure.

Batch embedding generation for recommendation engine

Process millions of texts through the gateway’s batch endpoint, store embeddings in ClickHouse, and monitor throughput in the UI.

A/B testing new prompt designs

Deploy two prompt variants, collect user feedback, and use built‑in metrics to identify the higher‑performing version.

Debugging and replaying production inferences

Query historical calls from the UI, edit prompts, and re‑run them to evaluate model updates without affecting live traffic.

Tech snapshot

Rust 75%
TypeScript 17%
Python 6%
Jupyter Notebook 1%
Shell 1%
Go 1%

Tags

llama, ml, mlops, gpt, ai, llms, generative-ai, llm, machine-learning, artificial-intelligence, ai-engineering, python, anthropic, rust, ml-engineering, genai, deep-learning, large-language-models, openai, llmops

Frequently asked questions

How do I add a new LLM provider?

Add the provider in the TensorZero configuration; any OpenAI‑compatible endpoint can be registered, and many major providers are supported out of the box.
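
As a rough sketch, a supported provider can often be called without touching the function config by passing a provider-prefixed model name (the "provider::model" shorthand shown here is an assumption and may differ across versions):

```python
# Hedged sketch: call a provider directly via a provider-prefixed model name
# instead of a configured function (shorthand syntax is an assumption).
import requests

resp = requests.post(
    "http://localhost:3000/inference",
    json={
        "model_name": "openai::gpt-4o-mini",  # hypothetical provider::model shorthand
        "input": {"messages": [{"role": "user", "content": "Hello!"}]},
    },
    timeout=30,
)
print(resp.json())
```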

What storage backend is used for observability?

You configure your own database (e.g., ClickHouse) where inferences, metrics, and feedback are persisted.

Can I use TensorZero with existing OpenAI SDK code?

Yes, you can patch the OpenAI client or point the SDK to the gateway’s base URL to route calls through TensorZero.
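
A minimal sketch of the base-URL approach, assuming the gateway exposes an OpenAI-compatible endpoint at /openai/v1 and maps functions via a prefixed model name (both details may vary by version):

```python
# Hedged sketch: route existing OpenAI SDK code through the TensorZero gateway.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/openai/v1",  # assumed OpenAI-compatible path
    api_key="not-used-by-the-gateway",           # the SDK requires some value
)

resp = client.chat.completions.create(
    model="tensorzero::function_name::my_chat",  # assumed mapping to a TensorZero function
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```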

How does rate limiting work?

Custom rate limits can be defined with granular scopes such as user tags, and the gateway enforces them per request.
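
For example, a request might carry tags that a rate-limit scope matches on; a hedged sketch (the "tags" field and the scope semantics are assumptions based on the description above):

```python
# Hedged sketch: pass request tags that granular rate-limit scopes can target.
import requests

requests.post(
    "http://localhost:3000/inference",
    json={
        "function_name": "my_chat",  # hypothetical function from your config
        "input": {"messages": [{"role": "user", "content": "Hi!"}]},
        "tags": {"user_id": "user_123"},  # scope key a rate limit might match
    },
    timeout=30,
)
```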

Is there a managed hosting option?

Currently TensorZero is self‑hosted via Docker; no managed SaaS offering is provided.

Project at a glance

Status: Active
Stars: 10,847
Watchers: 10,847
Forks: 752
License: Apache-2.0
Repo age: 1 year old
Last commit: 11 hours ago
Primary language: Rust

Last synced 4 hours ago