

AI observability platform for tracing, evaluation, and prompt management
Phoenix lets you trace LLM calls, benchmark performance, version datasets, run experiments, and manage prompts—all vendor‑agnostic and deployable locally, in containers, or in the cloud.

Phoenix is an AI observability platform that centralizes tracing, evaluation, dataset versioning, experiment tracking, and prompt management for LLM‑driven applications. It targets ML engineers, data scientists, and product teams who need reproducible experimentation and deep insight into model behavior.
Built on OpenTelemetry and the OpenInference ecosystem, Phoenix automatically instruments popular frameworks such as LlamaIndex, LangChain, Haystack, and DSPy, while supporting a wide range of LLM providers (OpenAI, Bedrock, MistralAI, VertexAI, LiteLLM, Google GenAI, etc.). The platform can run on a local machine, within a Jupyter notebook, as a Docker container, or at scale on Kubernetes, and a hosted cloud instance is also available. Python and TypeScript SDKs provide lightweight clients and evaluation libraries, enabling seamless integration into existing pipelines.
By offering a vendor‑agnostic, extensible observability stack, Phoenix helps teams iterate faster, compare model variants, and debug production issues without locking into a single provider or framework.
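As a concrete illustration, here is a minimal Python sketch of the local workflow described above. It assumes the arize-phoenix, openinference-instrumentation-openai, and openai packages are installed and an OpenAI API key is configured; the px.launch_app, phoenix.otel.register, and OpenAIInstrumentor entry points follow current documentation and may differ between releases.

```python
# Minimal tracing sketch: launch Phoenix locally and auto-instrument OpenAI calls.
import phoenix as px
from openai import OpenAI
from openinference.instrumentation.openai import OpenAIInstrumentor
from phoenix.otel import register

px.launch_app()  # start the local Phoenix UI (http://localhost:6006 by default)
tracer_provider = register(project_name="demo")  # route OpenTelemetry export to Phoenix
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)  # trace OpenAI calls automatically

client = OpenAI()
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize OpenTelemetry in one sentence."}],
)
# The completed call now appears as a span in the Phoenix UI.
```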
Prompt Optimization
Iteratively test prompt variations, compare model responses, and select the best performing version.
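One lightweight way to do this programmatically, sketched below, is to send each prompt variant through an already-instrumented client so the responses arrive in Phoenix as traces that can be compared side by side; the variant names, templates, and model are illustrative, and tracing is assumed to be set up as in the earlier example.

```python
# Sketch: run two prompt variants through an instrumented OpenAI client so
# both responses show up as traces in Phoenix for side-by-side comparison.
from openai import OpenAI

client = OpenAI()
variants = {
    "v1-terse": "Answer in one sentence: {question}",
    "v2-stepwise": "Think step by step, then answer: {question}",
}
question = "Why does retrieval help with factual questions?"

for name, template in variants.items():
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": template.format(question=question)}],
    )
    print(name, response.choices[0].message.content[:80])
```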
RAG Performance Benchmarking
Run retrieval and answer relevance evaluations to quantify improvements across index updates.
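A minimal sketch with phoenix.evals is shown below. It assumes the llm_classify helper, the OpenAIModel wrapper, and the built-in RAG_RELEVANCY_PROMPT_TEMPLATE and RAG_RELEVANCY_PROMPT_RAILS_MAP constants are available under these names in the installed version; the two example rows are made up for illustration.

```python
# Sketch: score retrieved chunks for relevance with phoenix.evals.
import pandas as pd
from phoenix.evals import (
    OpenAIModel,
    RAG_RELEVANCY_PROMPT_RAILS_MAP,
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    llm_classify,
)

# Each row pairs a query ("input") with one retrieved chunk ("reference");
# the column names must match the template's variables.
df = pd.DataFrame(
    {
        "input": ["What is Phoenix?", "What is Phoenix?"],
        "reference": [
            "Phoenix is an open-source AI observability platform.",
            "The phoenix is a mythical bird that regenerates from its ashes.",
        ],
    }
)

results = llm_classify(
    dataframe=df,
    model=OpenAIModel(model="gpt-4o-mini"),
    template=RAG_RELEVANCY_PROMPT_TEMPLATE,
    rails=list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values()),  # e.g. ["relevant", "unrelated"]
)
print(results)  # one label per row; aggregate these to track changes across index updates
```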
Dataset Versioning for Fine‑Tuning
Create immutable snapshots of training examples, track changes, and feed them into fine‑tuning pipelines.
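The sketch below assumes the Python client exposes an upload_dataset method with the keyword arguments shown (check the installed SDK for the exact signature); the dataset name and example rows are invented for illustration.

```python
# Sketch: push a snapshot of examples to Phoenix as a versioned dataset.
import pandas as pd
import phoenix as px

examples = pd.DataFrame(
    {
        "question": ["What does Phoenix trace?", "Is Phoenix vendor-agnostic?"],
        "answer": ["LLM calls and spans via OpenTelemetry.", "Yes."],
    }
)

dataset = px.Client().upload_dataset(
    dataset_name="fine-tuning-examples",
    dataframe=examples,
    input_keys=["question"],   # columns treated as inputs
    output_keys=["answer"],    # columns treated as expected outputs
)
# The uploaded snapshot can then be referenced from experiments or exported
# into a fine-tuning pipeline.
```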
End‑to‑End LLM Debugging
Trace runtime calls, view input/output spans, and replay failures directly in the Playground.
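For steps that no auto-instrumentor covers, a custom span can be emitted with the standard OpenTelemetry tracer returned by phoenix.otel.register. The sketch below assumes that entry point and the OpenInference-style input.value and output.value attribute names; the answer function stands in for real application code.

```python
# Sketch: wrap an application step in a custom span so failures show up
# as traces that can be inspected (and replayed) in Phoenix.
from phoenix.otel import register

tracer_provider = register(project_name="debugging-demo")
tracer = tracer_provider.get_tracer(__name__)

def answer(question: str) -> str:
    with tracer.start_as_current_span("answer") as span:
        span.set_attribute("input.value", question)
        result = "stubbed response"  # call your LLM or retrieval chain here
        span.set_attribute("output.value", result)
        return result

answer("Why did answer quality regress after the last deployment?")
```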
You can use the hosted instance at app.phoenix.arize.com or self‑host via Docker/Kubernetes.
Core SDKs are available for Python and TypeScript; tracing works for any language that can emit OpenTelemetry spans.
Instrumentation packages are provided for LlamaIndex, LangChain, Haystack, DSPy, and others via OpenInference.
The platform is open source and free; cloud hosting by Arize AI is a paid service.
Results can be accessed through the REST API or client libraries and exported to CSV/JSON for downstream analysis.
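As a sketch of the client-library path, assuming the Python client exposes a get_spans_dataframe method and that a project named "demo" has already received traces:

```python
# Sketch: pull traced spans into pandas and export them for offline analysis.
import phoenix as px

spans = px.Client().get_spans_dataframe(project_name="demo")
spans.to_csv("phoenix_spans.csv", index=False)          # CSV export
spans.to_json("phoenix_spans.json", orient="records")   # JSON export
print(spans.head())
```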
Project at a glance
Active · Last synced 4 days ago