
LiteLLM

Unified gateway for all LLM APIs with OpenAI compatibility

LiteLLM provides a single Python interface to call dozens of LLM providers—OpenAI, Azure, Anthropic, Bedrock, HuggingFace, and more—using the familiar OpenAI request/response format.


Overview

LiteLLM is designed for developers, data scientists, and enterprises that need to integrate multiple large language model (LLM) services without rewriting code for each vendor. By exposing a consistent OpenAI‑style API, it lets you swap providers or run parallel experiments with a single function call.

Core capabilities

The library translates inputs to each provider’s completion, embedding, and image‑generation endpoints, guarantees a uniform `choices[0].message.content` response shape, and includes built‑in retry, fallback, and routing logic. It supports async calls, streaming token‑by‑token output, and configurable budgets, rate limits, and per‑project isolation. Observability callbacks can forward logs to Lunary, MLflow, Langfuse, Helicone, and other platforms.
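A minimal sketch of the call shape (assuming an OpenAI key is set in the environment; the model name and prompt are placeholders):

```python
from litellm import completion

# One call shape for every provider; the model string selects the backend.
response = completion(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize LiteLLM in one sentence."}],
)

# Responses expose the uniform OpenAI-style shape described above.
print(response.choices[0].message.content)
```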

Deployment options

Install via `pip install litellm` or run the official Docker image with the `-stable` tag for production‑grade, load‑tested containers. Set provider API keys as environment variables and optionally deploy the LiteLLM proxy server for multi‑tenant routing, hosted preview, or enterprise‑managed services.
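As a rough sketch of the key setup, assuming the standard OPENAI_API_KEY and ANTHROPIC_API_KEY variable names; the key values and model name are placeholders:

```python
import os
from litellm import completion

# Provider keys are read from environment variables; values here are placeholders.
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."

# The same completion() call can now target either provider.
response = completion(
    model="claude-3-haiku-20240307",  # placeholder Anthropic model name
    messages=[{"role": "user", "content": "Hello"}],
)
```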

Highlights

Single OpenAI‑compatible API for over 20 LLM providers
Built‑in retry, fallback, and routing across multiple deployments
Streaming and async support for real‑time applications
Observability callbacks to Lunary, MLflow, Langfuse, Helicone, and more

Pros

  • Reduces code duplication when switching between providers
  • Consistent response schema simplifies downstream processing
  • Supports budgeting, rate limiting, and multi‑project isolation
  • Extensible logging integrates with popular observability platforms

Considerations

  • Adds an abstraction layer that may introduce slight latency
  • Requires keeping provider API keys as environment variables
  • Feature set depends on provider‑specific capabilities; not all endpoints are fully mapped
  • Complex routing rules may need additional configuration

Managed products teams compare with

When teams consider LiteLLM, these hosted platforms usually appear on the same shortlist.


Eden AI

Unified API aggregator for AI services across providers


OpenRouter

One API for 400+ AI models with smart routing and unified billing/BYOK


Vercel AI Gateway

Unified AI gateway for multi-provider routing, caching, rate limits, and observability

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Teams building applications that need to experiment with multiple LLMs
  • Enterprises requiring spend tracking and rate‑limit enforcement per project
  • Developers who prefer the OpenAI request format across providers
  • Projects that need streaming or async responses for chat‑like interfaces

Not ideal when

  • Simple scripts that only ever call a single provider
  • Environments where adding another Python dependency is undesirable
  • Use cases demanding ultra‑low latency without any abstraction overhead
  • Scenarios requiring provider‑specific features not yet exposed through LiteLLM

How teams use it

Multi‑model A/B testing

Switch between OpenAI, Anthropic, and Cohere models with a single function call, enabling rapid performance comparison.
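An illustrative sketch, with placeholder model names for three providers and the relevant API keys assumed to be set:

```python
from litellm import completion

prompt = [{"role": "user", "content": "Explain retrieval-augmented generation in two sentences."}]

# Placeholder model names for three providers; the call and response shape
# stay identical, so outputs can be compared side by side.
for model in ["gpt-4o-mini", "claude-3-haiku-20240307", "cohere/command-r"]:
    response = completion(model=model, messages=prompt)
    print(f"{model}: {response.choices[0].message.content}")
```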

Enterprise spend monitoring

Set per‑project budgets and rate limits, automatically routing excess traffic to fallback providers.
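A rough sketch using LiteLLM's Router for fallback routing; the deployment names and model strings are placeholders, and per‑project budgets and rate limits are typically configured on the proxy rather than inline:

```python
from litellm import Router

# Two deployments registered under logical names; values are placeholders.
router = Router(
    model_list=[
        {"model_name": "primary", "litellm_params": {"model": "gpt-4o-mini"}},
        {"model_name": "backup", "litellm_params": {"model": "anthropic/claude-3-haiku-20240307"}},
    ],
    # If "primary" fails, retry the request against "backup".
    fallbacks=[{"primary": ["backup"]}],
)

response = router.completion(
    model="primary",
    messages=[{"role": "user", "content": "Hello"}],
)
```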

Real‑time chat application

Leverage async and streaming support to deliver token‑by‑token responses to end users.
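A minimal sketch of async streaming, assuming an OpenAI key is set; the model name is a placeholder:

```python
import asyncio
from litellm import acompletion

async def stream_reply() -> None:
    # Stream tokens as they arrive; model name is a placeholder.
    response = await acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Tell me a short joke."}],
        stream=True,
    )
    async for chunk in response:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)

asyncio.run(stream_reply())
```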

Centralized logging for compliance

Send request and response data to Langfuse, Helicone, or MLflow for audit trails and performance analytics.
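A sketch of the callback hookup for Langfuse, assuming the usual Langfuse environment variables; key values and the model name are placeholders:

```python
import os
import litellm
from litellm import completion

# Langfuse credentials; values here are placeholders.
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-..."

# Forward successful and failed calls to Langfuse for audit trails.
litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]

response = completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
```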

Tech snapshot

Python 87%
TypeScript 12%
HTML 1%
JavaScript 1%
Shell 1%
Makefile 1%

Tags

bedrock, llm-gateway, llm, gateway, ai-gateway, azure-openai, vertex-ai, mcp-gateway, langchain, anthropic, openai, openai-proxy, llmops, litellm

Frequently asked questions

How do I install LiteLLM?

Use `pip install litellm` or pull the official Docker image with the `-stable` tag.

Which LLM providers are supported?

LiteLLM supports OpenAI, Azure, Anthropic, Bedrock, HuggingFace, TogetherAI, VertexAI, Groq, and many others; see the provider list in the docs.
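Non-OpenAI backends are generally selected with a provider prefix on the model name; the identifiers below are illustrative and each call assumes the matching provider credentials are set:

```python
from litellm import completion

messages = [{"role": "user", "content": "Hello"}]

# Provider-prefixed model names route the same call to different backends;
# the identifiers below are illustrative placeholders.
completion(model="azure/my-gpt4o-deployment", messages=messages)  # Azure deployment name
completion(model="bedrock/anthropic.claude-3-haiku-20240307-v1:0", messages=messages)
completion(model="groq/llama3-8b-8192", messages=messages)
```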

Can I use LiteLLM for embeddings and image generation?

Yes, the library translates calls to the provider’s completion, embedding, and image_generation endpoints.
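For example (model names are placeholders, with the matching API keys assumed to be set):

```python
from litellm import embedding, image_generation

# Embedding call; the response follows the OpenAI embeddings format.
emb = embedding(model="text-embedding-3-small", input=["LiteLLM unifies LLM APIs."])

# Image generation call; model and prompt are placeholders.
img = image_generation(model="dall-e-3", prompt="A lighthouse at dusk")
```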

How does rate limiting work?

You can configure budgets and rate limits per project, API key, or model through the proxy’s routing settings.

Is there a hosted version?

A preview hosted proxy is available, and an enterprise tier offers managed deployment.

Project at a glance

Status: Active
Stars: 34,233
Watchers: 34,233
Forks: 5,416
Repo age: 2 years old
Last commit: 3 hours ago
Primary language: Python

Last synced 3 hours ago