
Model Serving & Inference Platforms

Explore curated open-source tools in the Model Serving & Inference Platforms category. Compare technologies, see alternatives, and find the right solution for your workflow. The category currently lists 10+ projects.

TensorRT LLM
Accelerated LLM inference with NVIDIA TensorRT optimizations
- Stars: 12,694
- License: —
- Last commit: 9 hours ago
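To give a concrete sense of what "accelerated LLM inference" looks like in practice, here is a minimal sketch of offline generation with TensorRT-LLM's high-level LLM API. It assumes a recent tensorrt_llm release on a CUDA-capable GPU, and the checkpoint name is only an illustrative example, not something this listing specifies.

```python
# Minimal sketch: offline inference via TensorRT-LLM's high-level LLM API.
# Assumes a recent tensorrt_llm release; the checkpoint name is illustrative.
from tensorrt_llm import LLM, SamplingParams

prompts = ["The capital of France is"]
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

# Engine compilation/optimization happens under the hood when the LLM
# object is constructed from a Hugging Face checkpoint.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```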

Other projects in the category include:
- Scale Python and AI workloads from laptop to cluster effortlessly
- Fast, lightweight Python framework for scalable LLM inference
- High-performance serving framework for LLMs and vision-language models
- Unified AI model serving across clouds, edge, and GPUs
- Unified AI inference platform for generative and predictive workloads on Kubernetes
- Deploy modular, data-centric AI applications at scale on Kubernetes
- Unified Python framework for building high-performance AI inference APIs
- High-throughput LLM serving with intra-device parallelism and asynchronous CPU scheduling
- Unified ML library for scalable training, serving, and federated learning