

GPUStack
Unified GPU cluster manager for scalable AI inference
GPUStack orchestrates heterogeneous GPU resources across Linux, macOS, and Windows, delivering OpenAI‑compatible APIs for LLMs, VLMs, diffusion, audio, and embedding models.

GPUStack is a lightweight manager that unifies GPUs from NVIDIA, Apple Metal, AMD ROCm, Ascend CANN, and other accelerators into a single inference platform. It supports a broad catalog of models—including large language, vision‑language, diffusion, audio, and embedding models—through flexible backends such as vLLM, llama‑box, Ascend MindIE, and vox‑box. Users interact via a web UI or OpenAI‑compatible endpoints, benefiting from automatic resource evaluation, load balancing, and real‑time GPU monitoring.
Deployments are container‑based on Linux (Docker with NVIDIA Container Toolkit) and available as desktop installers for macOS and Windows. Adding nodes or GPUs scales the cluster instantly, while multi‑version backend support lets different models run with their optimal runtimes. API keys and user management secure access, making GPUStack suitable for internal AI services, multi‑tenant SaaS, or research clusters.
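To make the API surface concrete, here is a minimal sketch of calling a GPUStack deployment through the standard `openai` Python client. The base URL, model name, and API key are placeholders: substitute your own server address (check your GPUStack version's docs for the exact OpenAI-compatible path) and a key generated in the UI.

```python
from openai import OpenAI

# Point the standard OpenAI client at a GPUStack server. The base URL,
# model name, and API key are placeholders: substitute your server's
# OpenAI-compatible endpoint and a key created in the GPUStack UI.
client = OpenAI(
    base_url="http://localhost/v1",
    api_key="gpustack-api-key",
)

response = client.chat.completions.create(
    model="qwen3",  # must match a model deployed in your GPUStack cluster
    messages=[
        {"role": "system", "content": "You are a helpful internal assistant."},
        {"role": "user", "content": "How do I request a new laptop?"},
    ],
)
print(response.choices[0].message.content)
```

Because the server speaks the OpenAI wire format, existing tooling built on the `openai` SDK can usually be repointed at GPUStack by changing only `base_url` and `api_key`.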
Looking for a hosted option? When teams consider GPUStack, these are the managed services engineering teams benchmark against before choosing open source.

Amazon SageMaker
Fully managed machine learning service to build, train, and deploy ML models at scale
Use cases

Internal chatbot powered by LLMs
Deploys Qwen3 or LLaMA models behind OpenAI‑compatible APIs for secure, low‑latency employee assistance.
On‑prem image generation service
Runs Stable Diffusion or FLUX across multiple GPUs, delivering high‑throughput image creation for design teams.
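As a sketch of that workflow, the OpenAI-compatible image API can be driven from Python. The endpoint URL, API key, and model name ("stable-diffusion") are placeholders, and whether the images route is available depends on your GPUStack version and deployed backend.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost/v1",  # placeholder GPUStack endpoint
    api_key="gpustack-api-key",      # placeholder key from the UI
)

# Generate one image from a diffusion model deployed in GPUStack.
# "stable-diffusion" stands in for whatever model name you deployed.
result = client.images.generate(
    model="stable-diffusion",
    prompt="Isometric illustration of a GPU cluster, flat pastel palette",
    size="1024x1024",
    n=1,
)

image = result.data[0]
# Depending on the backend, the result may arrive as a URL or as base64.
print(image.url or (image.b64_json or "")[:80])
```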
Speech‑to‑text transcription pipeline
Hosts Whisper models, exposing transcription endpoints that scale with added GPU nodes.
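A hedged sketch of such a pipeline, again through the `openai` client; the model name "whisper-large-v3", the file path, the server URL, and the key are all assumptions to replace with your own:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost/v1",  # placeholder GPUStack endpoint
    api_key="gpustack-api-key",      # placeholder key
)

# Upload a local recording to a Whisper model served by GPUStack.
# The model name and file path are placeholders.
with open("standup-recording.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
    )

print(transcript.text)
```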
Multi‑tenant SaaS inference platform
Provides isolated API keys and load‑balanced inference for diverse customer models on shared GPU clusters.
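One plausible way to wire this up, assuming per-tenant API keys created in the GPUStack UI; the tenant names, keys, and model name below are invented for illustration:

```python
from openai import OpenAI

GPUSTACK_URL = "http://localhost/v1"  # placeholder endpoint

# One GPUStack API key per tenant (created per user in the UI) keeps
# requests authenticated and attributable while the GPU pool is shared.
# Tenant names, keys, and the model name are invented for illustration.
TENANT_KEYS = {
    "acme": "gpustack-key-acme",
    "globex": "gpustack-key-globex",
}

clients = {
    tenant: OpenAI(base_url=GPUSTACK_URL, api_key=key)
    for tenant, key in TENANT_KEYS.items()
}

reply = clients["acme"].chat.completions.create(
    model="qwen3",
    messages=[{"role": "user", "content": "Ping from tenant acme"}],
)
print(reply.choices[0].message.content)
```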
Frequently asked questions

How do I install GPUStack on Linux?
Install Docker and the NVIDIA Container Toolkit, then run the provided `docker run` command to start the GPUStack server.
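Once the container is up, a quick way to confirm the server is reachable is to list the models it exposes. This sketch assumes the server answers on localhost and that `/v1/models` is your deployment's OpenAI-compatible models route; adjust both, and the placeholder key, to match your setup.

```python
import json
import urllib.request

# After `docker run`, confirm the server is reachable by listing the
# models it currently serves. The URL and key are placeholders; adjust
# the path to your deployment's OpenAI-compatible models route.
request = urllib.request.Request(
    "http://localhost/v1/models",
    headers={"Authorization": "Bearer gpustack-api-key"},
)
with urllib.request.urlopen(request) as response:
    payload = json.load(response)

for model in payload.get("data", []):
    print(model.get("id"))
```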
Which GPU platforms does GPUStack support?
GPUStack supports NVIDIA CUDA, Apple Metal, AMD ROCm, Ascend CANN, Hygon DTK, Moore Threads MUSA, Iluvatar Corex, and Cambricon MLU.
Can I deploy my own models?
Yes, you can deploy models from Hugging Face, ModelScope, or a local file path by following the UI deployment workflow.
Does GPUStack support model training?
GPUStack focuses on inference; training workflows are not provided out of the box.
How are API keys managed?
API keys are generated per user, displayed only once, and can be managed via the UI's API Keys page.
Project at a glance
Active · Last synced 4 days ago