

Accelerate LLM fine‑tuning with up to 2× speed and 70% less VRAM
Train state‑of‑the‑art LLMs, vision and TTS models faster using Unsloth’s optimized kernels, free notebooks, and Docker image, while cutting memory needs dramatically.

Unsloth provides a collection of optimized training kernels, flexible quantization, and ready-to-run notebooks that let you fine-tune large language, vision, and text-to-speech models on modest GPUs. By reducing VRAM consumption by up to 80% and delivering 1.5-2.2× speed improvements, it makes state-of-the-art models accessible to researchers, developers, and educators.
Install with a single `pip install unsloth` on Linux/WSL, or use the official unsloth/unsloth Docker image for a reproducible environment. Windows users should follow the Windows Guide after installing PyTorch. The catalog offers free notebooks for models such as GPT-OSS, Qwen3, Gemma, Llama 3.1, Mistral, and Orpheus-TTS, with one-click export to GGUF, Ollama, vLLM, or Hugging Face.
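The typical workflow, following the pattern in Unsloth's own notebooks, loads a pre-quantized checkpoint with FastLanguageModel, attaches LoRA adapters, and hands the model to a standard TRL trainer. A minimal sketch, assuming a recent unsloth/trl install; the checkpoint name, dataset, and hyperparameters are illustrative:

```python
# pip install unsloth   (Linux/WSL; Windows users: see the Windows Guide)
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load a pre-quantized 4-bit checkpoint so an 8B model fits on a modest GPU.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # illustrative catalog model
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    use_gradient_checkpointing="unsloth",  # Unsloth's memory-saving checkpointing
)

# Illustrative dataset: flatten instruction/response pairs into one text field.
def to_text(example):
    return {"text": f"### Instruction:\n{example['instruction']}\n\n"
                    f"### Response:\n{example['output']}"}

dataset = load_dataset("yahma/alpaca-cleaned", split="train").map(to_text)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,              # short demo run
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```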
Features include Flex Attention, Dynamic 4‑bit quantization, GRPO/GSPO reasoning kernels, and support for ultra‑long context windows (up to 342K tokens). Multi‑GPU scaling is forthcoming, and the library integrates with Blackwell, DGX Spark, and other high‑end GPU platforms.
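Dynamic 4-bit checkpoints and long context windows are exposed through the same loader arguments. A brief sketch, assuming a dynamic-quant checkpoint name of the `-unsloth-bnb-4bit` form (illustrative) and enough VRAM for the requested window:

```python
from unsloth import FastLanguageModel

# Dynamic 4-bit quants keep sensitive layers in higher precision to
# preserve accuracy while still cutting VRAM; the model name is illustrative.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit",
    max_seq_length=65536,  # long-context window; 342K tokens needs ~80 GB per the notes above
    load_in_4bit=True,
)
```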
When teams consider Unsloth, these hosted platforms usually appear on the same shortlist: they are the services engineering teams benchmark against before choosing open source.

Amazon SageMaker JumpStart: ML hub with curated foundation models, pretrained algorithms, and solution templates you can deploy and fine-tune in SageMaker.

Cohere: enterprise AI platform providing LLMs (Command, Aya) plus Embed/Rerank for retrieval.

Hosted model APIs: API-first platforms to run, fine-tune, and deploy AI models without managing infrastructure.
Typical use cases:
Domain-specific chatbot: fine-tune GPT-OSS 20B on a 14 GB GPU to match baseline quality with half the training time.
Image-text retrieval model: adapt Qwen-VL 8B using GRPO, training on 5 GB of VRAM and achieving state-of-the-art retrieval scores.
Custom voice generation: create a bespoke TTS voice with Orpheus-TTS 3B and export it to GGUF for edge deployment (see the export sketch after this list).
Long-context reasoning: extend Llama 3.1 8B to a 342K-token context on an 80 GB GPU for complex reasoning tasks.
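Continuing from the fine-tuning sketch above (with `model` and `tokenizer` still in scope), GGUF export for edge deployment uses Unsloth's documented saving helper; the output directory and quant preset are illustrative:

```python
# Merge the LoRA adapter and save GGUF weights for llama.cpp / Ollama.
model.save_pretrained_gguf(
    "model-gguf",                  # output directory (illustrative)
    tokenizer,
    quantization_method="q4_k_m",  # common llama.cpp quant preset
)
```

The resulting files can be loaded directly by llama.cpp or imported into Ollama via a Modelfile.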
How do I install Unsloth? Install PyTorch first, then run `pip install unsloth`; see the Windows Guide for CUDA compatibility details.
Which GPUs are supported? Any NVIDIA GPU supported by PyTorch; RTX 50x, B200, 6000, and DGX Spark are explicitly mentioned.
Can Unsloth be used for inference? Unsloth focuses on training speed and memory; inference can benefit from the same quantizations, but dedicated inference tools are recommended (a quick sanity-check sketch follows these answers).
Where are the notebooks? All notebooks are linked in the repository README and can be launched directly from the Unsloth catalog page.
Is there a Docker image? Yes, the official `unsloth/unsloth` image provides a ready-to-run environment; see the Docker Guide.
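For a quick post-training sanity check (production inference is better served by vLLM or llama.cpp, per the answer above), Unsloth exposes an inference toggle. A minimal sketch, with an illustrative checkpoint and prompt:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # illustrative checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to Unsloth's optimized inference path

inputs = tokenizer("Explain LoRA in one sentence.", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```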
Project at a glance: Active; last synced 4 days ago.