

Zero-code fine-tuning platform for diverse large language models
LLaMA Factory lets you fine-tune over 100 LLMs via a CLI or web UI, supporting LoRA, QLoRA, advanced optimizers, multimodal data, and OpenAI-style API deployment without writing code.

LLaMA Factory is a zero-code platform that lets developers, researchers, and ML engineers fine-tune more than a hundred large language models, including LLaMA, Mistral, Qwen, Gemma, and multimodal variants, through a simple CLI or a web-based GUI. The system bundles a wide range of training approaches, including LoRA, QLoRA with 2- to 8-bit quantization, full-parameter tuning, and advanced techniques such as GaLore, OFT, and DoRA, and supports supervised fine-tuning, reward modeling, and PPO/DPO pipelines without custom scripts.
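To make the zero-code workflow concrete, here is a minimal sketch, assuming the llamafactory-cli entry point described in the project README: it writes a LoRA fine-tuning config to YAML and hands it to the trainer. The key names mirror the example configs shipped with the repo, but option names and defaults can shift between releases, so verify them against your installed version; the model, dataset, and output paths are placeholder choices.

```python
# Sketch: drive a LoRA supervised fine-tuning run through LLaMA Factory's CLI.
# Config keys mirror the example YAML files in the project repo; confirm them
# against your installed version. Model/dataset/paths are placeholders.
import subprocess
import yaml  # pip install pyyaml

config = {
    "model_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder base model
    "stage": "sft",                  # supervised fine-tuning stage
    "do_train": True,
    "finetuning_type": "lora",       # parameter-efficient LoRA adapters
    "lora_target": "all",            # attach adapters to all linear layers
    "dataset": "alpaca_en_demo",     # demo dataset bundled with the repo
    "template": "llama3",            # chat template matching the base model
    "cutoff_len": 1024,
    "output_dir": "saves/llama3-8b-lora-sft",
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "learning_rate": 1.0e-4,
    "num_train_epochs": 3.0,
    "logging_steps": 10,
}

with open("train_lora_sft.yaml", "w") as f:
    yaml.safe_dump(config, f)

# Equivalent to running `llamafactory-cli train train_lora_sft.yaml` in a shell.
subprocess.run(["llamafactory-cli", "train", "train_lora_sft.yaml"], check=True)
```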
Trained models can be exported to an OpenAI-compatible endpoint powered by vLLM or SGLang, or served locally via Gradio. The framework supports Docker builds and cloud GPU services (e.g., Alaya NeW), and integrates with experiment trackers such as TensorBoard, Wandb, MLflow, and SwanLab. Built-in optimizations like FlashAttention-2 and Liger Kernel accelerate training, while extensive logging and monitoring keep the workflow transparent from data preparation to inference.
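Because the serving layer speaks the OpenAI API dialect, a deployed model can be queried with the standard openai Python client. A minimal sketch follows, assuming an API server is already running locally (for example, one launched with llamafactory-cli api); the host, port, and model name are illustrative assumptions.

```python
# Sketch: call a LLaMA Factory-served model through its OpenAI-compatible API.
# Assumes a server is already listening locally; adjust base_url and model name.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local endpoint
    api_key="not-needed-locally",         # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="llama3-8b-lora-sft",  # hypothetical name of the exported model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what LoRA fine-tuning does."},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```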
When teams consider LLaMA Factory, these hosted platforms usually appear on the same shortlist.

Amazon SageMaker JumpStart
ML hub with curated foundation models, pretrained algorithms, and solution templates you can deploy and fine-tune in SageMaker

Cohere
Enterprise AI platform providing LLMs (Command, Aya) plus Embed/Rerank for retrieval

API-first platform to run, fine-tune, and deploy AI models without managing infrastructure
Domain-specific chatbot for mental health support
Fine-tuned a LLaMA-3 model on curated counseling data, deployed via OpenAI-compatible API, delivering empathetic responses in production.
Visual document extraction for banking
Trained Qwen2-VL on scanned forms, enabling image understanding and data extraction through a Gradio UI.
Reinforcement learning for code generation
Applied PPO and DPO on CodeLlama using the built-in RL pipeline, improving generation quality measured by automated tests.
Low-memory fine-tuning on consumer GPUs
Used 4-bit QLoRA with LoRA+ to fine-tune a 7B model on a single RTX 3060, achieving performance comparable to full-precision fine-tuning.
Do I need to write code to fine-tune a model?
No, the platform provides a zero-code CLI and a web UI that handle data preparation, training, and deployment without writing Python scripts.
How much GPU memory do I need?
GPU memory is the main constraint; quantization (2- to 8-bit) and LoRA make it possible to train 7B-30B models on 16-32 GB GPUs, while larger models need multi-GPU setups or cloud resources.
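As a rough sketch of the low-memory path, the config below layers 4-bit quantization on top of LoRA adapters, the combination usually called QLoRA. Key names follow the project's published QLoRA examples; the base model and batch settings are illustrative assumptions, not tested recommendations.

```python
# Sketch: QLoRA settings that typically fit a 7B model on a ~16 GB GPU.
# Key names follow LLaMA Factory's QLoRA examples; confirm against your version.
import yaml

qlora_config = {
    "model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",  # placeholder 7B base
    "stage": "sft",
    "do_train": True,
    "finetuning_type": "lora",
    "lora_target": "all",
    "quantization_bit": 4,              # load base weights in 4-bit (QLoRA)
    "dataset": "alpaca_en_demo",
    "template": "mistral",
    "cutoff_len": 512,                  # shorter context lowers activation memory
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 16,  # trade wall-clock time for memory
    "learning_rate": 1.0e-4,
    "num_train_epochs": 3.0,
    "output_dir": "saves/mistral-7b-qlora-sft",
}

with open("train_qlora_sft.yaml", "w") as f:
    yaml.safe_dump(qlora_config, f)
# Then run: llamafactory-cli train train_qlora_sft.yaml
```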
Which experiment-tracking tools are supported?
LLaMA Factory integrates with TensorBoard, Wandb, MLflow, and SwanLab, and also offers the LlamaBoard dashboard for real-time metrics.
Can I serve a fine-tuned model through an API?
Yes, the tool can launch an OpenAI-style endpoint using vLLM or SGLang workers, and also provides a Gradio UI for quick testing.
Does LLaMA Factory support multimodal models?
Yes, the framework includes supervised fine-tuning for image, video, and audio inputs, with models like LLaVA, Qwen2-VL, and InternVL3.
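For multimodal fine-tuning, training samples pair conversation turns with media paths. The sketch below writes one record in the ShareGPT-style layout used by the repo's multimodal demo data, where an <image> token marks where the image is injected; field names are taken from that demo file and may evolve, and the image path is a hypothetical placeholder.

```python
# Sketch: one multimodal training record in the ShareGPT-style layout used by
# LLaMA Factory's multimodal demo dataset. The <image> token marks where the
# image is injected; field names may differ across versions.
import json

record = {
    "messages": [
        {"role": "user", "content": "<image>What is shown on this scanned form?"},
        {"role": "assistant", "content": "An account-opening form with name and date fields."},
    ],
    "images": ["data/forms/sample_form_001.png"],  # hypothetical local path
}

with open("mm_dataset.json", "w") as f:
    json.dump([record], f, indent=2)  # a dataset file is a JSON list of records
```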
Project at a glance
Active · Last synced 4 days ago