

RAG engine with deep document understanding and agents
RAGFlow is an open-source RAG engine combining retrieval-augmented generation with agent capabilities, offering deep document understanding, template-based chunking, and automated workflows for production AI systems.

RAGFlow is a Retrieval-Augmented Generation (RAG) engine that fuses advanced RAG techniques with agent capabilities to create a superior context layer for large language models. It offers a streamlined RAG workflow adaptable to enterprises of any scale, powered by a converged context engine and pre-built agent templates.
The platform excels at deep document understanding, extracting knowledge from unstructured data with complicated formats including Word, slides, Excel, images, scanned copies, structured data, and web pages. Its template-based chunking approach provides intelligent, explainable text segmentation with multiple template options. RAGFlow minimizes hallucinations through grounded citations, visualization of text chunking for human intervention, and traceable references.
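As a rough illustration of how template-based chunking can stay explainable and traceable, the sketch below splits a document with a named template and records each chunk's character offsets, so a citation can point back to the exact source span. The template names, the `Chunk` type, and the regex-based splitting are assumptions made for this example; they are not RAGFlow's actual templates or API.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    start: int      # character offset where the chunk begins in the source
    end: int        # character offset just past the chunk's last character
    template: str   # name of the template that produced the chunk


# Hypothetical templates: each maps a name to a separator pattern.
TEMPLATES = {
    "paragraph": r"\n\s*\n",  # split on blank lines
    "line": r"\n",            # split on every newline
}


def chunk(document, template="paragraph"):
    """Split `document` on the template's separator, keeping source offsets."""
    chunks = []
    pos = 0
    for sep in re.finditer(TEMPLATES[template], document):
        if sep.start() > pos:  # skip empty pieces between adjacent separators
            chunks.append(Chunk(document[pos:sep.start()], pos, sep.start(), template))
        pos = sep.end()
    if pos < len(document):    # trailing text after the last separator
        chunks.append(Chunk(document[pos:], pos, len(document), template))
    return chunks
```

Because every chunk carries `start` and `end`, a reviewer can check `document[c.start:c.end]` against the cited text, which is the kind of traceability the paragraph above describes.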
RAGFlow requires Docker and runs on systems with at least 4 CPU cores, 16 GB RAM, and 50 GB disk space. The platform supports both CPU and GPU acceleration for embedding and document processing tasks. It offers configurable LLMs and embedding models, multiple recall paired with fused re-ranking, and intuitive APIs for seamless business integration. Recent updates include support for GPT-5, agentic workflows, MCP, cross-language queries, and multi-modal image understanding.
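To make "multiple recall paired with fused re-ranking" concrete, here is a minimal sketch of one common fusion method, Reciprocal Rank Fusion (RRF), which merges ranked lists from several retrievers into one ranking. The source does not say which fusion method RAGFlow uses, so RRF, the function name, and the constant `k=60` are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank starts
    at 1); scores are summed across lists. Ties are broken by document ID.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two recall paths over the same corpus.
keyword_hits = ["a", "b", "c"]
vector_hits = ["b", "c", "d"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['b', 'c', 'a', 'd']
```

Documents found by both recall paths ("b" and "c") rise above those found by only one, which is the usual motivation for fusing before a final re-ranking step.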
Enterprise Document Intelligence
Extract structured knowledge from mixed-format corporate documents, scanned files, and presentations with grounded citations for compliance and audit trails
Multi-Modal Research Assistant
Process PDFs with embedded images using multi-modal models, and combine the results with internet search for deep research without a limit on document tokens
Text-to-SQL Analytics
Transform natural language queries into SQL statements through RAG, enabling business users to query structured databases without technical expertise
Cross-Language Knowledge Retrieval
Query documents in multiple languages with automatic translation and context preservation, supporting global teams and multilingual content repositories
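The Text-to-SQL use case above can be pictured as retrieval over schema descriptions followed by prompt assembly. The sketch below is only an illustration under stated assumptions: the table definitions, the token-overlap scoring, and the prompt wording are all hypothetical, and the call to a language model is left out entirely. It does not reflect RAGFlow's internal implementation.

```python
import re

# Hypothetical schema notes, one per table; in a RAG setup these would be
# stored in a knowledge base and retrieved per question.
SCHEMA_DOCS = {
    "orders": "CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL, created_at TEXT)",
    "customers": "CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)",
    "products": "CREATE TABLE products (id INTEGER, title TEXT, price REAL)",
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_schema(question, top_k=2):
    """Return the top_k table names whose schema shares the most tokens with the question."""
    q_tokens = tokenize(question)
    ranked = sorted(
        SCHEMA_DOCS,
        key=lambda table: (-len(q_tokens & tokenize(SCHEMA_DOCS[table])), table),
    )
    return ranked[:top_k]


def build_prompt(question):
    """Assemble an LLM prompt grounded in the retrieved table definitions."""
    context = "\n".join(SCHEMA_DOCS[table] for table in retrieve_schema(question))
    return f"Given these tables:\n{context}\n\nWrite one SQL query answering: {question}"
```

Grounding the prompt in retrieved schema text, rather than asking the model to guess table and column names, is what ties the generated SQL to the actual database.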
RAGFlow requires at least 4 CPU cores, 16 GB RAM, 50 GB disk space, Docker 24.0.0+, and Docker Compose v2.26.1+. GPU acceleration is optional for embedding and document processing tasks.
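The minimums above can be turned into a small preflight check. This is a sketch, not an official tool: the function names are made up for this example, and it assumes the version strings come from the output of `docker --version` and `docker compose version`.

```python
import re

# Minimums quoted in the requirements above.
MIN_CPU_CORES = 4
MIN_RAM_GB = 16
MIN_DISK_GB = 50
MIN_DOCKER = (24, 0, 0)
MIN_COMPOSE = (2, 26, 1)


def parse_version(text):
    """Extract the first X.Y.Z triple from a string such as 'Docker version 24.0.7, build afdd53b'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def check_requirements(cpu_cores, ram_gb, disk_gb, docker_version, compose_version):
    """Return a list of problems; an empty list means every minimum is met."""
    problems = []
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"need {MIN_CPU_CORES}+ CPU cores, found {cpu_cores}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"need {MIN_RAM_GB}+ GB RAM, found {ram_gb}")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"need {MIN_DISK_GB}+ GB disk, found {disk_gb}")
    if docker_version < MIN_DOCKER:
        problems.append(f"need Docker {MIN_DOCKER}+, found {docker_version}")
    if compose_version < MIN_COMPOSE:
        problems.append(f"need Docker Compose {MIN_COMPOSE}+, found {compose_version}")
    return problems
```

On a real host, `os.cpu_count()` and `shutil.disk_usage(path).free` from the standard library can supply the core count and free disk space. Note that the operating system often reports slightly less than the nominal RAM size, so a strict 16 GB comparison may need some tolerance.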
Full images (~9 GB) include pre-built embedding models for immediate use. Slim images (~2 GB) exclude embedding models, requiring separate configuration but offering faster downloads and smaller footprint.
Docker images are built for x86 platforms only. ARM64 users must build custom Docker images following the provided guide in the repository documentation.
RAGFlow provides visualization of text chunking for human intervention, displays key references with traceable citations, and uses grounded retrieval to ensure answers are backed by source documents.
RAGFlow supports Word (DOCX), PowerPoint, Excel, TXT, images, scanned copies, structured data, web pages, and PDFs. Images embedded in PDF and DOCX files are processed through multi-modal models.