CloudQuery

High-performance ELT framework powered by Apache Arrow

CloudQuery is a composable data movement framework that extracts from cloud infrastructure and SaaS APIs to any destination, running entirely on your infrastructure.

Overview

What is CloudQuery?

CloudQuery is a high-performance data movement framework designed for developers who need complete control over their data pipelines. Built on Apache Arrow, it extracts data from cloud infrastructure, SaaS platforms, and APIs and delivers it to any destination, all while running entirely on your infrastructure.

Who Uses CloudQuery?

Engineering and security teams leverage CloudQuery for cloud security posture management (CSPM), asset inventory, FinOps, and attack surface management. Data engineers use it as a flexible ELT platform to eliminate data silos across security, infrastructure, marketing, and finance teams.

Key Capabilities

The framework offers a code-first, extensible plugin system with no vendor lock-in. Its composable architecture integrates with your existing languages, destinations, and orchestrators. Specialized plugins provide first-class support for complex data sources including AWS, GCP, Azure, and hundreds of other integrations. Because your data never touches external servers, CloudQuery fits regulated, secure, and performance-critical environments where privacy is paramount.

Built in Go and distributed under MPL-2.0, CloudQuery combines the flexibility of open-source tooling with enterprise-grade performance for large-scale data movement.
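
The composable source-and-destination idea described above can be pictured with a small sketch. This is not CloudQuery's SDK or plugin API: the names `fake_cloud_source`, `make_memory_destination`, and `sync` are hypothetical, and the example only illustrates how independent sources and destinations can be combined without either knowing about the other.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Source = Callable[[], Iterable[Row]]          # produces rows
Destination = Callable[[Iterable[Row]], int]  # consumes rows, returns rows written


def fake_cloud_source() -> Iterable[Row]:
    """Hypothetical source: yields a few hard-coded resource records."""
    yield {"id": "vm-1", "provider": "aws", "public": True}
    yield {"id": "vm-2", "provider": "gcp", "public": False}


def make_memory_destination(store: List[Row]) -> Destination:
    """Hypothetical destination: appends rows to an in-memory list."""
    def write(rows: Iterable[Row]) -> int:
        before = len(store)
        store.extend(rows)
        return len(store) - before
    return write


def sync(source: Source, destinations: List[Destination]) -> int:
    """Fan rows from one source out to every destination; return total rows written."""
    rows = list(source())  # materialize once so each destination sees the same rows
    return sum(dest(rows) for dest in destinations)


warehouse: List[Row] = []
audit_log: List[Row] = []
written = sync(
    fake_cloud_source,
    [make_memory_destination(warehouse), make_memory_destination(audit_log)],
)
print(written)
```

Swapping in a different source or adding a destination changes only the arguments to `sync`, which is the property a composable design aims for.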

Highlights

Apache Arrow-powered engine for high-performance data movement at scale
Runs entirely on your infrastructure with zero data egress to external servers
Extensible plugin system with hundreds of source and destination integrations
Code-first architecture with multi-language SDK support and no vendor lock-in

Pros

  • Complete data privacy with a self-hosted execution model
  • Specialized plugins for cloud infrastructure, security, and FinOps data
  • Composable design integrates with existing tools and orchestrators
  • High performance for large-scale data movement using Apache Arrow

Considerations

  • Requires managing your own infrastructure and orchestration
  • Code-first approach may have a steeper learning curve than GUI-based tools
  • Self-hosted model means you handle scaling and maintenance
  • Plugin ecosystem maturity varies across different integrations

Managed products teams compare with

When teams consider CloudQuery, these hosted platforms usually appear on the same shortlist.

Airbyte

Open-source data integration engine for ELT pipelines across data sources

Azure Data Factory

Cloud-based data integration service to create, schedule, and orchestrate ETL/ELT data pipelines at scale

Fivetran

Managed ELT data pipelines into warehouses

Fit guide

Great for

  • Security teams needing CSPM or attack surface management across multi-cloud
  • Data engineers building custom ELT pipelines with strict privacy requirements
  • FinOps teams consolidating billing data from multiple cloud providers
  • Organizations requiring on-premises data movement for compliance

Not ideal when

  • You want a fully managed SaaS solution without infrastructure overhead
  • Your users are non-technical and prefer point-and-click configuration
  • The project is small enough that lightweight scripts suffice
  • Your organization cannot self-host and maintain data infrastructure

How teams use it

Cloud Security Posture Management

Monitor and enforce security policies across AWS, GCP, and Azure infrastructure with continuous compliance scanning and unified visibility.

Multi-Cloud Asset Inventory

Collect and centralize cloud configuration data from all major providers into a single queryable database for governance and auditing.
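
As a rough illustration of what a single queryable inventory makes possible, the sketch below loads a few made-up asset records into an in-memory SQLite database and counts publicly exposed resources per provider. The `cloud_assets` table, its columns, and the sample rows are assumptions for this example only; they are not CloudQuery's actual schema, and SQLite merely stands in for whichever destination you sync to.

```python
import sqlite3

# In-memory database standing in for the sync destination (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cloud_assets ("
    "provider TEXT, resource_type TEXT, resource_id TEXT, is_public INTEGER)"
)

# Hypothetical rows, as if collected from three providers.
conn.executemany(
    "INSERT INTO cloud_assets VALUES (?, ?, ?, ?)",
    [
        ("aws", "bucket", "logs-prod", 0),
        ("aws", "bucket", "site-assets", 1),
        ("gcp", "bucket", "ml-datasets", 1),
        ("azure", "vm", "build-agent-01", 0),
    ],
)

# One query covers every provider because the data lives in one place.
public_by_provider = conn.execute(
    "SELECT provider, COUNT(*) FROM cloud_assets "
    "WHERE is_public = 1 GROUP BY provider ORDER BY provider"
).fetchall()
print(public_by_provider)
```

The same pattern extends to audit questions such as listing resources that lack a required tag, provided the relevant fields are present in the synced tables.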

Cloud FinOps Optimization

Unify billing data across cloud providers to identify cost-saving opportunities and track spending trends in real time.

AI Model Data Pipelines

Feed LLM pipelines and AI applications with high-volume data from diverse sources using Apache Arrow's efficient columnar format.

Tech snapshot

Go: 91%
Python: 3%
Makefile: 2%
TypeScript: 2%
Java: 1%
Smarty: 1%

Tags

kubernetes, aws, go, data-collection, bigquery, etl, data-analysis, google, cspm, sql, elt, attack-surface-management, gcp, data-engineering, etl-framework, airbyte, azure, data, github-api, data-integration

Frequently asked questions

Does CloudQuery store or process my data on external servers?

No. CloudQuery runs entirely on your infrastructure. Your data never touches CloudQuery's servers, which helps you meet privacy and data residency requirements.

What data sources and destinations does CloudQuery support?

CloudQuery supports hundreds of integrations including AWS, GCP, Azure, Kubernetes, GitHub, and many SaaS platforms. Destinations include PostgreSQL, BigQuery, Snowflake, S3, and more. Check the integrations hub for the full list.

How does CloudQuery compare to Airbyte or Fivetran?

CloudQuery is code-first and optimized for cloud infrastructure and security data, running on your infrastructure. It excels at CSPM, asset inventory, and FinOps use cases with specialized plugins, while Airbyte and Fivetran focus more on SaaS-to-warehouse replication.

Can I build custom plugins for proprietary data sources?

Yes. CloudQuery provides an open plugin SDK supporting multiple languages. You can develop, extend, and ship custom plugins without vendor approval or lock-in.

What license does CloudQuery use?

The CloudQuery framework, CLI, SDK, and some integrations are licensed under MPL-2.0. The license allows commercial use, but modifications to MPL-licensed files must be released under the same license.

Project at a glance

Status: Active
Stars: 6,308
Watchers: 6,308
Forks: 547
License: MPL-2.0
Repo age: 5 years
Last commit: 2 days ago
Self-hosting: Supported
Primary language: Go