pgvector

Vector similarity search integrated directly into PostgreSQL

Add vector columns to PostgreSQL tables and run exact or approximate nearest‑neighbor queries with L2, cosine, inner product, Hamming and more, using familiar SQL.

Overview

pgvector extends PostgreSQL with a native vector data type, letting you store embeddings directly alongside relational data. You can perform exact nearest‑neighbor searches or create HNSW/IVFFlat indexes for fast approximate queries, choosing from L2, inner product, cosine, L1, Hamming, and Jaccard distances—all through standard SQL.
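To make the distance options concrete, here is a minimal pure-Python sketch of what each metric computes. It is a reference illustration only, not pgvector's implementation. The function names are hypothetical, and binary vectors are represented as strings of '0' and '1' purely for readability. The operators named in the comments (`<->`, `<#>`, `<=>`, `<+>`, `<~>`, `<%>`) are the pgvector SQL operators for the corresponding distances.

```python
import math


def l2_distance(a, b):
    """Euclidean distance (pgvector operator: <->)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """Inner product, negated so that smaller means more similar (operator: <#>)."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 minus cosine similarity (operator: <=>). Assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l1_distance(a, b):
    """Sum of absolute differences (operator: <+>)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming_distance(a, b):
    """Number of positions where two bit strings differ (operator: <~>)."""
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a, b):
    """1 minus |intersection| / |union| of the set bits (operator: <%>).

    Assumes at least one bit is set in either input.
    """
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    return 1.0 - both / either
```

In SQL, each of these is used in an `ORDER BY` clause so that the closest rows come first, which is why the inner product is negated.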

Who it's for & deployment

The extension suits developers and data teams already using PostgreSQL who need vector search without managing a separate service. It runs on Linux, macOS, and Windows, can be installed through Docker, Homebrew, APT, Yum, or conda‑forge, and is enabled per database with `CREATE EXTENSION vector`. Once installed, you create a vector column, insert embeddings, and query with operators such as `<->` for L2 distance or `<=>` for cosine distance. Indexes are built with the familiar `CREATE INDEX` syntax, and you can tune the HNSW parameters (`m`, `ef_construction`, `hnsw.ef_search`) to balance recall and latency.
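The workflow described above (enable the extension, create a vector column, insert embeddings, build an HNSW index, query by distance) might look roughly like the following sketch. It assumes a DB-API connection that uses `%s` placeholders, as psycopg does. The table name `items`, the 3-dimensional column, and the helper names `vector_literal` and `load_and_search` are hypothetical choices for illustration. The values `m = 16` and `ef_construction = 64` are example settings, and `ef_search=100` is an arbitrary example.

```python
def vector_literal(values):
    """Format a Python sequence in pgvector's text form, e.g. [1,2,3]."""
    return "[" + ",".join(str(v) for v in values) + "]"


# Hypothetical schema: a table named "items" with a 3-dimensional vector column.
CREATE_STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    "CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))",
]

# HNSW index for L2 distance; m and ef_construction are build-time options.
INDEX_STATEMENT = (
    "CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) "
    "WITH (m = 16, ef_construction = 64)"
)


def load_and_search(conn, embeddings, query, k=5, ef_search=100):
    """Create the table, insert embeddings, index them, and return the k nearest ids."""
    cur = conn.cursor()
    for statement in CREATE_STATEMENTS:
        cur.execute(statement)
    for emb in embeddings:
        cur.execute(
            "INSERT INTO items (embedding) VALUES (%s::vector)",
            (vector_literal(emb),),
        )
    cur.execute(INDEX_STATEMENT)
    # SET does not take bind parameters, so the value is coerced to int first.
    cur.execute(f"SET hnsw.ef_search = {int(ef_search)}")
    # <-> is the L2 distance operator; ordering by it returns the closest rows first.
    cur.execute(
        "SELECT id FROM items ORDER BY embedding <-> %s::vector LIMIT %s",
        (vector_literal(query), k),
    )
    return [row[0] for row in cur.fetchall()]
```

The index is created after the rows are inserted because building an index over already loaded data is generally faster than maintaining it during the initial load. Without the index, the same `SELECT` performs an exact search.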

Highlights

Exact and approximate nearest‑neighbor search with configurable indexes (HNSW, IVFFlat).
Supports multiple vector types (single‑precision `vector`, half‑precision `halfvec`, binary `bit`, and sparse `sparsevec`) with up to thousands of dimensions.
Multiple distance metrics: L2, inner product, cosine, L1, Hamming, Jaccard.
Leverages PostgreSQL ACID guarantees, joins, and point‑in‑time recovery.

Pros

  • Native SQL interface eliminates the need for a separate vector service.
  • Benefits from PostgreSQL reliability, tooling, and transaction semantics.
  • Flexible indexing options allow tuning of speed versus recall.
  • Works with any language that has a PostgreSQL client.

Considerations

  • Approximate indexes increase memory consumption.
  • Indexable dimensions are capped per vector type (e.g., 2,000 for single‑precision `vector` columns).
  • Tuning HNSW parameters may require expertise.
  • Not a dedicated vector database; very large workloads may hit PostgreSQL limits.

Managed products teams compare with

When teams consider pgvector, these hosted platforms usually appear on the same shortlist.

Pinecone

Managed vector database for AI applications

Qdrant

Open-source vector database

Zilliz

Managed vector database service for AI applications

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Developers already using PostgreSQL who need integrated vector search.
  • Applications requiring ACID transactions with vector data.
  • Projects that want to keep embeddings and relational data in a single database.
  • Teams preferring SQL over specialized vector‑search APIs.

Not ideal when

  • Vector throughput requirements exceed what PostgreSQL can scale to.
  • You need billions of vectors with sub‑millisecond latency.
  • The environment requires GPU‑accelerated indexing.
  • A dedicated vector store would provide better compression for your data.

How teams use it

Semantic product search

Store product embeddings alongside catalog data and retrieve similar items with a single SQL query.

Recommendation engine for media

Join user interaction tables with embedding vectors to compute nearest neighbors for personalized suggestions.

Anomaly detection on time‑series embeddings

Insert vectorized features and query for outliers using distance thresholds directly in PostgreSQL.

Geospatial‑like similarity for text documents

Use cosine distance on sentence embeddings to find related documents without external services.

Tech snapshot

C 77%
Perl 22%
Makefile 1%
Dockerfile 1%

Tags

approximate-nearest-neighbor-search
nearest-neighbor-search

Frequently asked questions

Do I need a separate service to use pgvector?

No, it is a PostgreSQL extension; queries run via standard client libraries.

Which index types are available?

HNSW, which offers the better speed‑recall trade‑off at the cost of slower builds and higher memory use, and IVFFlat, which builds faster and uses less memory but has lower query performance.

Can I store binary or sparse vectors?

Yes. pgvector supports the `bit` (binary) type, indexable up to 64,000 dimensions, and the `sparsevec` (sparse) type, indexable up to 1,000 non‑zero elements.

How do I control recall vs. speed?

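Adjust the HNSW build parameters `m` and `ef_construction`, which are set when the index is created, and the query‑time setting `hnsw.ef_search`. Higher values generally improve recall at the cost of build time or query latency.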
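One way to judge whether a given setting is accurate enough is to measure recall: run the same query once as an exact search and once through the approximate index, then compare the returned ids. The sketch below assumes you have already collected both id lists. The function name `recall_at_k` is hypothetical and not part of pgvector.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k ids that also appear in the approximate top-k."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    hits = sum(1 for item_id in approx_ids[:k] if item_id in exact_top)
    return hits / len(exact_top)
```

For example, if the index returns three of the four true nearest neighbors, `recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], 4)` evaluates to 0.75. Raising `hnsw.ef_search` and re-measuring shows how much recall is gained for the added latency.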

Is pgvector compatible with cloud PostgreSQL providers?

Many hosted providers pre‑install the extension, so you only need to enable it. On self‑managed PostgreSQL you can install it via Docker, Homebrew, APT, Yum, or conda‑forge.

Project at a glance

Active
Stars
19,375
Watchers
19,375
Forks
1,034
Repo age
4 years old
Last commit
6 days ago
Primary language
C

Last synced 11 hours ago