Pinecone vs Qdrant vs pgvector — which should I use for RAG?

Use pgvector if you already run Postgres and your corpus is in the thousands-to-low-millions of vectors; it keeps vectors beside your relational data with no new infrastructure. Use Qdrant if you want an open-source, portable engine (~32.8k GitHub stars) with strong native sparse-dense hybrid search, self-hosted or managed. Use Pinecone if you want a fully managed service with near-zero ops and can accept vendor lock-in and a usage-based bill (around $16–$18 per million read units). For most builds the store is not what decides quality — chunking, hybrid search, and reranking are.

Is pgvector good enough for production RAG?

Yes, for a large share of real workloads. pgvector handles similarity search well up to a few million vectors with HNSW indexing, and version 0.8.0 added iterative index scans that fixed its old filtering weakness. The main constraint is the 2,000-dimension index cap on standard vectors — fine for 1,536-dim models like text-embedding-3-small, and solvable for larger ones via the dimensions parameter or halfvec. Its limits show at very high vector counts and throughput, and native hybrid search isn't built in. If you already run Postgres, it's usually the right place to start.

Do I need a dedicated vector database, or can I use Postgres?

You often don't. If you run Postgres, pgvector adds vector search with no new system — fewer things to deploy, secure, back up, and monitor. A dedicated database like Qdrant or Pinecone earns its place when you need tens of millions of vectors, very high query throughput, native sparse-dense hybrid search out of the box, or fully managed scaling. Choose on data volume, hybrid-search needs, and whether you prefer self-hosted or managed — not on benchmark recall numbers you won't feel in production.

Which vector database has the best hybrid search?

Qdrant has the most complete built-in hybrid search of the three: it supports dense, sparse, and multi-vector representations in one query and merges them with configurable fusion, either Reciprocal Rank Fusion (RRF) or Distribution-Based Score Fusion (DBSF). Pinecone offers native sparse-dense hybrid in a managed form. pgvector can do hybrid by combining its vector index with Postgres full-text search, but you wire the fusion yourself. For a hybrid-first build I reach for Qdrant.

Pinecone vs Qdrant vs pgvector: Best Vector DB for RAG (2026)

There is no universally best vector database, only the one that fits your stack and scale. pgvector, Qdrant, and Pinecone all retrieve nearest neighbours well; they differ in where they run, how far they scale, how much hybrid search and filtering they provide out of the box, and what they cost to operate. I have shipped RAG on all three, and the choice usually comes down to three questions: are you already on Postgres, do you want to self-host, and how much ops time do you have?

How do Pinecone, Qdrant, and pgvector compare at a glance?

Here is how the three line up on the factors that actually decide a production RAG build.

Factor	pgvector	Qdrant	Pinecone
Hosting model	Postgres extension (self-host or any managed PG: Supabase, RDS, Neon)	Open-source; self-host (Docker/K8s) or Qdrant Cloud	Fully managed SaaS only
GitHub stars (2026)	~22k (repo)	~32.8k (repo)	Closed-source (managed)
Scale	Comfortable to a few million vectors; tuning needed beyond	Tens of millions+ with sharding, quantization, on-disk	Designed for very large, high-throughput indexes
Native hybrid search	Dense + Postgres full-text; you wire fusion yourself	Native dense + sparse with built-in RRF/DBSF fusion	Native sparse-dense hybrid, managed
Indexing	HNSW + IVFFlat	HNSW (+ quantization, on-disk)	Proprietary, managed
Cost model	Free if you already run PG; pay for the DB	Free self-hosted; usage-based on Qdrant Cloud	Usage-based managed; recurring line item
Data residency / lock-in	Full control; lives in your DB	Full control; portable across self-host and cloud	Managed-only; migration is real work

If you already run Postgres, start with pgvector — one fewer system to operate beats marginal recall gains until you actually hit a scale or hybrid-search wall. Reach for Qdrant or Pinecone when you do.

When should you use pgvector?

pgvector is the right default when Postgres is already in your stack. It stores embeddings as a column type and adds vector indexes (HNSW or IVFFlat) plus distance operators, so similarity search lives next to the rows it relates to. You filter on real columns, join against your application tables, and keep one backup, one connection pool, one thing to monitor. The project sits around 22k GitHub stars and ships steadily (pgvector repo).

The trade-offs are real but bounded. The big one is the index dimension cap: pgvector indexes standard vector columns up to 2,000 dimensions (pgvector repo). That matters because OpenAI's text-embedding-3-large outputs 3,072 dimensions by default (OpenAI docs), which is over the cap. There are two workable escape hatches: shorten the embedding via the API's dimensions parameter (Matryoshka-style truncation), or store it as halfvec (half-precision), which indexes up to 4,000 dimensions (pgvector repo). text-embedding-3-small at 1,536 dimensions (OpenAI docs) sits comfortably under the cap with no tricks.
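The two escape hatches can be sketched in a few lines. This is a minimal illustration, assuming the embedding comes from a model that tolerates shortening (as OpenAI documents for the text-embedding-3 family); `truncate_embedding` and `pick_column_type` are hypothetical helpers, not part of pgvector or any SDK. In practice you would usually pass `dimensions` to the embeddings API and let the service shorten the vector for you.

```python
import math

# Index limits as stated in the article (from the pgvector repo).
VECTOR_INDEX_MAX_DIMS = 2000
HALFVEC_INDEX_MAX_DIMS = 4000


def truncate_embedding(embedding: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale the result to unit length."""
    if dims <= 0 or dims > len(embedding):
        raise ValueError("dims must be between 1 and len(embedding)")
    head = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero vector: nothing to rescale
    return [x / norm for x in head]


def pick_column_type(dims: int) -> str:
    """Return a pgvector column type whose index limit fits `dims`."""
    if dims <= VECTOR_INDEX_MAX_DIMS:
        return f"vector({dims})"
    if dims <= HALFVEC_INDEX_MAX_DIMS:
        return f"halfvec({dims})"
    raise ValueError("too many dimensions to index; truncate first")


# Stand-in values, not a real embedding.
fake_large = [0.01 * (i % 7 + 1) for i in range(3072)]
print(pick_column_type(len(fake_large)))          # halfvec(3072)
short = truncate_embedding(fake_large, 1536)
print(len(short), pick_column_type(len(short)))   # 1536 vector(1536)
```

Rescaling after truncation matters if you use inner-product or cosine distance on vectors you assume are unit length; a shortened vector is no longer normalized until you rescale it.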

Filtering used to be pgvector's weak spot; version 0.8.0 (October 2024) added iterative index scans and better filter cost estimation specifically to fix overfiltering, where a WHERE clause discards most of the index's candidates and the query returns too few rows (PostgreSQL.org). HNSW gives a better speed-recall tradeoff than IVFFlat at the cost of slower builds and more memory (pgvector repo).

Use pgvector when:

You already run Postgres (Supabase, RDS, Neon, self-hosted) and want vectors beside your data.
Your corpus is in the thousands-to-low-millions of vectors.
You value operational simplicity and metadata-rich filtering over peak vector throughput.
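As a rough sketch of the setup this section describes, the statements below create an HNSW-indexed table, turn on iterative scans, and run a filtered nearest-neighbour query. The table and column names (`chunks`, `tenant_id`, `body`, `embedding`) and the helper functions are hypothetical; the `%s` placeholders assume a psycopg-style DB-API driver, and `hnsw.iterative_scan` requires pgvector 0.8.0 or later.

```python
# SQL kept as plain strings so the sketch has no driver dependency.
SETUP_SQL = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    """CREATE TABLE IF NOT EXISTS chunks (
           id bigserial PRIMARY KEY,
           tenant_id integer NOT NULL,
           body text NOT NULL,
           embedding vector(1536)
       )""",
    # HNSW index using cosine distance.
    "CREATE INDEX IF NOT EXISTS chunks_embedding_idx "
    "ON chunks USING hnsw (embedding vector_cosine_ops)",
]

# Keep scanning the index until enough rows pass the filter.
SESSION_SQL = "SET hnsw.iterative_scan = strict_order"

# <=> is pgvector's cosine-distance operator; the filter is an ordinary WHERE clause.
SEARCH_SQL = """
    SELECT id, body
    FROM chunks
    WHERE tenant_id = %s
    ORDER BY embedding <=> %s::vector
    LIMIT %s
"""


def to_vector_literal(embedding):
    """Format a sequence of numbers as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def search(cursor, tenant_id, query_embedding, k=5):
    """Run a filtered nearest-neighbour query on a DB-API cursor."""
    cursor.execute(SESSION_SQL)
    cursor.execute(SEARCH_SQL, (tenant_id, to_vector_literal(query_embedding), k))
    return cursor.fetchall()
```

Because the filter is a normal SQL predicate, the same query can join against application tables or add further conditions without any change to the vector side.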

When should you use Qdrant?

Qdrant is my pick when I want a dedicated, open-source engine that does hybrid search and filtering well out of the box. It is the most-starred of the three at roughly 32.8k GitHub stars (Qdrant repo). It supports dense, sparse, and multi-vector (e.g. ColBERT-style) representations and merges them with configurable fusion — Reciprocal Rank Fusion or Distribution-Based Score Fusion (Qdrant hybrid queries docs). For memory-constrained scale, built-in quantization cuts RAM usage by up to 97% (Qdrant repo). Because it is open source, you run it in Docker locally, self-host on your cluster, or use Qdrant Cloud — and move between them without rewriting your retrieval layer.
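Reciprocal Rank Fusion itself is simple enough to sketch in a few lines, and it is roughly what you would write by hand when wiring hybrid search on pgvector. This is a generic illustration of the technique, not Qdrant's implementation; the function name, the sample document ids, and the constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["doc_a", "doc_b", "doc_c"]    # e.g. from the embedding index
sparse_hits = ["doc_c", "doc_a", "doc_d"]   # e.g. from BM25 / full-text search
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print([doc_id for doc_id, _ in fused])      # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

RRF uses only ranks, so it needs no score normalization between the dense and sparse retrievers; that is the main reason it is a common default for hybrid search.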

On the Indian legal AI platform I built, Qdrant is the vector store behind a Hybrid RAG layer: voyage-law-2 embeddings plus BM25 sparse vectors and an LLM reranker. voyage-law-2 is a 1,024-dim model tuned for legal text, trained on a trillion legal tokens, that beats text-embedding-3-large by ~6% on average across legal retrieval benchmarks (Voyage AI). Native sparse-dense fusion made it straightforward to combine semantic recall with exact-term matching, which matters when a wrong or missing authority is a liability, not a cosmetic bug.

Use Qdrant when:

You want native hybrid (dense + sparse) search and rich metadata filtering without bolting it together.
You want portability — self-host or managed, no SaaS lock-in, full data residency control.
You expect to grow into tens of millions of vectors and want quantization and on-disk options ready.
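A back-of-the-envelope calculation shows where a figure like 97% can come from. This sketch assumes the saving refers to binary quantization (1 bit per component instead of a 32-bit float) and counts only the raw vector payload, ignoring the HNSW graph, metadata, and any full-precision copies kept for rescoring. The corpus size and the helper name are illustrative.

```python
def vector_ram_bytes(num_vectors: int, dims: int, bits_per_component: int) -> int:
    """Raw memory for the vector payload only (no index graph, no metadata)."""
    return num_vectors * dims * bits_per_component // 8


n, dims = 10_000_000, 1024                   # hypothetical corpus of 1,024-dim vectors
float32 = vector_ram_bytes(n, dims, 32)      # unquantized
int8 = vector_ram_bytes(n, dims, 8)          # scalar quantization
binary = vector_ram_bytes(n, dims, 1)        # binary quantization

for name, size in [("float32", float32), ("int8", int8), ("binary", binary)]:
    saving = 100 * (1 - size / float32)
    print(f"{name:8s} {size / 1e9:6.2f} GB  saving {saving:.1f}%")
```

Under these assumptions the float32 payload is about 41 GB, int8 about 10 GB (75% smaller), and binary about 1.3 GB (roughly 96.9% smaller). Real savings depend on which quantization mode you choose and how much recall you are willing to trade.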

When should you use Pinecone?

Pinecone is the right call when you want a fully managed service and the least possible ops. There is no cluster to run, scaling and replication are handled, and it offers native sparse-dense hybrid search and metadata filtering. Its serverless model bills on usage rather than provisioned nodes: roughly $16–$18 per million read units, $4–$4.50 per million write units, and $0.33/GB/month storage, with a Starter free tier of 2 GB storage, 2M write units, and 1M read units, and a Standard plan minimum of $50/month (Pinecone pricing). For teams without infrastructure people — or who simply don't want to spend time on vector-DB ops — that convenience is the whole value proposition.
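To see how those rates add up, here is a small estimator built only from the numbers quoted above. It is a sketch under stated assumptions: the function name is hypothetical, the free-tier allowance is ignored, the $50 minimum is treated as a simple floor on the bill, and the number of read units a query consumes varies by workload, so the inputs are yours to measure.

```python
# Rates quoted in the article (Pinecone pricing); low and high ends of each range.
READ_USD_PER_MILLION = (16.0, 18.0)
WRITE_USD_PER_MILLION = (4.0, 4.5)
STORAGE_USD_PER_GB_MONTH = 0.33
STANDARD_MINIMUM_USD = 50.0


def estimate_monthly_bill(read_units, write_units, storage_gb):
    """Return (low, high) USD estimates, floored at the Standard plan minimum."""
    estimates = []
    for read_rate, write_rate in zip(READ_USD_PER_MILLION, WRITE_USD_PER_MILLION):
        usage = (
            read_units / 1e6 * read_rate
            + write_units / 1e6 * write_rate
            + storage_gb * STORAGE_USD_PER_GB_MONTH
        )
        estimates.append(round(max(usage, STANDARD_MINIMUM_USD), 2))
    return tuple(estimates)


# 5M read units, 2M write units, 20 GB stored in a month.
print(estimate_monthly_bill(5_000_000, 2_000_000, 20))   # (94.6, 105.6)
# A small workload lands on the plan minimum.
print(estimate_monthly_bill(500_000, 100_000, 1))        # (50.0, 50.0)
```

The point of running the numbers is that reads dominate: in the first example they account for $80 to $90 of the total, so query volume, not storage, is what to watch as traffic grows.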

The costs are vendor lock-in and a recurring bill that scales with query volume. You cannot self-host, and migrating off later is real work. If your data has residency or sovereignty constraints, or you need the store inside your own VPC, a managed-only SaaS may be off the table entirely. When those are non-issues and ops time is scarce, Pinecone is a clean, fast way to ship.

Which is cheapest and which scales furthest?

Cost and scale pull in different directions, so compare the three side by side on each factor.

Factor	pgvector	Qdrant	Pinecone
Cost at small scale	Lowest if Postgres is already running	Infrastructure cost only when self-hosted	Free tier (2 GB), then $50/mo minimum (pricing)
Index dimension limit	2,000 (`vector`); 4,000 (`halfvec`) (repo)	High; handles 1,536–3,072-dim models natively	Handles large dims; managed
Practical scale ceiling	A few million vectors before serious tuning	Tens of millions+ with sharding/quantization	Very large, high-throughput indexes
Ops burden	Low if PG exists; you own tuning	Medium self-host; low on Qdrant Cloud	Lowest — fully managed
Lock-in risk	None	None (open source)	High (managed-only)

The honest summary: pgvector is cheapest and simplest if you're already on Postgres and under a few million vectors; Qdrant scales furthest while staying portable and open; Pinecone removes ops entirely at the price of a recurring bill and lock-in.

Does the vector DB actually decide RAG quality?

Mostly no — and this is the part teams get wrong. All three engines retrieve nearest neighbours well. What separates a RAG system that holds up from one that hallucinates is everything around the store: how you chunk and add metadata, whether you run hybrid (keyword + semantic) search, whether you rerank candidates before they reach the model, and whether you verify citations and run evals. Swapping Pinecone for Qdrant will not fix bad chunking. Adding a reranker often will. Pick the store that fits your stack and ops appetite, then put your real effort into retrieval quality.
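One way to make the evals part concrete is to track retrieval recall on a small labelled set before and after each change (new chunking, hybrid search, a reranker). The sketch below is a minimal, store-agnostic illustration; the function names, chunk ids, and the tiny eval set are made up for the example.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear in the top-k retrieved ids."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("relevant set must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs, k):
    """Average recall@k over (retrieved, relevant) pairs, one per eval query."""
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)


# Each entry: (ids returned by the retriever in rank order, ids a human marked relevant).
eval_runs = [
    (["c1", "c7", "c3"], {"c1", "c3"}),
    (["c9", "c2", "c4"], {"c4", "c5"}),
]
print(mean_recall_at_k(eval_runs, k=3))   # 0.75
print(mean_recall_at_k(eval_runs, k=2))   # 0.25
```

Running the same eval set against two configurations on the same store usually tells you more than running one configuration against two stores.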