There is no universally best vector database — there is the one that fits your stack and scale. pgvector, Qdrant, and Pinecone all retrieve nearest neighbours well; they differ in where they run, how far they scale, how native their hybrid search and filtering are, and what they cost to operate. I have shipped RAG on all three, and the choice usually comes down to: are you already on Postgres, do you want to self-host, and how much ops time do you have?

How do Pinecone, Qdrant, and pgvector compare at a glance?

Here is how the three line up on the factors that actually decide a production RAG build.

| Factor | pgvector | Qdrant | Pinecone |
|---|---|---|---|
| Hosting model | Postgres extension (self-host or any managed PG: Supabase, RDS, Neon) | Open-source; self-host (Docker/K8s) or Qdrant Cloud | Fully managed SaaS only |
| GitHub stars (2026) | ~22k (repo) | ~32.8k (repo) | Closed-source (managed) |
| Scale | Comfortable to a few million vectors; tuning needed beyond | Tens of millions+ with sharding, quantization, on-disk | Designed for very large, high-throughput indexes |
| Native hybrid search | Dense + Postgres full-text; you wire fusion yourself | Native dense + sparse with built-in RRF/DBSF fusion | Native sparse-dense hybrid, managed |
| Indexing | HNSW + IVFFlat | HNSW (+ quantization, on-disk) | Proprietary, managed |
| Cost model | Free if you already run PG; pay for the DB | Free self-hosted; usage-based on Qdrant Cloud | Usage-based managed; recurring line item |
| Data residency / lock-in | Full control; lives in your DB | Full control; portable across self-host and cloud | Managed-only; migration is real work |

If you already run Postgres, start with pgvector — one fewer system to operate beats marginal recall gains until you actually hit a scale or hybrid-search wall. Reach for Qdrant or Pinecone when you do.

When should you use pgvector?

pgvector is the right default when Postgres is already in your stack. It stores embeddings as a column type and adds vector indexes (HNSW or IVFFlat) plus distance operators, so similarity search lives next to the rows it relates to. You filter on real columns, join against your application tables, and keep one backup, one connection pool, one thing to monitor. The project sits around 22k GitHub stars and ships steadily (pgvector repo).

The trade-offs are bounded. The main one is the index dimension cap: pgvector indexes standard vector columns up to 2,000 dimensions (pgvector repo). That matters because OpenAI's text-embedding-3-large outputs 3,072 dimensions by default (OpenAI docs), which is over the cap. There are two escape hatches: truncate the embedding via the dimensions parameter (Matryoshka), or use halfvec (half-precision), which indexes up to 4,000 dimensions (pgvector repo). text-embedding-3-small at 1,536 dimensions (OpenAI docs) sits comfortably under the cap with no tricks.
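The dimension arithmetic can be sketched in a few lines. This assumes you already hold a full-length embedding and want to shorten it client-side; requesting fewer dimensions through the API's dimensions parameter is the more direct route. The function names are hypothetical, and the two caps are the figures quoted in this section.

```python
import math

# Index caps quoted in this section (pgvector repo).
VECTOR_INDEX_MAX_DIMS = 2000    # standard `vector` columns
HALFVEC_INDEX_MAX_DIMS = 4000   # half-precision `halfvec` columns


def truncate_embedding(embedding, target_dims):
    """Keep the first target_dims components and rescale to unit length.

    Renormalizing matters: a truncated vector is no longer unit length,
    which skews dot-product comparisons.
    """
    if not 0 < target_dims <= len(embedding):
        raise ValueError("target_dims must be between 1 and len(embedding)")
    head = embedding[:target_dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def indexable_column_type(dims):
    """Pick the pgvector column type whose index cap fits `dims`."""
    if dims <= VECTOR_INDEX_MAX_DIMS:
        return f"vector({dims})"
    if dims <= HALFVEC_INDEX_MAX_DIMS:
        return f"halfvec({dims})"
    raise ValueError(f"{dims} dimensions exceeds both index caps; truncate first")


# 1,536 dims fits a plain vector column; 3,072 dims needs halfvec or truncation.
small_column = indexable_column_type(1536)
large_column = indexable_column_type(3072)
```

Truncation only preserves quality for models trained to support it, so treat the shortened vectors as something to evaluate against your own queries rather than a free win.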

Filtering used to be pgvector's weak spot. Version 0.8.0 (October 2024) added iterative index scans and better filter cost estimation specifically to fix overfiltering, where a filtered WHERE-clause query returns fewer rows than requested (PostgreSQL.org). HNSW gives a better speed-recall tradeoff than IVFFlat at the cost of slower builds and more memory (pgvector repo). Use pgvector when:

  • You already run Postgres (Supabase, RDS, Neon, self-hosted) and want vectors beside your data.
  • Your corpus is in the thousands-to-low-millions of vectors.
  • You value operational simplicity and metadata-rich filtering over peak vector throughput.
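For the Postgres-already-in-the-stack case, the setup described above might look roughly like this. The SQL is held in Python strings so it can be handed to whichever Postgres driver you use. The table name `chunks`, its columns, and the helper name are hypothetical; the `%s` placeholders assume a psycopg-style driver; and the iterative-scan setting assumes pgvector 0.8.0 or later.

```python
def pgvector_statements(table="chunks", dims=1536):
    """Return illustrative SQL for an HNSW-indexed, filterable embedding table."""
    return [
        # Enable the extension once per database.
        "CREATE EXTENSION IF NOT EXISTS vector;",
        # Embeddings live next to ordinary columns you can filter and join on.
        f"CREATE TABLE {table} ("
        "id bigserial PRIMARY KEY, "
        "tenant_id bigint NOT NULL, "
        "body text NOT NULL, "
        f"embedding vector({dims}));",
        # HNSW index using cosine distance.
        f"CREATE INDEX ON {table} USING hnsw (embedding vector_cosine_ops);",
        # Keep scanning the index until enough rows survive the WHERE filter.
        "SET hnsw.iterative_scan = strict_order;",
        # Filtered nearest-neighbour query; <=> is pgvector's cosine distance operator.
        f"SELECT id, body FROM {table} "
        "WHERE tenant_id = %s "
        "ORDER BY embedding <=> %s::vector LIMIT 5;",
    ]


statements = pgvector_statements()
```

The query embedding would be passed as a text literal such as `'[0.1, 0.2, ...]'` for the `::vector` cast; a client library with native vector support would make that cast unnecessary.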

When should you use Qdrant?

Qdrant is my pick when I want a dedicated, open-source engine that does hybrid search and filtering well out of the box. It is the most-starred of the three at roughly 32.8k GitHub stars (Qdrant repo). It supports dense, sparse, and multi-vector (e.g. ColBERT-style) representations and merges them with configurable fusion — Reciprocal Rank Fusion or Distribution-Based Score Fusion (Qdrant hybrid queries docs). For memory-constrained scale, built-in quantization cuts RAM usage by up to 97% (Qdrant repo). Because it is open source, you run it in Docker locally, self-host on your cluster, or use Qdrant Cloud — and move between them without rewriting your retrieval layer.
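To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It illustrates the general technique, not Qdrant's internal implementation; the constant k=60 is a commonly used default and an assumption here, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items ranked well by multiple retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_a", "doc_b", "doc_c"]    # hypothetical semantic results
sparse_hits = ["doc_b", "doc_d", "doc_a"]   # hypothetical keyword results
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
# fused == ["doc_b", "doc_a", "doc_d", "doc_c"]
```

Because RRF uses only ranks, it sidesteps the problem that dense similarity scores and sparse keyword scores live on different scales. The same function is what you would wire up yourself when combining pgvector results with Postgres full-text search.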

On the Indian legal AI platform I built, Qdrant is the vector store behind a Hybrid RAG layer: voyage-law-2 embeddings (a 1,024-dim, legally-tuned model trained on a trillion legal tokens that beats text-embedding-3-large by ~6% on average across legal retrieval benchmarks — Voyage AI) plus BM25 sparse vectors and an LLM reranker. Native sparse-dense fusion made it straightforward to combine semantic recall with exact-term matching — which matters when a wrong or missing authority is a liability, not a cosmetic bug. Use Qdrant when:

  • You want native hybrid (dense + sparse) search and rich metadata filtering without bolting it together.
  • You want portability — self-host or managed, no SaaS lock-in, full data residency control.
  • You expect to grow into tens of millions of vectors and want quantization and on-disk options ready.

When should you use Pinecone?

Pinecone is the right call when you want a fully managed service and the least possible ops. There is no cluster to run, scaling and replication are handled, and it offers native sparse-dense hybrid search and metadata filtering. Its serverless model bills on usage rather than provisioned nodes: roughly $16–$18 per million read units, $4–$4.50 per million write units, and $0.33/GB/month storage, with a Starter free tier of 2 GB storage, 2M write units, and 1M read units, and a Standard plan minimum of $50/month (Pinecone pricing). For teams without infrastructure people — or who simply don't want to spend time on vector-DB ops — that convenience is the whole value proposition.
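A back-of-envelope estimate shows how those rates add up. The sketch below uses the lower end of the quoted ranges as defaults and treats the $50 Standard minimum as a floor on the usage charge, which is an assumption about how the minimum applies. It ignores the free tier, and the function name and workload figures are hypothetical.

```python
def estimate_monthly_cost(
    read_units,
    write_units,
    storage_gb,
    read_per_million=16.0,    # USD, lower end of the quoted range
    write_per_million=4.0,    # USD, lower end of the quoted range
    storage_per_gb=0.33,      # USD per GB per month
    monthly_minimum=50.0,     # assumed to act as a billing floor
):
    """Estimate a monthly serverless bill from usage volumes."""
    usage = (
        read_units / 1_000_000 * read_per_million
        + write_units / 1_000_000 * write_per_million
        + storage_gb * storage_per_gb
    )
    return max(usage, monthly_minimum)


# 10M read units, 5M write units, 20 GB stored:
# 160 + 20 + 6.60 = about 186.60 USD per month at these rates.
busy_month = estimate_monthly_cost(10_000_000, 5_000_000, 20)

# 1M read units, 1M write units, 1 GB stored comes to about 20.33 USD of usage,
# so the assumed minimum of 50 USD applies instead.
quiet_month = estimate_monthly_cost(1_000_000, 1_000_000, 1)
```

Read units dominate in both examples, which is why the bill tracks query volume far more closely than corpus size.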

The costs are vendor lock-in and a recurring bill that scales with query volume. You cannot self-host, and migrating off later is real work. If your data has residency or sovereignty constraints, or you need the store inside your own VPC, a managed-only SaaS may be off the table entirely. When those are non-issues and ops time is scarce, Pinecone is a clean, fast way to ship.

Which is cheapest and which scales furthest?

Cost and scale pull in different directions, so it helps to see them side by side.

| Factor | pgvector | Qdrant | Pinecone |
|---|---|---|---|
| Cheapest at small scale | Yes; reuses your existing Postgres | Self-hosted is infra-cost-only | Free tier (2 GB) then $50/mo min (pricing) |
| Index dimension limit | 2,000 (vector); 4,000 (halfvec) (repo) | High; handles 1,536–3,072-dim models natively | Handles large dims; managed |
| Practical scale ceiling | A few million vectors before serious tuning | Tens of millions+ with sharding/quantization | Very large, high-throughput indexes |
| Ops burden | Low if PG exists; you own tuning | Medium self-host; low on Qdrant Cloud | Lowest (fully managed) |
| Lock-in risk | None | None (open source) | High (managed-only) |

The honest summary: pgvector is cheapest and simplest if you're already on Postgres and under a few million vectors; Qdrant scales furthest among the options you can run yourself, and stays portable and open; Pinecone is built for very large indexes and removes ops entirely, at the price of a recurring bill and lock-in.

Does the vector DB actually decide RAG quality?

Mostly no — and this is the part teams get wrong. All three engines retrieve nearest neighbours well. What separates a RAG system that holds up from one that hallucinates is everything around the store: how you chunk and add metadata, whether you run hybrid (keyword + semantic) search, whether you rerank candidates before they reach the model, and whether you verify citations and run evals. Swapping Pinecone for Qdrant will not fix bad chunking. Adding a reranker often will. Pick the store that fits your stack and ops appetite, then put your real effort into retrieval quality.
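One way to make "run evals" concrete is to track recall@k before and after each retrieval change, such as adding a reranker or changing chunk size. The sketch below is a generic metric that works the same regardless of which store sits underneath; it assumes you have a small labelled set of relevant chunk IDs per query, and the IDs shown are made up.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("relevant must contain at least one ID")
    top_k = set(retrieved[:k])
    return len(top_k & relevant_set) / len(relevant_set)


# Hypothetical query: three chunks are known to be relevant,
# and one of them shows up in the top three results.
retrieved_ids = ["chunk_1", "chunk_7", "chunk_3", "chunk_9"]
relevant_ids = {"chunk_7", "chunk_9", "chunk_12"}
score = recall_at_k(retrieved_ids, relevant_ids, k=3)
```

Averaging this over a fixed query set gives a number you can compare across chunking strategies, hybrid settings, and rerankers, which is usually more informative than comparing stores.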