Vector databases (or search engines with dense vectors) store embeddings and run approximate nearest neighbor (ANN) queries fast at scale.
Components
- Embedding model (query + document)
- Index (HNSW, IVF, etc.)—trade recall vs speed
- Optional metadata filters (tenant_id, product)
Hybrid search
Combine BM25 keyword + vector scores for SKU lookups, names, and legal citations where exact tokens matter.
Complexity intuition
ANN search is sub-linear with tuning—see DSA for why naive pairwise comparison fails at millions of vectors.
Important interview questions and answers
- Q: What is ANN?
A: Approximate search that sacrifices tiny recall for large speedups.
Self-check
- What three components does a vector pipeline need?
- Why hybrid search?
Tip: Hybrid BM25 + vector helps SKUs, legal cites, and exact error codes.
Interview prep
- ANN?
Approximate nearest neighbor search scales to millions of vectors.
- Hybrid search?
Keyword + vector improves exact token matches (SKUs, statutes).