Hybrid search with BM25 + dense vectors combines traditional keyword-based retrieval (BM25) with semantic search over dense vector embeddings. In Retrieval-Augmented Generation (RAG), BM25 efficiently finds documents that contain the exact query terms, while dense vectors capture contextual and semantic relationships that exact matching misses. Combining the two improves retrieval accuracy and relevance, which in turn improves the quality of the context passed to the generative model.
What is BM25?
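BM25 is a ranking function used in search engines. It scores each document by how often the query terms appear in it (term frequency, with diminishing returns for repeated occurrences), how rare those terms are across the collection (inverse document frequency), and how long the document is relative to the average, so that long documents are not favored simply for containing more words.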
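The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it assumes whitespace tokenization, uses the commonly seen defaults k1=1.5 and b=0.75, and uses the non-negative IDF variant log(1 + (N - df + 0.5) / (df + 0.5)). The function name `bm25_scores` and the sample documents are made up for this example.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Return one BM25 score per document (hypothetical helper for illustration)."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))

    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue  # only exact term matches contribute
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

# Toy corpus, tokenized by whitespace (an assumption of this sketch).
docs = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),
    "the stock market fell sharply today".split(),
]
scores = bm25_scores("cat mat".split(), docs)
```

Only the first document scores above zero here. The second document contains "cats" but not "cat", which shows the exact-match limitation that dense vectors are meant to cover.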
What are dense vectors?
Dense vectors are compact numerical embeddings produced by neural models that capture semantic meaning, enabling similarity comparisons beyond exact keyword matches.
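As a rough illustration of a similarity comparison, the sketch below computes cosine similarity between a query vector and two document vectors. The three-dimensional vectors are hand-written placeholders, not the output of any real embedding model, and the names `cosine_similarity`, `feline_article`, and `finance_article` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

# Placeholder embeddings; a real system would obtain these from a neural model.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "feline_article": [0.8, 0.2, 0.1],
    "finance_article": [0.0, 0.1, 0.9],
}

sims = {name: cosine_similarity(query_vec, vec) for name, vec in doc_vecs.items()}
best_match = max(sims, key=sims.get)
```

Documents whose vectors point in a similar direction to the query rank higher, regardless of whether they share any literal words with it.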
What is hybrid search?
Hybrid search combines keyword-based retrieval (BM25) with semantic vector similarity to retrieve and rank results that are both keyword-relevant and semantically related.
How are results blended in hybrid search?
BM25 and vector similarity scores are typically combined using a weighting factor. Alternatively, a two-stage process is used: fast BM25 filtering selects candidates, and embedding-based re-ranking orders them, which balances relevance and latency.
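Both blending strategies can be sketched as follows. This is one possible approach under stated assumptions: the weighted blend min-max normalizes each score list first (BM25 scores are unbounded while cosine similarities are not, so the raw values are not directly comparable), and the weight `alpha` is an illustrative parameter, not a recommended setting. The function names and the score values are hypothetical.

```python
def min_max(scores):
    """Rescale scores to the range [0, 1]; all-equal input maps to zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def blend(bm25, dense, alpha=0.5):
    """Weighted sum of normalized scores; alpha is the weight on BM25."""
    norm_bm25, norm_dense = min_max(bm25), min_max(dense)
    return [alpha * k + (1 - alpha) * d for k, d in zip(norm_bm25, norm_dense)]

def two_stage(bm25, dense, top_k):
    """Keep the top_k documents by BM25, then re-rank them by dense similarity.

    Dense scores are precomputed here for brevity; a real pipeline would
    compute them only for the candidates that survive the first stage.
    """
    candidates = sorted(range(len(bm25)), key=lambda i: bm25[i], reverse=True)[:top_k]
    return sorted(candidates, key=lambda i: dense[i], reverse=True)

# Made-up scores for three documents (indices 0, 1, 2).
bm25 = [4.2, 0.0, 1.3]
dense = [0.55, 0.91, 0.20]

hybrid = blend(bm25, dense, alpha=0.5)
ranking = sorted(range(len(hybrid)), key=lambda i: hybrid[i], reverse=True)
reranked = two_stage(bm25, dense, top_k=2)
```

In this toy data, document 1 has no keyword match but the highest dense similarity. The weighted blend still ranks it second, while the two-stage approach drops it entirely because it never passes the BM25 filter. That trade-off is the main design difference between the two strategies.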