Sparse retrieval advances refer to improvements in information retrieval techniques that use sparse representations. BM25+ refines the classic BM25 ranking function's term-frequency saturation and document-length normalization. SPLADE (Sparse Lexical and Expansion Model) uses neural networks to generate sparse, expanded term representations, improving matching accuracy. Learned sparse models more broadly use deep learning to produce sparse vectors for documents and queries, enabling efficient search. In Retrieval-Augmented Generation (RAG), these methods supply generative models with relevant retrieved context.
What is sparse retrieval?
A retrieval approach that uses sparse representations (mostly zeros) for queries and documents, enabling fast, scalable search with inverted indexes and term-level signals.
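The idea above can be sketched with a toy inverted index: each term maps to the documents containing it, so a query only touches postings for its own terms. This is a minimal illustration in pure Python; the documents, tokenization, and scoring (summed term frequencies) are simplified assumptions, not any particular library's behavior.

```python
from collections import defaultdict

# Toy corpus; real systems would tokenize, lowercase, and stem.
docs = {
    "d1": "sparse retrieval uses inverted indexes",
    "d2": "neural models learn dense vectors",
    "d3": "inverted indexes enable fast sparse search",
}

# Build the inverted index: term -> {doc_id: term frequency}.
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term in text.split():
        index[term][doc_id] = index[term].get(doc_id, 0) + 1

def search(query):
    """Score documents by summed term frequencies of matching query terms."""
    scores = defaultdict(float)
    for term in query.split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Because only documents sharing at least one query term are ever scored, search cost scales with postings touched, not collection size; this is why sparse methods stay fast on large corpora.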
What is BM25+?
A refined version of BM25 that adds a small constant delta to lower-bound each matching term's contribution, preventing very long documents from being over-penalized by length normalization, while keeping a simple, fast scoring formula.
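A hedged sketch of the per-term BM25+ score as described by Lv and Zhai (2011): standard BM25 term-frequency saturation and length normalization, plus a delta that floors the contribution of any matching term. Parameter defaults (k1, b, delta) are common choices, not prescriptions.

```python
import math

def bm25_plus(tf, doc_len, avg_dl, n_docs, df, k1=1.2, b=0.75, delta=1.0):
    """Per-term BM25+ score for a term with frequency tf in a document.

    tf: term frequency in the document (must be > 0 for a match)
    doc_len / avg_dl: document length and collection average length
    n_docs / df: collection size and the term's document frequency
    delta: additive floor that keeps long documents from being over-penalized
    """
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = tf + k1 * (1.0 - b + b * doc_len / avg_dl)
    return idf * (tf * (k1 + 1.0) / norm + delta)
```

The delta term guarantees that a document containing a query term always scores above idf * delta for that term, no matter how long the document is; plain BM25 lacks this lower bound.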
What is SPLADE?
A method that learns sparse representations of queries and documents to enable effective lexical matching with neural models, while remaining compatible with traditional inverted-index search.
What are learned sparse models?
Models trained to produce sparse, term-like representations from text, combining neural understanding with fast, indexable search.
How do these advances help at scale?
They improve ranking accuracy while preserving fast retrieval on large collections by using sparse indices that support both neural signals and lexical matching.
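Concretely, once queries and documents are learned sparse weight vectors (term to weight), ranking is a dot product computed by traversing postings, exactly like classic lexical retrieval. A minimal sketch, with illustrative document vectors and weights:

```python
from collections import defaultdict

# Hypothetical learned sparse vectors: term -> weight per document.
doc_vectors = {
    "d1": {"sparse": 1.2, "index": 0.8},
    "d2": {"dense": 1.0, "vector": 0.9},
    "d3": {"sparse": 0.5, "search": 1.1},
}

# Invert into postings: term -> list of (doc_id, weight).
postings = defaultdict(list)
for doc_id, vec in doc_vectors.items():
    for term, w in vec.items():
        postings[term].append((doc_id, w))

def retrieve(query_vec, k=2):
    """Accumulate query_weight * doc_weight over shared terms; return top-k."""
    scores = defaultdict(float)
    for term, qw in query_vec.items():
        for doc_id, dw in postings.get(term, ()):
            scores[doc_id] += qw * dw
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k]

top = retrieve({"sparse": 1.0, "search": 0.5})
```

Because the scoring loop only visits postings for nonzero query terms, the same inverted-index machinery that serves BM25 serves neural weights unchanged, which is the key to scaling these models to large collections and to RAG pipelines.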