Contrastive Learning for Bi-Encoder Retrievers is an advanced technique in Retrieval-Augmented Generation (RAG) that enhances the ability of bi-encoders to distinguish relevant from irrelevant documents. By training the model to maximize similarity between query and positive passages while minimizing similarity with negatives, it improves retrieval accuracy. This approach enables more effective pairing of queries and documents, leading to better performance in information retrieval and downstream generative tasks.
What is a bi-encoder retriever?
A model that encodes queries and documents separately into fixed-size embeddings, enabling fast approximate nearest-neighbor search using cosine similarity or dot product.
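A minimal bi-encoder sketch in PyTorch, assuming a Hugging Face transformer with mean pooling; the model name, pooling choice, and example texts are illustrative assumptions, not a prescribed architecture.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed base model; any encoder producing token embeddings works the same way.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    # Tokenize and mean-pool token embeddings into one fixed-size vector per text.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state        # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)     # mean pooling over tokens
    return torch.nn.functional.normalize(emb, dim=-1)      # unit vectors -> cosine similarity

# Queries and documents are encoded independently, so document vectors can be
# precomputed and indexed for approximate nearest-neighbor search.
query_emb = embed(["how do bi-encoders work?"])
doc_embs = embed(["Bi-encoders embed queries and documents separately.",
                  "Unrelated text about cooking pasta."])
scores = query_emb @ doc_embs.T   # dot product of normalized vectors = cosine similarity
print(scores)
```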
What is contrastive learning in this setting?
A training objective that pulls together representations of matching query–document pairs (positives) and pushes apart non-matching pairs (negatives), shaping an embedding space suited for retrieval.
How are positives and negatives defined for bi-encoder contrastive learning?
Positives are relevant query–document pairs. Negatives are non-relevant documents sampled from the pool or batch; hard negative mining can be used to improve learning.
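A hedged sketch of assembling training examples with hard negatives and in-batch negatives; `retrieve_top_k` is a hypothetical helper standing in for any first-stage retriever (e.g. BM25), not a specific library call.

```python
import random

def build_training_batch(pairs, corpus, retrieve_top_k, k=50):
    """pairs: list of (query, positive_doc); corpus: list of all documents."""
    batch = []
    for query, positive in pairs:
        # Hard negatives: top-ranked documents from a first-stage retriever
        # that are not the labeled positive; harder to separate than random docs.
        candidates = [d for d in retrieve_top_k(query, corpus, k) if d != positive]
        hard_negative = candidates[0] if candidates else random.choice(corpus)
        batch.append({"query": query,
                      "positive": positive,
                      "hard_negative": hard_negative})
    # During training, each example's positive also serves as an in-batch
    # negative for every other query in the same batch.
    return batch
```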
Why use contrastive learning for bi-encoder retrievers?
It yields semantic, dense embeddings for fast retrieval at inference and scales well with data, avoiding expensive cross-encoders during search.
What is a common loss used in contrastive IR training?
InfoNCE (NT-Xent) loss, which maximizes the similarity of each query to its positive document relative to the negatives in the batch.
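A minimal InfoNCE sketch in PyTorch using in-batch negatives; the temperature value and embedding sizes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb, doc_emb, temperature=0.05):
    """query_emb, doc_emb: (B, H) tensors; row i of doc_emb is the positive for row i of query_emb."""
    query_emb = F.normalize(query_emb, dim=-1)
    doc_emb = F.normalize(doc_emb, dim=-1)
    # Similarity matrix: entry (i, j) scores query i against document j.
    logits = query_emb @ doc_emb.T / temperature          # (B, B)
    # The diagonal holds the positive pairs; off-diagonal entries act as in-batch negatives.
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)

# Example with random tensors standing in for encoder outputs.
q = torch.randn(8, 768)
d = torch.randn(8, 768)
print(info_nce_loss(q, d))
```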