Evaluation metrics such as Recall@K, Mean Reciprocal Rank (MRR), and normalized Discounted Cumulative Gain (nDCG) are central to evaluating the retrieval stage of Retrieval-Augmented Generation (RAG) systems. Recall@K measures the proportion of relevant documents retrieved within the top K results. MRR evaluates the rank position of the first relevant item, favoring systems that retrieve relevant results earlier. nDCG assesses ranking quality by considering both the position and the relevance of retrieved items, rewarding highly relevant documents that appear higher in the results.
What is Recall@K and how is it interpreted?
Recall@K measures what fraction of a user's relevant items appear in the top-K recommendations. For each user, Recall@K = (relevant items in top-K) / (total relevant items). The overall value is the average across users; higher is better.
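A minimal Python sketch of per-user Recall@K, assuming `ranked_items` is the system's ranking (a list of item IDs) and `relevant_items` is the ground-truth set; the names are illustrative, not from any particular library:

```python
def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the user's relevant items that appear in the top-K."""
    if not relevant_items:
        return 0.0  # no relevant items: recall is undefined; 0.0 by convention here
    top_k = set(ranked_items[:k])
    hits = len(top_k & set(relevant_items))
    return hits / len(relevant_items)
```

Averaging this value over all users gives the reported Recall@K.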
What is MRR (Mean Reciprocal Rank) and when should I use it?
MRR focuses on how quickly the first relevant item appears in the predicted ranking. For each user, RR = 1 / rank of the first relevant item (0 if none). MRR = average RR across users. Higher values indicate faster first hits.
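A sketch in the same style, assuming one ranking per user or query (again, the helper names are illustrative):

```python
def reciprocal_rank(ranked_items, relevant_items):
    """1 / (1-based rank of the first relevant item); 0.0 if none appears."""
    relevant = set(relevant_items)
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(rankings, relevants):
    """MRR: average reciprocal rank across all users/queries."""
    scores = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0
```

Because only the first hit matters, MRR is a natural fit for tasks with a single correct answer, such as known-item search or question answering.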
What is nDCG@K and why normalize?
nDCG@K evaluates ranking quality using graded relevance and position. DCG@K = sum from i=1 to K of (2^rel_i - 1) / log2(i+1), where rel_i is the graded relevance of the item at position i. IDCG@K is the DCG@K of the ideal (best possible) ordering for that user. nDCG@K = DCG@K / IDCG@K, ranging from 0 to 1; higher is better.
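The formula translates directly into code. This sketch assumes `relevances` holds the graded relevance of each retrieved item in predicted order, and it builds the ideal ranking by sorting those same grades; in a full evaluation, IDCG@K would be computed over all judged items, not only the retrieved ones:

```python
import math

def dcg_at_k(relevances, k):
    """DCG@K with exponential gain: sum of (2^rel - 1) / log2(i + 1)."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 1)
        for i, rel in enumerate(relevances[:k], start=1)
    )

def ndcg_at_k(relevances, k):
    """nDCG@K = DCG@K / IDCG@K, where IDCG@K uses the best possible ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Normalizing by IDCG@K makes scores comparable across users who have different numbers of (and differently graded) relevant items.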
How does Precision@K differ from Recall@K?
Precision@K = (# relevant in top-K) / K, while Recall@K = (# relevant in top-K) / (total relevant). Precision measures relevance density in the top-K; Recall measures coverage of all relevant items.
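Both metrics share the same numerator and differ only in the denominator; a sketch mirroring the Recall@K helper above:

```python
def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-K recommendations that are relevant."""
    relevant = set(relevant_items)
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / k if k > 0 else 0.0
```

When a user has fewer relevant items than K, Precision@K is capped below 1 even for a perfect ranking, which is one reason Recall@K is usually reported alongside it.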
How should I choose K and combine these metrics for evaluation?
Choose K to reflect typical user experience (e.g., 5 or 10). Use multiple metrics: MRR and nDCG@K for ranking quality and early hits; Recall@K for coverage. Interpret results in the context of your dataset and goals.
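As a worked example (using the illustrative helpers sketched above): one query with three relevant documents, two of which land in the top 5.

```python
ranked = ["d3", "d1", "d7", "d2", "d9"]  # system's top-5 for one query
relevant = {"d1", "d2", "d5"}            # ground-truth relevant documents

k = 5
print(recall_at_k(ranked, relevant, k))     # 2/3 ~= 0.667: coverage of relevant docs
print(precision_at_k(ranked, relevant, k))  # 2/5 = 0.400: relevance density in top-5
print(reciprocal_rank(ranked, relevant))    # 1/2 = 0.500: first hit at rank 2
```

Reporting these together shows where a system succeeds and where it falls short: here it covers most of the relevant documents (recall) but surfaces the first one only at rank 2 (RR).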