"Building Baselines and Oracles for Evals (LLM Evaluations)" refers to establishing standard reference points (baselines) and ideal performance measures (oracles) to assess large language models (LLMs). Baselines provide a minimum expected level of performance, often using simple methods or previous models, while oracles represent optimal or expert-level outputs. These benchmarks are essential for objectively comparing and improving LLMs, ensuring evaluations are meaningful and progress is measurable.
What is a baseline in evals?
A simple reference point or model used to judge improvements, such as a naive heuristic or a previous system.
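As a minimal sketch of such a naive heuristic, the snippet below implements a majority-class baseline for a classification-style eval. The function names (`majority_baseline`, `accuracy`) and the toy labels are illustrative, not from any particular eval framework:

```python
from collections import Counter

def majority_baseline(train_labels):
    """Always predict the most common label seen in a reference set."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _example: most_common

def accuracy(predict, examples, labels):
    """Fraction of examples where the prediction matches the label."""
    return sum(predict(x) == y for x, y in zip(examples, labels)) / len(labels)

# Toy data: the baseline learns "pos" is most frequent, then predicts it always.
baseline = majority_baseline(["pos", "pos", "neg", "pos"])
print(round(accuracy(baseline, ["a", "b", "c"], ["pos", "neg", "pos"]), 3))  # 0.667
```

Any model that cannot beat this trivial predictor on the same metric has not demonstrated real task skill.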
What is an oracle in evaluations?
A source of truth that defines correct outputs—often the gold standard or a hypothetical perfect decision-maker used to bound performance.
Why use baselines and oracles in evaluation?
They provide context, help detect overfitting, and show how much improvement is gained beyond simple methods or ideal accuracy.
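One hedged way to make "improvement beyond simple methods" concrete is to report the fraction of the baseline-to-oracle gap a model closes. The function name `normalized_gain` is a hypothetical label for this common normalization, not a standard API:

```python
def normalized_gain(model_score, baseline_score, oracle_score):
    """Fraction of the baseline-to-oracle headroom closed by the model."""
    if oracle_score == baseline_score:
        raise ValueError("no headroom between baseline and oracle")
    return (model_score - baseline_score) / (oracle_score - baseline_score)

# Baseline scores 0.60, oracle (perfect labels) scores 1.00, model scores 0.80:
print(round(normalized_gain(0.80, 0.60, 1.00), 3))  # 0.5 -> half the gap closed
```

A raw score of 0.80 looks strong in isolation; framed between the 0.60 floor and the 1.00 ceiling, it reads as closing only half the available headroom.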
How should you choose and implement baselines?
Pick simple, task-relevant methods, include more than one baseline, and report evaluation metrics consistently.
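The steps above can be sketched as running more than one simple baseline through the same metric and reporting them side by side. The baseline names, toy labels, and the `accuracy` helper are illustrative assumptions:

```python
from collections import Counter
import random

def accuracy(predict, examples, labels):
    """One metric, applied identically to every baseline."""
    return sum(predict(x) == y for x, y in zip(examples, labels)) / len(labels)

train_labels = ["pos", "neg", "pos", "pos"]
test_x = ["e1", "e2", "e3", "e4"]
test_y = ["pos", "neg", "pos", "neg"]

majority = Counter(train_labels).most_common(1)[0][0]
classes = sorted(set(train_labels))
rng = random.Random(0)  # fixed seed so the random baseline is reproducible

baselines = {
    "majority": lambda _x: majority,
    "random":   lambda _x: rng.choice(classes),
}
for name, predict in baselines.items():
    print(f"{name}: accuracy={accuracy(predict, test_x, test_y):.2f}")
```

Reporting both a majority-class and a random baseline under the identical metric makes it obvious when a candidate model's gains are real rather than an artifact of metric choice.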
How do you validate an oracle or ground-truth data?
Use high-quality labels, assess inter-annotator agreement, follow clear guidelines, and adjudicate disagreements.
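A standard way to assess inter-annotator agreement between two labelers is Cohen's kappa, which discounts agreement expected by chance. This is a from-scratch sketch (no external dependencies); the annotator arrays are made-up toy data:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators over the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n          # raw agreement rate
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)       # chance agreement
    return (observed - expected) / (1 - expected)

ann1 = ["yes", "yes", "no", "yes", "no", "no"]
ann2 = ["yes", "no", "no", "yes", "no", "yes"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.333
```

Low kappa signals that the labeling guidelines are ambiguous and that disagreements should be adjudicated before the labels are trusted as an oracle.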