What is meta-evaluation?
Meta-evaluation is the evaluation of evaluation metrics themselves: it studies how well a metric's scores reflect human judgments of model outputs, which indicates how reliable and useful the metric is.
What does correlation mean in this context?
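Correlation measures how closely metric scores align with human judgments. A positive correlation indicates that outputs with higher metric scores also tend to receive higher human scores, while a correlation near zero indicates that the metric says little about what humans prefer.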
Why compare metrics with human judgments?
To ensure metrics guide improvements that match human preferences and to detect metrics that may mislead model development.
Which statistics are commonly used to measure this correlation?
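Pearson's r, Spearman's rho, and Kendall's tau are commonly used to quantify the relationship between metric scores and human judgments. Pearson measures linear association between the raw scores, while Spearman and Kendall measure how well the two rankings agree.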
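As a rough illustration of the three statistics named above, here is a minimal pure-Python sketch. The score lists are made-up values, not real data, and the function names (pearson, average_ranks, spearman, kendall_tau_a) are hypothetical helpers written for this example. The Kendall variant shown is tau-a, which makes no correction for ties; in practice a library routine such as those in scipy.stats is likely a better choice.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: linear association between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Illustrative (made-up) scores for five model outputs.
metric_scores = [0.10, 0.20, 0.30, 0.40, 0.50]
human_scores = [1, 3, 2, 5, 4]

print(round(pearson(metric_scores, human_scores), 3))        # 0.8
print(round(spearman(metric_scores, human_scores), 3))       # 0.8
print(round(kendall_tau_a(metric_scores, human_scores), 3))  # 0.6
```

In this toy example the metric ranks two pairs of outputs in the opposite order from the human raters, which is why Kendall's tau comes out at 0.6 (8 concordant pairs, 2 discordant) rather than 1.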