Multi-turn dialogue and conversational state tracking metrics are evaluation methods used to assess how well large language models (LLMs) handle ongoing conversations. These metrics measure the model’s ability to maintain context, remember previous exchanges, and provide coherent, relevant responses across multiple dialogue turns. By tracking conversational state, evaluators can determine if the model accurately follows the flow of conversation, addresses user queries, and avoids contradictions, ensuring a more natural and effective interaction.
What is multi-turn dialogue?
A conversation where the user and system exchange several turns, and later responses rely on context from earlier turns.
What is conversational state tracking (CST)?
CST is maintaining a running representation of the user’s goals, preferences, and dialogue context across turns to guide the system’s decisions.
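A minimal sketch of such a running representation, assuming the common slot-value formulation (the slot names here are hypothetical): the state is a dictionary of slot-value pairs that is updated after each user turn.

```python
def update_state(state, extracted_slots):
    """Merge newly extracted slot-value pairs into the running dialogue state."""
    new_state = dict(state)          # keep earlier turns' information
    new_state.update(extracted_slots)  # newer values overwrite older ones
    return new_state

state = {}
# Turn 1: "I'd like Italian food."
state = update_state(state, {"food": "italian"})
# Turn 2: "Somewhere in the centre, please."
state = update_state(state, {"area": "centre"})
print(state)  # {'food': 'italian', 'area': 'centre'}
```

Because the state carries forward, the system can answer turn 2 without the user repeating the cuisine preference from turn 1.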
What is joint goal accuracy (JGA) in dialogue state tracking?
JGA measures the fraction of turns in which the predicted dialogue state exactly matches the ground truth, meaning every slot-value pair for the user's current goal is correct; a single wrong slot makes the whole turn count as incorrect, so JGA is a strict measure of overall state-tracking correctness.
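Under the usual formulation where each turn's state is a set of slot-value pairs, JGA can be computed as below (the slot names and values are illustrative, not from a specific benchmark):

```python
def joint_goal_accuracy(predicted_states, gold_states):
    """Fraction of turns where the predicted state exactly matches the gold state.

    Each state is a dict mapping slot names to values, e.g.
    {"food": "italian", "area": "centre"}. A turn counts only if
    ALL slot-value pairs match.
    """
    assert len(predicted_states) == len(gold_states)
    if not gold_states:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted_states, gold_states))
    return correct / len(gold_states)

gold = [
    {"food": "italian"},
    {"food": "italian", "area": "centre"},
    {"food": "italian", "area": "centre", "price": "cheap"},
]
pred = [
    {"food": "italian"},
    {"food": "italian", "area": "north"},  # wrong area -> whole turn wrong
    {"food": "italian", "area": "centre", "price": "cheap"},
]
print(joint_goal_accuracy(pred, gold))  # 2 of 3 turns exact -> 0.666...
```

Note the all-or-nothing behavior: the single wrong "area" value in turn 2 zeroes out that turn even though the "food" slot was right.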
What other metrics are commonly used to evaluate CST?
Slot accuracy, value match rate, turn-level or dialogue success rate, and sometimes generation quality metrics like BLEU, depending on the task.
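For contrast with JGA's all-or-nothing scoring, slot accuracy credits each correctly predicted (turn, slot) pair individually. A simple sketch, assuming slots absent from a state default to "none":

```python
def slot_accuracy(predicted_states, gold_states, slots):
    """Fraction of (turn, slot) pairs predicted correctly across the dialogue."""
    total = correct = 0
    for pred, gold in zip(predicted_states, gold_states):
        for slot in slots:
            total += 1
            if pred.get(slot, "none") == gold.get(slot, "none"):
                correct += 1
    return correct / total if total else 0.0

gold = [
    {"food": "italian"},
    {"food": "italian", "area": "centre"},
    {"food": "italian", "area": "centre", "price": "cheap"},
]
pred = [
    {"food": "italian"},
    {"food": "italian", "area": "north"},  # one slot wrong in turn 2
    {"food": "italian", "area": "centre", "price": "cheap"},
]
print(slot_accuracy(pred, gold, ["food", "area", "price"]))  # 8/9 = 0.888...
```

On this example, slot accuracy is 8/9 while JGA would be 2/3: the single wrong slot costs one of nine pairs here but an entire turn under JGA, which is why the two metrics are usually reported together.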