Safety taxonomies and severity scales are structured frameworks used in LLM evaluations to categorize and rate the risks, harms, and failures associated with large language models. A taxonomy identifies the type of safety issue, such as a privacy breach, misinformation, or harmful content, and a severity scale rates how serious it is. By applying both, researchers and developers can prioritize mitigation strategies and improve the overall safety and reliability of language models.
What is a safety taxonomy?
A structured framework that classifies safety-related data (hazards, risks, controls, incidents) into standard categories to support consistent reporting and analysis.
What is a severity scale?
A ranking system that assigns a level to the impact of a hazard or incident, used to prioritize actions and allocate resources. Common forms are a numeric range such as 1–5 or ordinal labels such as minor, moderate, severe, and critical.
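A severity scale is ordinal, so it maps naturally onto an ordered enumeration. The sketch below is a minimal illustration only: the four labels come from the answer above, while the numeric values, the `needs_immediate_action` helper, and its default threshold are assumptions made for the example.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Ordinal severity levels; a higher value means a more serious impact."""
    MINOR = 1
    MODERATE = 2
    SEVERE = 3
    CRITICAL = 4


def needs_immediate_action(level: Severity,
                           threshold: Severity = Severity.SEVERE) -> bool:
    """Hypothetical triage rule: act at once at or above the threshold."""
    return level >= threshold


# Because the levels are ordered, findings can be sorted worst-first.
findings = [Severity.MODERATE, Severity.CRITICAL, Severity.MINOR]
worst_first = sorted(findings, reverse=True)
print([level.name for level in worst_first])
```

Using `IntEnum` rather than plain strings keeps comparisons such as `level >= threshold` meaningful, which is what makes prioritization possible.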
How are safety taxonomies and severity scales used together?
They standardize data collection, enable comparison across events, support risk assessment, and help identify patterns and prioritize safety actions.
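One way to picture the combination: each recorded finding carries a taxonomy category and a severity level, and a summary groups by category and ranks by the worst severity seen. This is a sketch under assumptions, not a prescribed method. The `Finding` record, the 1–5 integer scale, the category names (borrowed from the introduction), and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    category: str   # taxonomy label, e.g. "privacy"
    severity: int   # assumed 1 (lowest) to 5 (highest)
    note: str


def summarize(findings):
    """Group findings by category; rank by worst severity, then by count."""
    summary = defaultdict(lambda: {"count": 0, "max_severity": 0})
    for f in findings:
        entry = summary[f.category]
        entry["count"] += 1
        entry["max_severity"] = max(entry["max_severity"], f.severity)
    return sorted(
        summary.items(),
        key=lambda kv: (-kv[1]["max_severity"], -kv[1]["count"]),
    )


# Illustrative data only; not drawn from any real evaluation.
sample = [
    Finding("privacy", 4, "model repeated a user's address"),
    Finding("misinformation", 2, "outdated statistic"),
    Finding("misinformation", 3, "incorrect dosage claim"),
    Finding("harmful_content", 3, "unsafe instructions"),
]
ranked = summarize(sample)
for category, stats in ranked:
    print(category, stats)
```

Because every record uses the same category labels and the same scale, results from different evaluation runs can be compared directly.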
How do you develop or choose a severity scale?
Define clear, objective criteria for each level, align with regulatory requirements and your organization's risk tolerance, and provide examples to ensure consistent use.
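The advice above can be partly checked mechanically: every level should have a written criterion and at least one example. The following is a minimal sketch, assuming a scale stored as a list of `LevelDefinition` records; the class, the `check_scale` helper, and the placeholder criteria text are hypothetical and would be replaced by an organization's own definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelDefinition:
    level: int
    label: str
    criterion: str  # objective condition for assigning this level
    example: str    # concrete case that anchors the level for raters


def check_scale(definitions):
    """Return a list of problems; an empty list means the scale is complete."""
    problems = []
    levels = sorted(d.level for d in definitions)
    if levels != list(range(1, len(definitions) + 1)):
        problems.append("levels must be consecutive integers starting at 1")
    for d in definitions:
        if not d.criterion.strip():
            problems.append(f"level {d.level} ({d.label}) has no criterion")
        if not d.example.strip():
            problems.append(f"level {d.level} ({d.label}) has no example")
    return problems


# Placeholder wording for illustration only.
scale = [
    LevelDefinition(1, "minor", "no lasting impact", "typo in a refusal message"),
    LevelDefinition(2, "moderate", "limited, recoverable impact", "misleading summary"),
    LevelDefinition(3, "severe", "significant impact on a person", ""),
]
print(check_scale(scale))
```

A check like this does not judge whether the criteria are good; it only catches levels that were left undefined, which is a common source of inconsistent ratings.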
What are common examples of safety taxonomy categories?
Hazard types (chemical, electrical, mechanical), exposure sources (tools, processes), outcome categories (injury, property damage, environmental harm), and control types (engineering, administrative, PPE).
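The categories listed above can be written down as a controlled vocabulary and used to validate incoming records, which is what makes reporting consistent. This is a sketch only: the dimension keys, the lowercase spellings, and the `validate_record` helper are assumptions, and a real taxonomy would typically have more values per dimension.

```python
# Allowed values per taxonomy dimension, taken from the categories above.
TAXONOMY = {
    "hazard_type": {"chemical", "electrical", "mechanical"},
    "exposure_source": {"tools", "processes"},
    "outcome": {"injury", "property_damage", "environmental_harm"},
    "control_type": {"engineering", "administrative", "ppe"},
}


def validate_record(record):
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    for dimension, allowed in TAXONOMY.items():
        value = record.get(dimension)
        if value is None:
            errors.append(f"missing {dimension}")
        elif value not in allowed:
            errors.append(f"unknown {dimension}: {value!r}")
    for key in record:
        if key not in TAXONOMY:
            errors.append(f"unexpected field: {key}")
    return errors


good = {
    "hazard_type": "electrical",
    "exposure_source": "tools",
    "outcome": "injury",
    "control_type": "ppe",
}
bad = {"hazard_type": "thermal", "exposure_source": "tools", "outcome": "injury"}

print(validate_record(good))
print(validate_record(bad))
```

Rejecting unknown labels at entry time, rather than cleaning them up later, keeps free-text variants from fragmenting the data.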