What is observability in software engineering, and why does it matter?
Observability is the ability to infer the internal state of a system from its external outputs (metrics, logs, and traces). It matters because it helps teams detect, diagnose, and prevent issues, which improves reliability and performance.
What are the three core pillars of observability and what does each provide?
Metrics provide numerical measurements over time (e.g., latency, throughput); logs capture discrete events with context and timestamps; traces show the end-to-end flow of a request across services, with timing for each step.
What is a metric, and what are common types?
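A metric is a numeric value tracked over time to quantify a property of the system. Common types include counters (totals that only increase, such as requests served), gauges (a current value that can rise or fall, such as memory in use), and histograms (the distribution of observed values, usually counted in buckets).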
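The three metric types can be sketched in a few lines of plain Python. This is only an illustration: the class names Counter, Gauge, and Histogram and the bucket bounds are assumptions made for this example, not the API of any particular metrics library.

```python
import bisect


class Counter:
    """A total that only goes up, e.g. requests served."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """A current value that can rise or fall, e.g. queue depth."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations per bucket; the last bucket holds overflow."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        self.counts = [0] * (len(self.upper_bounds) + 1)
        self.total = 0.0

    def observe(self, value):
        # Bucket i holds values <= upper_bounds[i] and > upper_bounds[i - 1].
        index = bisect.bisect_left(self.upper_bounds, value)
        self.counts[index] += 1
        self.total += value


# Hypothetical usage: latencies in seconds with illustrative bucket bounds.
requests_total = Counter()
queue_depth = Gauge()
latency = Histogram([0.1, 0.5, 1.0])

for seconds in (0.05, 0.3, 0.3, 2.0):
    requests_total.inc()
    latency.observe(seconds)
queue_depth.set(7)
```

A counter is usually read as a rate (change over a time window), while a gauge is read directly and a histogram is read through its bucket counts, for example to estimate percentiles.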
What is a log, and what information does it typically include?
A log is a recorded event with context, typically including a timestamp, severity level, message, and metadata such as request IDs or user IDs.
What is a trace, and how does it help with troubleshooting?
A trace records the path of a single request across services, broken into spans that each carry timing and context. It helps with end-to-end troubleshooting by showing which span, and therefore which service or step, accounts for most of the latency.
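To make the idea of spans concrete, here is a minimal sketch that models one trace as a list of spans and finds the latency hotspot. The Span fields, the helper self_time_ms, and the timings are all invented for this example; real tracing systems record more detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in the same request
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def self_time_ms(span, spans):
    """Time spent in a span itself, excluding its direct children.

    Assumes the children of a span do not overlap in time.
    """
    children = [s for s in spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)


# One hypothetical request with made-up timings in milliseconds.
spans = [
    Span("t1", "a", None, "GET /checkout", 0, 120),
    Span("t1", "b", "a", "auth-service", 5, 20),
    Span("t1", "c", "a", "payment-service", 25, 110),
    Span("t1", "d", "c", "db query", 30, 100),
]

hotspot = max(spans, key=lambda s: self_time_ms(s, spans))
print(hotspot.name, self_time_ms(hotspot, spans))
```

Ranking by self time rather than total duration avoids always blaming the root span, which by definition covers the whole request.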