Good systems reduce panic
A quieter test for system quality — the one that survives a year on call.
There are many ways to evaluate a system: latency, throughput, correctness, cost, elegance. They all matter. But a quieter test holds up better than most of them: does the system reduce the panic of the people who depend on it?
A system reduces panic when its failure modes are legible, when its boundaries are clear, when its behaviour under stress is closer to graceful than to surprising, and when the people who operate it can reason about it without reading the source.
A system creates panic when it is correct only some of the time, when its boundaries leak, when small changes have large and unrelated consequences, when nobody owns the parts that matter, and when the only people who can answer questions about it are the ones who built it.
The technical choices that produce one or the other are not always obvious. But the test is. If a system has been running for a year and the people around it are calmer, it is a good system. If they are more anxious, it is not — regardless of what the dashboards say.