The State of Availability Report
Insights from 1,900 engineering teams and their best practices to build, scale, and maintain high availability.
On average, 66% of the incident timeline is not actively being tracked.
Discovering an issue takes twice as long as resolving it. Furthermore, 80% of respondents aren’t tracking their MTTR. The data shows that the average incident lifecycle is ninety minutes and that most respondents are missing their SLAs. That’s a lot of unplanned work that isn’t visible. Peter Drucker reputedly said, “If you can’t measure it, you can’t improve it.”
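To make the arithmetic concrete, here is a minimal sketch of how the numbers above could fit together. It assumes an incident consists only of a discovery phase and a resolution phase, and that the 66% figure corresponds to the discovery phase; the report does not state either explicitly, and the function name is hypothetical.

```python
from __future__ import annotations


def split_incident_timeline(total_minutes: float, discover_to_resolve_ratio: float) -> tuple[float, float]:
    """Split an incident into (discovery, resolution) minutes.

    Assumption: the lifecycle has only these two phases, and discovery
    takes `discover_to_resolve_ratio` times as long as resolution.
    """
    resolve_minutes = total_minutes / (1 + discover_to_resolve_ratio)
    discover_minutes = total_minutes - resolve_minutes
    return discover_minutes, resolve_minutes


# Figures from the report: a ninety-minute lifecycle, discovery twice as long as resolution.
discover, resolve = split_incident_timeline(90, 2)
discovery_share = discover / (discover + resolve)

print(f"Discovery: {discover:.0f} min, resolution: {resolve:.0f} min")
print(f"Share of the timeline spent discovering: {discovery_share:.1%}")
```

Under these assumptions, roughly an hour of a ninety-minute incident passes before anyone starts resolving it, which is about two thirds of the timeline.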
Leaders are unaware of how much of their teams’ time is spent on monitoring.
Teams spend far more time on monitoring than on anything else. Yet management believes their organizations are spending time fairly equally across the board. This should be a wake-up call to leaders everywhere: if you want to invest in the work that enables digital transformation and inject capacity into teams, you need to help them find time now and find ways to create more time in the future.
On average, engineering teams manage 16 monitoring tools but still miss SLA targets.
Teams are managing a large number of monitoring tools, and leaders report even more tools at an organizational level. The outcome? High SLAs = high number of monitoring tools + high amount of time spent monitoring = no time left for your team to do anything besides monitor your monitoring tools.