The Time is Now to Learn from Availability to Optimize Customer Experience
Helen Beal | September 29, 2022

We’ve just launched our inaugural State of Availability Report and the results are sobering. We discovered that:

  • Organizations are missing SLAs more than they’re hitting them
  • Teams are spending inordinate amounts of their time monitoring
  • But customers are still reporting problems at least half the time
  • Adoption of DevOps tools and practices is still lagging

We’d hoped that at this point in the global digital transformation, organizations would have gotten further ahead with mastering availability, but there’s still a long way to go.

All is not lost, but now is the time to learn and act. Availability matters to customer experience because customers just expect software to perform, natch. They don’t care about uptime, downtime, and error budgets; they just expect the service to be there for them, and responsive when they need it. Our job as software providers (and everyone’s a software company in the digital economy) is to enhance people’s daily lives, at work and for pleasure. When we don’t, they stop using our offerings and tell others they are bad: reviews, referrals, and Net Promoter Score (NPS) all suffer. Organizational performance deteriorates as a result, putting everyone’s jobs at risk.

Customer experience is intrinsically tied to employee experience, so bad customer experience means poorly motivated and underperforming employees. It’s a vicious circle. So what can be done to fix this?

The bottom line is that what we’re doing today isn’t working. All this investment in monitoring tools, and all the time spent managing and monitoring those tools, isn’t paying off in acceptable levels of availability. The solution is to stop and look at what’s happening. So many times as a DevOps coach I was told by teams that:

  • “We can’t find the time to save time.”
  • “We’re sprinting just to stay still.”
  • “We’re changing the wings/wheels while we’re flying/driving.”

DevOps and cloud are the key enablers for digital transformation, so if we’re not getting the time to adopt these new ways of working, we’re stymying our ability to perform in the digital economy. If teams aren’t adopting these practices, it’s because they don’t have time, and they don’t have time because they’re constantly monitoring or managing incidents.

Incidents produce unplanned work—we don’t know when it’s going to happen, or how long it’s going to take us to discover and repair/resolve the problem. Past experience may give us pointers, but in software, as in life, it’s best to expect the unexpected. Planned work is everything in your backlog or projects (if you still have these)—so new features, changes, cloud/infrastructure upgrades, and patches. If you’re in an enlightened, learning organization, it might also include mastery—training and certification, and personal development. If you’re lucky, you might also be able to invest in paying down technical debt or using higher levels of experimentation like hackathons or chaos engineering—by “higher levels” I mean that your new features and enhancements should already be approached as experiments.

But are you just mainly monitoring and responding to customer complaints? Our research certainly showed this to be the case.

What we’ve learned, then, is that throwing monitoring tools and people at availability doesn’t improve it. The way out of the trap we’ve created for ourselves lies in killing unplanned work and gaining back that time to invest in the future. The alternative is to stop working on new things, and whose customer, stakeholder, or business is going to be remotely on board with that?

Unplanned work in the context of availability means finding and fixing the problems, measured as mean time to detect (MTTD) and mean time to repair or resolve (MTTR). But even with all these monitoring tools, that’s not happening fast enough. Using Moogsoft reduces the noise so we can see the problem quicker than our customers can report it, and we save time getting to the right fix.
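To make MTTD and MTTR concrete, here is a minimal Python sketch of how a team might compute them from its own incident records. The `Incident` fields and function names are hypothetical and not taken from any Moogsoft product. It also assumes repair time is measured from detection to resolution; some teams measure from the start of the incident instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    started_at: datetime   # when the problem actually began
    detected_at: datetime  # when the team (or a customer) noticed it
    resolved_at: datetime  # when service was restored


def mean_time_to_detect(incidents: list[Incident]) -> timedelta:
    """Average gap between an incident starting and being detected."""
    seconds = [(i.detected_at - i.started_at).total_seconds() for i in incidents]
    return timedelta(seconds=mean(seconds))


def mean_time_to_repair(incidents: list[Incident]) -> timedelta:
    """Average gap between detection and resolution (an assumed definition)."""
    seconds = [(i.resolved_at - i.detected_at).total_seconds() for i in incidents]
    return timedelta(seconds=mean(seconds))


# Two made-up incidents, purely for illustration.
incidents = [
    Incident(datetime(2022, 9, 1, 9, 0), datetime(2022, 9, 1, 9, 10), datetime(2022, 9, 1, 10, 10)),
    Incident(datetime(2022, 9, 2, 14, 0), datetime(2022, 9, 2, 14, 30), datetime(2022, 9, 2, 15, 0)),
]

print("MTTD:", mean_time_to_detect(incidents))  # MTTD: 0:20:00
print("MTTR:", mean_time_to_repair(incidents))  # MTTR: 0:45:00
```

Tracking both numbers over time shows whether the hours spent on monitoring are actually shortening detection and repair, which is the unplanned work this post is concerned with.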

All that time we’ve gained back can be invested in making the system more stable, using DevOps practices such as CI/CD.

Moogsoft is the AI-driven observability leader that provides intelligent monitoring solutions for smart DevOps. Moogsoft delivers the most advanced cloud-native, self-service platform for software engineers, developers and operators to instantly see everything, know what’s wrong and fix things faster.
About the author


Helen Beal

Helen Beal is a DevOps and Ways of Working coach, Chief Ambassador at DevOps Institute and an Ambassador for the Continuous Delivery Foundation. She provides strategic advisory services to DevOps industry leaders and is an analyst at Accelerated Strategies Group. She hosts the Day-to-Day DevOps webinar series for BrightTalk, speaks regularly on DevOps topics, is a DevOps editor for InfoQ and also writes for a number of other online platforms. Outside of DevOps she is an ecologist and novelist.
