How to Help Teams Create Optimal Infrastructure for Availability
Richard Whitehead | November 30, 2022

Teams are locked into a cycle of suffering, characterized by the feeling that they are sprinting just to stay still. This morale- and productivity-destroying state is caused by an inability to find time to save time. Our new research, The State of Availability Report 2022, found that teams know what they want to do (harness cloud and DevOps practices and tools to advance digital transformation), but something’s getting in the way.

The data we collected showed that teams are:

  • Drowning in data thanks to monitoring tool proliferation
  • Stuck in monitoring and incident management cycles
  • Not even delivering on the availability promises they are making

It’s time for leaders to help their teams unlock the time to create optimal infrastructure for availability and escape the vicious cycles they are stuck in, where time is always spent fixing problems and rarely spent tackling the underlying causes to deliver lasting improvements. And for teams that have autonomy over what work they do and when, it’s time to adopt new practices and tooling that create an infrastructure that supports sustainable ways of working.

Here’s how to do it.

  1. Start by baselining your current availability state. You need to know what you’re dealing with to know what to change, and you need to be sure your destination aligns with your organization’s goals. In the context of availability, this means creating customer experiences that produce tangible feedback (social sentiment, referrals and reviews, and product or service usage) and that ultimately increase income. You need to know what monitoring tools you have, how they are used, and what they are costing you. And you need to understand your current performance against the metrics you already have in place.
  2. Then define a small set of KPIs to take forward, and make sure they are aligned with your business goals. As we noted in our previous blog in this series, fewer KPIs correlate with higher performance in terms of meeting SLAs, so choose carefully. We recommend error budgets to ensure day-to-day adherence to promises, and MTTD and MTTR as targets for releasing time from unplanned work so you can make higher-level improvements. Tagging the type of work your team is doing (unplanned work, paying down technical debt, automating toil, platform improvements, new features) will also help you here.
  3. Review your monitoring tools landscape and consolidate by prioritizing tools by value and usage. This will enable you to reduce your Total Cost of Ownership (TCO) and reduce noise.
  4. Now it’s time to reduce the noise you’re getting from the monitoring tools that remain—use AIOps to do this and watch your MTTD drop along with the volume of unplanned work your team’s dealing with.
  5. Use the time that’s just been released to stabilize your system by paying down technical debt, which also reduces unplanned work. Automating toil away releases even more time.
  6. Now you have time to adopt the ways of working that move you forward: DevOps and cloud. And instead of just maintaining customer experience, you can invest in innovating.
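
To make the KPIs in step 2 concrete, here is a minimal Python sketch of how an error budget, MTTD and MTTR might be computed from incident timestamps. It is an illustration, not a prescribed implementation: the Incident fields, the function names, the 99.9% SLO and the 30-day window are all assumptions made for the example, and both MTTD and MTTR are measured from the start of the fault, which is only one of several conventions in use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record layout; adapt to whatever your incident tool exports.
    started: datetime    # when the fault began
    detected: datetime   # when monitoring (or a person) noticed it
    resolved: datetime   # when service was restored


def error_budget_minutes(slo: float, window: timedelta) -> float:
    """Minutes of downtime the SLO allows over the window."""
    return (1.0 - slo) * window.total_seconds() / 60.0


def mean_minutes(deltas: list[timedelta]) -> float:
    """Average duration in minutes; 0.0 when there are no incidents."""
    if not deltas:
        return 0.0
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 60.0


def availability_kpis(incidents: list[Incident], slo: float, window: timedelta) -> dict:
    """Summarize error budget use, MTTD and MTTR for one reporting window."""
    budget = error_budget_minutes(slo, window)
    # Assumption: every incident counts as full downtime from start to resolution.
    downtime = sum((i.resolved - i.started).total_seconds() for i in incidents) / 60.0
    return {
        "error_budget_min": budget,
        "budget_remaining_min": budget - downtime,
        "mttd_min": mean_minutes([i.detected - i.started for i in incidents]),
        "mttr_min": mean_minutes([i.resolved - i.started for i in incidents]),
    }


if __name__ == "__main__":
    # Two made-up incidents in a 30-day window with a 99.9% availability SLO.
    sample = [
        Incident(datetime(2022, 11, 3, 10, 0), datetime(2022, 11, 3, 10, 5),
                 datetime(2022, 11, 3, 10, 25)),
        Incident(datetime(2022, 11, 17, 14, 0), datetime(2022, 11, 17, 14, 3),
                 datetime(2022, 11, 17, 14, 13)),
    ]
    for name, value in availability_kpis(sample, 0.999, timedelta(days=30)).items():
        print(f"{name}: {value:.1f}")
```

With these sample numbers the budget is roughly 43 minutes for the window and the two incidents consume 38 of them, so a third incident of similar length would break the promise. Tracking the same figures after noise reduction (step 4) is one way to see whether MTTD is actually dropping.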

Getting control of your monitoring landscape, trimming it, and giving it AIOps superpowers is a virtuous circle that leads teams to a place where they can invest in their future—not just survive in the now. These technology teams are a direct line to the customer in a digital economy and their ability to guarantee customer experience and availability determines the success of an organization. Do not treat them like second-class citizens—enable them to be game-changers for your business.

Moogsoft is the AI-driven observability leader that provides intelligent monitoring solutions for smart DevOps. Moogsoft delivers the most advanced cloud-native, self-service platform for software engineers, developers and operators to instantly see everything, know what’s wrong and fix things faster.
About the author


Richard Whitehead

As Moogsoft's Chief Evangelist, Richard brings a keen sense of what is required to build transformational solutions. A former CTO and Technology VP, Richard brought new technologies to market, and was responsible for strategy, partnerships and product research. Richard served on Splunk’s Technology Advisory Board through their Series A, providing product and market guidance. He served on the Advisory Boards of RedSeal and Meriton Networks, was a charter member of the TMF NGOSS architecture committee, chaired a DMTF Working Group, and recently co-chaired the ONUG Monitoring & Observability Working Group. Richard holds three patents, and is considered dangerous with JavaScript.
