The Operational Dilemma
Tuesday, June 5, 2018
Despite leaps and bounds in technology, most IT organizations face the same issue that has persisted for decades: too much operational noise.
In the 1990s, I was working in operations at a major North American broadcast facility. Our monitoring ecosystem was very immature, and approximately 90% of all service impacts were first identified by our end users.
The plan was to buy tools, deploy monitoring, and become more proactive. But as we deployed more monitoring tools, we started to drown in the events they generated. It became too noisy to discern where the problems actually existed. Deploying an Event Manager reduced the noise and produced more focused events, but this approach depended on writing rules, defining filters, and continually adjusting models to represent an accurate topology.
At the time, it was a reasonable approach for a very static environment of physical servers and network hardware.
Fast forward to today, and we find an explosion of more efficient and scalable infrastructure. These environments are more dynamic due to continuous application innovation and deployment. Emerging technologies often require new monitoring tools, resulting in sprawling monitoring ecosystems that generate even more noise.
In many organizations today, end users are still the first to identify most service impacts. IT Operations staff suffer from alert fatigue, struggle to sort through the noise to identify real issues, remain stuck in a reactive posture, sit through costly all-hands bridge calls, and endure personal and professional stress.
In 2018, IT organizations are still trying to solve these problems with tools and support process methodologies from the 1990s. Long gone are the days of static environments where single faults had a direct relationship with application impacts. These tools and processes no longer offer economic value.
The Negative Impacts of the Status Quo
The demands of the modern business require a fast-paced move towards digitization. We see this daily in how we travel, order food, watch movies, pay our bills, and navigate many other aspects of daily life.
However, in the words of Moogsoft Cofounder and EVP Mike Silvey, “The move to digitization is creating a high tax for those in operations.” As digitization and application modularization move forward, many challenges arise. These challenges include:
- New monitoring tool deployments
- More types of monitoring generating higher event rates
- A higher rate of change in the events themselves
- More noise and frustration across the DevOps resources supporting a given microservice
This is the operational tax that Mike Silvey is speaking about.
So what are the choices available for dealing with this dilemma?
- Turn off alerting within the monitoring ecosystem
- Ignore the majority of alerts
- Launch costly monitoring governance projects to tune individual sensors
- Add more staff to operations
Of course, all of these efforts fall short of delivering relief from the operational tax, and most of them actually add to the problem.
The Desired State
Imagine an IT Operations center for an agile business where IT and application event volume has increased by more than 1,500%, yet the workload (tickets) is reduced to only actionable issues. Impacted parties are Situation-aware, MTTR is reduced by 40%, and the business impact of application outages is averted.
This is what Moogsoft is helping our customers achieve through our patented AIOps algorithms and collaborative workflow.
The Positive Benefits of Modernizing IT Operations
The outcomes include improved operational efficiency, increased customer satisfaction, and support for the digitization of the business. Here are some of the quantitative benefits:
- GoDaddy achieved a 66% reduction in customer reported incidents
- One of Canada’s largest financial institutions was able to reach a 43% reduction in MTTR
- HCL Technologies reduced ServiceNow tickets by 62%
Moogsoft AIOps is helping our customers achieve their goals and discover that operational dilemmas can be a thing of the past.
Moogsoft is a pioneer and leading provider of AIOps solutions that help IT teams work faster and smarter. With patented AI analyzing billions of events daily across the world’s most complex IT environments, the Moogsoft AIOps platform helps the world’s top enterprises avoid outages, automate service assurance, and accelerate digital transformation initiatives.