A Closer Look at Root Cause Analysis
Moogsoft Team | July 28, 2014

Operations is a multi-user, multi-system, multi-technology environment. Naturally, single root causes are a thing of the past.

Historically, event management tools focus on determining the root cause of an incident. That’s a problem…

Traditional root cause analysis is becoming less and less reliable. At least three things undermine the idea of a single root cause in today’s environment: mobility, software-defined networks and the “single-user” mindset.

First, consider mobility. When things break, virtual machines can vMotion to other servers, other networks, and other topologies. Even network changes occur transparently.

For example, an MPLS fast reroute typically completes in under 50 milliseconds. And when it does, the characteristics of the network may change. Latency may change. Routing may change. And the behavior of the applications communicating over the network may change too; TCP window sizes adjust, for instance.

Next, Software Defined Networks (SDNs) and Network Functions Virtualization (NFV) add to the mix, supporting the ability to virtualize and dynamically implement networks to support business objectives on demand.

Because of this, event correlation has become unreliable. For example, many correlation techniques are applicable only to static infrastructures. They assume that the infrastructure is set in stone… and it is not. What happens when portions of your applications are migrated to the cloud?
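To make the point concrete, here is a minimal sketch of a correlation rule that depends on a hard-coded topology map. Everything in it is hypothetical (the host and VM names, the `correlate_static` helper, the alert shape); it is not taken from any particular tool. It only illustrates how a static map gives the wrong grouping once a virtual machine has moved.

```python
# Hypothetical static topology: which host each VM is assumed to run on.
STATIC_TOPOLOGY = {"vm-app-1": "host-a", "vm-db-1": "host-a"}


def correlate_static(alerts, topology):
    """Group alert sources under the host the topology map assigns them to."""
    groups = {}
    for alert in alerts:
        host = topology.get(alert["source"], "unknown")
        groups.setdefault(host, []).append(alert["source"])
    return groups


# Assumption for illustration: vm-db-1 has migrated to host-b,
# but the static map was never updated.
live_topology = dict(STATIC_TOPOLOGY)
live_topology["vm-db-1"] = "host-b"

alerts = [{"source": "vm-app-1"}, {"source": "vm-db-1"}]

stale = correlate_static(alerts, STATIC_TOPOLOGY)  # blames host-a for both
fresh = correlate_static(alerts, live_topology)    # splits across two hosts
```

With the stale map, both alerts are attributed to a single host, which points the operator at the wrong "root cause". With the live view they belong to two different hosts.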

And there’s a third factor. The real misconception is that in the legacy management system, the root cause hand-off is still to a single user.

The reality is this: Operations is a multi-user, multi-system, multi-technology environment. One root cause may be pertinent to more than one user, more than one system, and more than one application. Nothing is in a vacuum. There is cause and effect in every nook and cranny.

With the adaptive nature of today’s infrastructures, you need management technology that is adaptive as well. Incident.MOOG uses machine learning and clustering techniques to apply adaptive intelligence to your incident processes, and it directs each Situation to the people affected by the alerts within it.

Incident.MOOG doesn’t identify “a” root cause – rather it presents the Situation in context, creates a Situation Room (i.e., virtual war room) and invites the appropriate stakeholders to the room. Then, those stakeholders to the Situation can quickly work out whether they are the causal or impacted/collateral party to the Situation and react accordingly. The result is earlier and more efficient problem resolution.
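As a rough illustration of the idea (and not a description of how Incident.MOOG actually works), the sketch below groups alerts that arrive close together in time and collects every team that owns an affected component. The `Alert` fields, the `max_gap` threshold and the team names are all assumptions made for the example; a real product would use far richer clustering than a simple time gap.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: float  # seconds since some reference point
    source: str       # component that raised the alert
    owner: str        # hypothetical team responsible for that component


def cluster_by_time(alerts, max_gap=60.0):
    """Group alerts whose timestamps are within max_gap of the previous one."""
    clusters = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if clusters and alert.timestamp - clusters[-1][-1].timestamp <= max_gap:
            clusters[-1].append(alert)
        else:
            clusters.append([alert])
    return clusters


def stakeholders(cluster):
    """Return every team that owns at least one alert in the cluster."""
    return sorted({alert.owner for alert in cluster})


alerts = [
    Alert(0, "router-1", "network"),
    Alert(20, "app-1", "app"),
    Alert(45, "db-1", "database"),
    Alert(600, "app-2", "app"),
]

situations = cluster_by_time(alerts)
invites = [stakeholders(s) for s in situations]
```

Here the first three alerts form one group and all three teams are invited to it, rather than one team receiving a single "root cause" ticket. The late alert forms its own group with a single owner.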
