A Closer Look at Root Cause Analysis
moogsoft | July 28, 2014
Operations is a multi-user, multi-system, multi-technology environment. Naturally, single root causes are a thing of the past.

Historically, event management tools have focused on determining the root cause of an incident. That’s a problem…

Traditional root cause analysis is becoming less and less reliable. At least three things undermine the idea of a single root cause in today’s environment: mobility, software-defined networks, and the “single-user” mindset.

First, consider mobility. When things break, virtual machines can vMotion to other servers, other networks, and other topologies. Even network changes occur transparently.

For example, an MPLS fast reroute completes in tens of milliseconds (50 ms is the commonly cited target). And when it does, the characteristics of the network may change. Latency may change. Routing may change. And the behavior of the applications communicating over the network may change: TCP window sizes adjust, for instance.

Next, Software Defined Networks (SDNs) and Network Functions Virtualization (NFV) add to the mix, supporting the ability to virtualize and dynamically implement networks to support business objectives on demand.

Because of this, event correlation has become somewhat nebulous and ineffective. For example, many correlation techniques are applicable only to static infrastructures. They assume that the infrastructure is set in stone… and it is not. What happens when portions of your applications are migrated to the cloud?
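To make the static-infrastructure problem concrete, here is a minimal sketch. It assumes a hypothetical correlation rule that attributes each VM alert to whatever host a fixed topology map names; the map, host names, and the `actual_host` field are all invented for illustration and do not describe any particular product.

```python
# Hypothetical static topology: the host each VM was on when the rules were written.
STATIC_TOPOLOGY = {"vm-app-1": "host-a", "vm-db-1": "host-a"}


def correlate_static(alerts, topology):
    """Attribute each VM alert to the host the static map says it runs on."""
    return {alert["source"]: topology.get(alert["source"]) for alert in alerts}


# The VM has since migrated to host-b, but the static map was never updated.
alerts = [{"source": "vm-app-1", "actual_host": "host-b"}]

blamed = correlate_static(alerts, STATIC_TOPOLOGY)

# Alerts whose attributed host no longer matches where the VM really runs.
stale = [a["source"] for a in alerts if blamed[a["source"]] != a["actual_host"]]
print(blamed, stale)
```

The rule still points at host-a, so whoever receives the "root cause" is looking at the wrong machine.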

And there’s a third factor. The real misconception is that, in the legacy management system, the root cause is still handed off to a single user.

The reality is this: Operations is a multi-user, multi-system, multi-technology environment. One root cause may be pertinent to more than one user, more than one system, and more than one application. Nothing exists in a vacuum. There is cause and effect in every nook and cranny.

With the adaptive nature of today’s infrastructures, you need management technology that is adaptive as well. Incident.MOOG uses machine learning and clustering techniques to apply adaptive intelligence to your incident processes, and it directs each Situation to the people affected by the alerts within it.

Incident.MOOG doesn’t identify “a” root cause. Rather, it presents the Situation in context, creates a Situation Room (i.e., a virtual war room), and invites the appropriate stakeholders to the room. Those stakeholders can then quickly work out whether they are the causal party or an impacted (collateral) party and react accordingly. The result is earlier and more efficient problem resolution.
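As a rough illustration of the idea, and not of how Incident.MOOG actually works, the sketch below groups alerts that arrive close together in time and then collects everyone who owns an affected service. The time-gap grouping is a deliberately simple stand-in for real clustering, and the `Alert` type, the `OWNERS` map, and the 60-second threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: float  # seconds since some reference point
    service: str
    message: str


# Hypothetical ownership map: which people care about which service.
OWNERS = {"network": {"alice"}, "database": {"bob"}, "web": {"carol", "bob"}}


def group_into_situations(alerts, max_gap=60.0):
    """Group alerts whose timestamps are within max_gap of the previous alert."""
    situations = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if situations and alert.timestamp - situations[-1][-1].timestamp <= max_gap:
            situations[-1].append(alert)
        else:
            situations.append([alert])
    return situations


def stakeholders(situation, owners=OWNERS):
    """Collect every owner of any service that appears in the situation."""
    people = set()
    for alert in situation:
        people |= owners.get(alert.service, set())
    return sorted(people)


alerts = [
    Alert(0, "network", "link flap"),
    Alert(20, "database", "replication lag"),
    Alert(45, "web", "5xx spike"),
    Alert(600, "web", "certificate expiring"),
]

situations = group_into_situations(alerts)
invites = [stakeholders(s) for s in situations]
print(invites)
```

The first group spans network, database, and web, so three people are invited rather than one; each can then decide whether they are the cause or collateral.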

Moogsoft is a pioneer and leading provider of AIOps solutions that help IT teams work faster and smarter. With patented AI analyzing billions of events daily across the world’s most complex IT environments, the Moogsoft AIOps platform helps the world’s top enterprises avoid outages, automate service assurance, and accelerate digital transformation initiatives.
