A Closer Look at Root Cause Analysis

Historically, event management tools have focused on determining the root cause of an incident. That’s a problem…

Traditional root cause analysis is becoming less and less reliable. At least three things undermine root cause analysis in today’s environments: mobility, software-defined networks, and the “single-user” mindset.

First, consider mobility. When things break, virtual machines can vMotion to other servers, other networks, and other topologies. Even network changes occur transparently.

For example, an MPLS fast reroute is typically designed to complete in roughly 50 milliseconds. And when it does, the characteristics of the network may change. Latency may change. Routing may change. And the behavior of the applications communicating over the network may change too: TCP window sizes adjust, for example.

Next, software-defined networks (SDNs) and network functions virtualization (NFV) add to the mix, making it possible to virtualize networks and implement them dynamically, on demand, in support of business objectives.

Because of this, event correlation has become somewhat nebulous and ineffective. For example, many correlation techniques are applicable only to static infrastructures. They assume that the infrastructure is set in stone… and it is not. What happens when portions of your applications are migrated to the cloud?
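To make the point concrete, here is a minimal sketch of how a correlation rule built on a static map can go stale. Everything in it is hypothetical: the host and VM names, the `STATIC_TOPOLOGY` map, and the `static_root_cause` helper are illustrative assumptions, not taken from any particular product.

```python
# Hypothetical dependency map, captured once at design time (an assumption
# for illustration; real tools model topology in many different ways).
STATIC_TOPOLOGY = {
    "vm-app-01": "host-a",
    "vm-db-01": "host-b",
}


def static_root_cause(failing_vm, topology):
    """Blame whichever host the supplied map says the VM runs on."""
    return topology.get(failing_vm, "unknown")


# Live placement after vm-app-01 has migrated from host-a to host-c.
live_placement = dict(STATIC_TOPOLOGY)
live_placement["vm-app-01"] = "host-c"

# The static rule still points at the old host; the live view does not.
stale_answer = static_root_cause("vm-app-01", STATIC_TOPOLOGY)
live_answer = static_root_cause("vm-app-01", live_placement)

print(stale_answer, live_answer)
```

The rule itself has not changed. Only the infrastructure underneath it has, which is exactly the case a static correlation model cannot see.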

And there’s a third factor. The real misconception in legacy management systems is that the root cause is still handed off to a single user.

The reality is this: Operations is a multi-user, multi-system, multi-technology environment. One root cause may be pertinent to more than one user, more than one system, and more than one application. Nothing is in a vacuum. There is cause and effect in every nook and cranny.

With the adaptive nature of today’s infrastructures, you need management technology that is adaptive as well. Incident.MOOG uses machine learning and clustering techniques to apply adaptive intelligence to your incident processes, and it directs each Situation to the people affected by the alerts within it.
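As a rough illustration of the general idea of clustering alerts and routing the result to the people affected, consider the sketch below. It is not Incident.MOOG’s actual algorithm. It assumes a deliberately simple technique (greedy grouping by time proximity and word overlap), and the `Alert` and `Situation` classes, the thresholds, and the `owners` mapping are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    timestamp: float  # seconds since an arbitrary start
    source: str       # device or service that raised the alert
    text: str         # free-text alert message


@dataclass
class Situation:
    alerts: list = field(default_factory=list)


def tokens(alert):
    """Lowercased set of words in the alert text."""
    return set(alert.text.lower().split())


def jaccard(a, b):
    """Word-set overlap between 0.0 (disjoint) and 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_alerts(alerts, max_gap=60.0, min_similarity=0.3):
    """Greedily attach each alert to the first situation that holds a
    member close in time and similar in wording; otherwise start a new one."""
    situations = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        for situation in situations:
            if any(
                alert.timestamp - member.timestamp <= max_gap
                and jaccard(tokens(alert), tokens(member)) >= min_similarity
                for member in situation.alerts
            ):
                situation.alerts.append(alert)
                break
        else:
            situations.append(Situation([alert]))
    return situations


def stakeholders(situation, owners):
    """Teams that own any source appearing in the situation."""
    return sorted({owners[a.source] for a in situation.alerts if a.source in owners})


alerts = [
    Alert(0, "router-1", "link down on interface ge-0/0/1"),
    Alert(5, "app-01", "database connection timeout"),
    Alert(12, "router-2", "link down on interface ge-0/0/3"),
    Alert(20, "db-01", "connection timeout from app-01"),
    Alert(500, "app-02", "disk usage high"),
]
owners = {"router-1": "network", "router-2": "network",
          "app-01": "app-team", "db-01": "dba"}

situations = cluster_alerts(alerts)
for s in situations:
    print([a.source for a in s.alerts], stakeholders(s, owners))
```

Note that the second group pulls in both the application team and the database team: one cluster of alerts can be relevant to more than one party, which is why a single-user hand-off falls short.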

Incident.MOOG doesn’t identify “a” root cause. Instead, it presents the Situation in context, creates a Situation Room (i.e., a virtual war room), and invites the appropriate stakeholders to the room. Those stakeholders can then quickly work out whether they are the causal party or an impacted (collateral) party to the Situation and react accordingly. The result is earlier and more efficient problem resolution.