Event Correlation – What If You Can’t Group Alerts?
Richard Whitehead | May 31, 2016

Think you can’t group alerts? Unsupervised machine learning can tell you things you didn’t know.

A key differentiator of Moogsoft is Situations: both the technology used to generate them and the ground-breaking collaborative workflow built around them.

However, some folks struggle to make the break from alert-based management to situational management. This is understandable in many cases, especially where vast amounts of time and technical expertise have been invested in fine-tuning alert-based systems.

A fairly common question I get asked, especially in mature environments, is: “What happens if our alerts cannot be clustered? How can you then reduce our workload?”

To tackle that question, you first have to understand where it’s coming from.

That question usually stems from the assumption that the environment is so well tuned that only “actionable” alerts are being presented. Therefore, there is no opportunity to reduce the workload.

Firstly, it would be remiss not to address the issue of over-aggressive filtering. If it’s indeed true that only truly actionable alerts are being allowed through, then we at Moogsoft would contend that you are over-filtering. As a consequence, you are missing critical precursors to problems, which hurts Mean-Time to Detect, as well as data that would assist in problem resolution, which hurts Mean-Time to Resolution.

This new concept of MORE alerts, but in FEWER situations, can feel a little counter-intuitive at first.

The next question is that of being able to aggregate alerts into situations. Moogsoft’s uniquely powerful approach to situation creation using multiple real-time algorithmic techniques to cluster the alert flow is radically changing the way people are managing vast, rapidly changing virtualized environments. But there remains the question: In a more static (legacy perhaps) environment, if only actionable alerts are being processed, is there a case for clustering?

4 Reasons for Managing Situations Over ‘Actionable’ Alerts

Firstly, there’s the power of small numbers, or “small data” if you will. While we all get excited when we see hundreds of alerts clustered into a situation, does that mean a small cluster size isn’t valuable? No! If you simply group two related alerts together, that’s a volume reduction of 50%. And when you have customers cost-justifying their investment in Moog with a 23% reduction, 50% is great!
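The arithmetic behind that 50% figure can be sketched in a few lines. This is only an illustration of the calculation, not anything Moogsoft ships; the function name `volume_reduction` and its parameters are hypothetical.

```python
def volume_reduction(alert_count: int, situation_count: int) -> float:
    """Return the fraction by which the number of work items shrinks
    when `alert_count` alerts are grouped into `situation_count` situations.

    Hypothetical helper for illustration only.
    """
    if alert_count <= 0:
        raise ValueError("alert_count must be positive")
    if not 0 < situation_count <= alert_count:
        raise ValueError("situation_count must be between 1 and alert_count")
    return 1 - situation_count / alert_count


# Two related alerts grouped into one situation: half as many items to handle.
print(f"{volume_reduction(2, 1):.0%}")
```

Even a cluster of two halves the number of items an operator has to open, which is why small clusters are still worth having.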

Secondly, just because alerts are actionable doesn’t mean they can’t be clustered into a situation. An example I saw was a situation containing 12 e-mail notifications saying ATMs were down, each requiring acknowledgment and timely confirmation that remedial action is being taken. But are they really 12 discrete operator actions? Or can all 12 be dealt with simultaneously?

Thirdly, even in the remote case that these alerts really are unrelated, perhaps they can still belong in the same situation. I know, it sounds unlikely, but take the case of multiple servers requiring a re-boot. The servers are unrelated, on different segments, performing different services, but they are all related in terms of workflow: they require a re-boot. So you put them in the same situation, and issue a single command to re-start all servers in that situation.
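As a rough sketch of that idea, alerts from unrelated servers can be grouped purely by the action they call for. Note that this is a hand-written grouping rule, not the unsupervised clustering Moogsoft uses to build situations, and the alert fields (`host`, `segment`, `action`) and host names are made up for the example.

```python
from collections import defaultdict


def group_by_action(alerts):
    """Group alert hosts by the remedial action each alert requires.

    `alerts` is a list of dicts with hypothetical keys "host" and "action".
    Returns a dict mapping each action to the hosts that need it.
    """
    situations = defaultdict(list)
    for alert in alerts:
        situations[alert["action"]].append(alert["host"])
    return dict(situations)


# Three servers on different segments, all needing the same remedy.
alerts = [
    {"host": "web-01", "segment": "dmz", "action": "reboot"},
    {"host": "db-07", "segment": "core", "action": "reboot"},
    {"host": "mail-02", "segment": "office", "action": "reboot"},
]

situations = group_by_action(alerts)
for action, hosts in situations.items():
    # One operator step per situation instead of one per alert.
    print(f"{action}: {', '.join(hosts)}")
```

The three alerts collapse into one "reboot" situation, so a single restart command can be aimed at every host in it.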

And finally, are you REALLY sure these alerts can’t be clustered? You see, that’s one of the characteristics of Unsupervised Machine Learning: it can tell you things you didn’t know.
