Event Correlation – What If You Can’t Group Alerts?
Richard Whitehead | May 31, 2016

Think you can’t group alerts? Unsupervised machine learning can tell you things you didn’t know.

A key differentiator of Moogsoft is Situations: both the technology used to generate them and the ground-breaking collaborative workflow built around them.

However, some folks struggle to make the break from alert-based management to situational management. This is understandable in many cases, especially where vast amounts of time and technical expertise have been invested in fine-tuning alert-based systems.

A fairly common question I get asked, especially in mature environments, is: “What happens if our alerts cannot be clustered? How can you then reduce our workload?”

To tackle that question, you first have to understand where it’s coming from.

That question usually stems from the assumption that the environment is so well tuned that only “actionable” alerts are being presented, and that there is therefore no opportunity to reduce the workload.

Firstly, it would be remiss not to address the issue of over-aggressive filtering. If it’s indeed true that only truly actionable alerts are being let through, then we at Moogsoft would contend that you are over-filtering. You are consequently missing both critical precursors to problems, which hurts Mean Time to Detect, and data that would assist in problem resolution, which hurts Mean Time to Resolution.

This new concept of MORE alerts, but in FEWER situations, can feel a little counter-intuitive at first.

The next question is whether alerts can be aggregated into situations at all. Moogsoft’s uniquely powerful approach to situation creation, which uses multiple real-time algorithmic techniques to cluster the alert flow, is radically changing the way people manage vast, rapidly changing virtualized environments. But the question remains: in a more static (perhaps legacy) environment, if only actionable alerts are being processed, is there a case for clustering?

4 Reasons for Managing Situations Over ‘Actionable’ Alerts

Firstly, there’s the power of small numbers, or “small data” if you will. While we all get excited when we see hundreds of alerts clustered into a situation, does that mean a small cluster size isn’t valuable? No! If you simply group two related alerts into one situation, that’s a 50% reduction in the number of items to handle. And when you have customers cost-justifying their investment in Moog with a 23% reduction, 50% is great!
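To make the arithmetic concrete, here is a small Python sketch. It assumes “volume reduction” means the drop from the number of alerts to the number of situations an operator has to handle, expressed as a percentage of the alert count; the function name volume_reduction is hypothetical.

```python
def volume_reduction(alert_count: int, situation_count: int) -> float:
    """Percentage drop in items to handle when alert_count alerts are
    presented as situation_count situations (assumed definition)."""
    if alert_count <= 0:
        raise ValueError("alert_count must be positive")
    if not 0 < situation_count <= alert_count:
        raise ValueError("situation_count must be between 1 and alert_count")
    return 100.0 * (alert_count - situation_count) / alert_count


print(volume_reduction(2, 1))     # two related alerts, one situation -> 50.0
print(volume_reduction(100, 77))  # an illustrative 23% case -> 23.0
```

Under that reading, even the smallest possible cluster of two alerts halves the operator’s queue, while a 23% reduction corresponds to, for example, 100 alerts arriving as 77 situations.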

Secondly, just because alerts are actionable, it doesn’t mean they can’t be clustered into a situation. One example I saw was a situation containing 12 e-mail notifications saying ATMs were down, each requiring acknowledgment and timely confirmation that remedial action was being taken. But are they really 12 discrete operator actions? Or can all 12 be dealt with simultaneously?
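A minimal sketch of that idea in Python, assuming each alert carries a source and a description (hypothetical field names). Grouping on an identical description is a deliberately naive stand-in used only for illustration; it is not how Moogsoft builds situations.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str       # hypothetical field: which ATM raised the alert
    description: str  # hypothetical field: what the alert says


def group_by_description(alerts):
    """Collect alerts with an identical description into one candidate situation."""
    situations = defaultdict(list)
    for alert in alerts:
        situations[alert.description].append(alert)
    return dict(situations)


def acknowledge(situation, operator):
    """Record one operator action against every alert in the situation."""
    return {alert.source: operator for alert in situation}


# Twelve "ATM down" notifications, one per (hypothetical) ATM.
alerts = [Alert(source=f"atm-{i:02d}", description="ATM down") for i in range(1, 13)]
situations = group_by_description(alerts)
acks = acknowledge(situations["ATM down"], operator="on-call")

# prints: 1 situation, 12 alerts acknowledged in one action
print(len(situations), "situation,", len(acks), "alerts acknowledged in one action")
```

The twelve alerts remain individually actionable; the situation simply lets one acknowledgment and one confirmation cover them all.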

Thirdly, even in the remote case that these alerts really are unrelated, they may still belong in the same situation. I know, it sounds unlikely, but take the case of multiple servers requiring a reboot. The servers are unrelated, on different segments, performing different services, but they are all related in terms of workflow: they all require a reboot. So you put them in the same situation and issue a single command to restart all the servers in that situation.
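The same kind of grouping can key on workflow rather than on topology. In this Python sketch the hosts, segments, services and the “action” tag on each alert are all hypothetical, and restart_command only builds a command string; it does not call any real tool.

```python
from collections import defaultdict

# Hypothetical alerts: unrelated servers on different segments running different
# services, each tagged with the remedial action an operator would take.
alerts = [
    {"host": "web-01", "segment": "dmz", "service": "http", "action": "reboot"},
    {"host": "db-07", "segment": "core", "service": "postgres", "action": "reboot"},
    {"host": "mq-03", "segment": "backend", "service": "queue", "action": "reboot"},
    {"host": "app-12", "segment": "backend", "service": "api", "action": "clear-disk"},
]


def group_by_action(alerts):
    """Group hosts by required remedial action, ignoring topology entirely."""
    situations = defaultdict(list)
    for alert in alerts:
        situations[alert["action"]].append(alert["host"])
    return dict(situations)


def restart_command(hosts):
    """Build a single (hypothetical) restart command covering every host."""
    return "restart " + " ".join(sorted(hosts))


situations = group_by_action(alerts)
print(restart_command(situations["reboot"]))  # prints: restart db-07 mq-03 web-01
```

Nothing in the grouping key looks at segment or service, which is the point: the servers share a situation only because they share a fix.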

And finally, are you REALLY sure these alerts can’t be clustered? You see, that’s one of the characteristics of unsupervised machine learning: it can tell you things you didn’t know.
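As a toy illustration of what clustering without labels can surface, the Python sketch below groups alerts purely by arrival time: an alert joins the current cluster if it arrives within max_gap seconds of the previous one (single-linkage clustering on timestamps). This is a deliberately simple technique swapped in for illustration, not one of Moogsoft’s algorithms, and the alerts and timestamps are invented.

```python
def cluster_by_time(alerts, max_gap):
    """Single-linkage clustering on one dimension (arrival time).

    alerts is a list of (timestamp_seconds, name) pairs. After sorting by time,
    a new cluster starts wherever the gap to the previous alert exceeds max_gap.
    """
    clusters = []
    for ts, name in sorted(alerts):
        if clusters and ts - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append((ts, name))
        else:
            clusters.append([(ts, name)])
    # Return only the alert names in each cluster.
    return [[name for _, name in cluster] for cluster in clusters]


# Invented alert stream: nobody tagged the switch and the ATMs as related.
alerts = [
    (1800, "cert expiring web-01"),
    (5, "ATM-07 down"),
    (0, "link flap sw-04"),
    (9, "ATM-03 down"),
    (600, "disk full db-02"),
]
for cluster in cluster_by_time(alerts, max_gap=30):
    print(cluster)
```

Here the link flap and the two ATM alerts land in the same cluster, a relationship nobody configured; whether it is causal is still for the operator to judge.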


About the author


Richard Whitehead

As Moogsoft's Chief Evangelist, Richard brings a keen sense of what is required to build transformational solutions. A former CTO and Technology VP, Richard brought new technologies to market, and was responsible for strategy, partnerships and product research. Richard served on Splunk’s Technology Advisory Board through their Series A, providing product and market guidance. He served on the Advisory Boards of RedSeal and Meriton Networks, was a charter member of the TMF NGOSS architecture committee, chaired a DMTF Working Group, and recently co-chaired the ONUG Monitoring & Observability Working Group. Richard holds three patents, and is considered dangerous with JavaScript.

