Alerts! My Key Take-Aways from Monitorama 2015 PDX
Richard Whitehead | July 8, 2015

Thoughts, suggestions and more about the recent Monitorama 2015 event that took place in Portland.

I finally got a chance to sit down and collect my thoughts about the recent Monitorama 2015 event that took place in Portland. It was a great couple of days, with a host of fascinating and often entertaining talks by folks deeply immersed in the day-to-day struggles of monitoring rapidly evolving infrastructures.

I was given the opportunity at Monitorama 2015 to present a three-and-a-half-minute lightning talk on the concept of Real-Time, Collaborative Situational Management, explaining how it improves service availability in a DevOps environment, saving time and money. I tried to deliver this topic with a light and humorous tone, which seemed to go over well with attendees, judging by the laughter and cheers.

My Key Take-Aways

Over the course of Monitorama, attendees saw a number of talks outlining how open source monitoring solutions have been deployed, and in a number of cases, how they’ve been developed. One thing I noticed was a big focus on the mechanics of monitoring: what to monitor, clever ways to reduce footprint, ways to handle massive scale, ease of deployment, what transport mechanism to use, etc. Inés Sombra in particular captured everyone’s attention with her insight and experiences from Fastly.

What was less discussed, however, was what to do with all the data once it had been captured. Dashboards were fairly well represented, being the de facto final resting place for most instrumented data at small scale. Yet the other popular discussion topic – and a more pertinent area of interest (at least for me) – was alerts.

Here at Moogsoft, we are voracious consumers of alerts, and as such, have fairly strong opinions on what an alert is, isn’t, and what it could be. As I sat and listened to the monitoring exploits and recommended best practices, I wondered how alerts have evolved over the years, and what exactly constitutes a “high quality” alert today?

In some cases, we’re still struggling with the basics: striving for a stateful alert that can tell you when a problem condition has been resolved without an expensive validation test, or worse, wrestling with inconsistent heartbeats that force the implementation of timers. Fortunately, with the increased adoption of modern interfaces and expressive message formats such as JSON, the technical reasons for poor alert quality are dying out.
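To make the stateful-alert idea concrete, here is a minimal sketch (the field names and values are hypothetical, not from any particular tool) of how a consumer can close an open alert purely from a JSON "resolved" message, with no follow-up validation test against the source:

```python
import json

# Hypothetical stateful alerts: the "state" field lets a consumer
# close a condition without re-checking the source system.
raw = [
    '{"source": "web-01", "check": "cpu_load", "state": "triggered", "ts": 1436300000}',
    '{"source": "web-01", "check": "cpu_load", "state": "resolved",  "ts": 1436300300}',
]

open_alerts = {}
for line in raw:
    alert = json.loads(line)
    key = (alert["source"], alert["check"])  # identity of the condition
    if alert["state"] == "triggered":
        open_alerts[key] = alert             # open (or refresh) the alert
    elif alert["state"] == "resolved":
        open_alerts.pop(key, None)           # close it -- no validation test needed

print(open_alerts)  # {} -- the triggered/resolved pair cancels out
```

Without that `state` field, the consumer is left guessing, which is exactly where heartbeat timers and polling creep back in.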

Furthermore, for a variety of reasons (many understandable and good!), alerts coming out of tools, software, and infrastructure today are less structured and less consistent than they were 10-20 years ago, forcing the need to use machine learning to make sense of it all. Without an algorithmic, data-driven approach, it’s nearly impossible to separate the signal from the noise, making it difficult to gain situational awareness early enough to see an anomaly unfolding in real time.

I’ve also noticed that one of the bigger obsessions recently has been “scale,” i.e. how many billions of events per second can be evaluated and stored. But again, without an automated, data-driven approach to reducing noise, greater scale (beyond addressing the issue of reliability) only produces even more alerts to process.

Anomaly detection was yet another area of focus – now, this is getting interesting. When it comes to the “thresholding” of time-series metrics, the state of the art hasn’t really changed in decades, and we’re still struggling with the limitations of static threshold values. Anomaly detection is still in its infancy, but the acknowledgement that we can use algorithms to pre-process data and improve alert quality is great news for the industry.
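The contrast between the two approaches fits in a few lines. This is an illustrative sketch only (the metric values, threshold, and sigma multiplier are made up), comparing a fixed threshold with a simple statistical test against recent behavior:

```python
import statistics

def static_alert(value, threshold=90.0):
    # Classic static threshold: fires only when a value crosses a fixed line,
    # no matter what "normal" looks like for this metric.
    return value > threshold

def adaptive_alert(history, value, sigmas=3.0):
    # Toy anomaly test: flag values more than `sigmas` standard deviations
    # from the recent mean, so "abnormal" is defined by the data itself.
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    return abs(value - mean) > sigmas * stdev

history = [42.0, 44.1, 43.5, 41.8, 44.6, 43.0, 42.7, 44.2]  # recent samples
print(static_alert(55.0))             # False: well under the fixed line
print(adaptive_alert(history, 55.0))  # True: far outside recent behavior
```

The static check misses a reading that is wildly abnormal for this metric, while even a crude data-driven test catches it; real anomaly detection goes much further, but this is the core shift.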

Moogsoft is the AI-driven observability leader that provides intelligent monitoring solutions for smart DevOps. Moogsoft delivers the most advanced cloud-native, self-service platform for software engineers, developers and operators to instantly see everything, know what’s wrong and fix things faster.

About the author


Richard Whitehead

As Moogsoft's Chief Evangelist, Richard brings a keen sense of what is required to build transformational solutions. A former CTO and Technology VP, Richard brought new technologies to market, and was responsible for strategy, partnerships and product research. Richard served on Splunk’s Technology Advisory Board through their Series A, providing product and market guidance. He served on the Advisory Boards of RedSeal and Meriton Networks, was a charter member of the TMF NGOSS architecture committee, chaired a DMTF Working Group, and recently co-chaired the ONUG Monitoring & Observability Working Group. Richard holds three patents, and is considered dangerous with JavaScript.

