IT Monitoring & the War on Change
Sahil Khanna | April 27, 2017

Due to the rapid growth, constant change & increasing complexity of modern IT environments, monitoring will never be complete.

Nancy Gohring, Senior Analyst at 451 Research, recently presented some of her research at the 2017 AIOps Symposium, and I found two of her data points on Facebook (shown below) to be particularly interesting.

These findings are staggering, and consistent with some of Moogsoft’s own research, which shows that the number of data points monitored across applications and IT infrastructure has increased by more than 150,000% since the late 1990s.

What’s clear is that, despite the inevitable growth in IT monitoring data, the utilization of that data and overall service quality are decreasing. If you talk to any enterprise IT organization and ask whether or not they have a solid grasp of their monitoring, nine out of ten will say no. The one out of ten is typically either a company like Netflix, or they likely aren’t being completely honest with you, or themselves.

The true challenge these organizations face isn’t that they struggle to monitor and manage their production environments. Rather, it’s that they expect this endeavor to be complete one day. In the world of ITOps, DevOps, SRE, etc., it’s quite natural to operate under the belief that, “Once we get our monitoring figured out, we’ll be all set.”

Well, much like my colleague’s new haircut, which he can’t stop looking at, your current IT monitoring strategy will not last forever.

Why Your Monitoring is Incomplete Today

You might have five great tools today: let’s say Splunk, Nagios, SolarWinds, AppDynamics, and Datadog. You might have a variety of teams working across these toolsets to detect and troubleshoot issues.

Adopting these tools is an excellent and strategic move, but it’s a guarantee that the configuration and deployment of these tools are incomplete. These tools likely create enormous volumes of alerts that require constant configuration — even just to turn alerting on or off, or to avoid alert fatigue — resulting in reduced visibility and productivity burn. There is probably also an initiative to build correlation across these toolsets to improve visibility and productivity.

Furthermore, these tools probably aren’t deployed across your production environment as you had initially planned when making the investments. In fact, Gartner has reported that the average APM deployment penetration is just 5%.

So when will enterprise monitoring finally be complete?

Your Monitoring Won’t be Complete Tomorrow, or the Day After

IT plays a vital role in any organization’s success, and today, agility is the name of the game. If you aren’t investing in the latest and greatest, you’re falling behind because your competitors are making that investment.

For example, let’s say that your organization decides to start using containers to deploy applications faster. You adopt Docker as the container runtime, Kubernetes for cluster management and deployment, Dell/EMC libStorage for storage provisioning, and so on. What happens to your monitoring strategy as a result? It changes. This shift will require new tools and new teams to gain visibility and manage service quality.

Containerization is just one example across the range of initiatives that most enterprise organizations have planned for the next few years. Just think about all of the technologies used in production today that weren’t around 10 years ago: Hadoop, Spark, Mesosphere DC/OS, Grafana, MongoDB, Cassandra, Redis, SDN, SDI, etc. As change continues to occur (at an increasingly rapid pace, no less), IT organizations will need to invest in new teams to manage new technologies. There will be more metrics and event data, more disparity across that data, and more difficulty putting the growing volumes of data to use.

Facebook, one of the largest IT operations on the planet, is generating 26 trillion data points a day and leverages only 1% of them. This example tells us that more monitoring doesn’t mean better information. Furthermore, current IT data volumes are well beyond what humans can cognitively process.

So how can you get better information?

Algorithms are Change Tolerant, Static Models are Not

The beauty of algorithms is their ability to adapt to change. Unsupervised machine learning (ML) can distinguish normal from abnormal within massive data sets without being explicitly told what to look for. Supervised ML, on the other hand, can incorporate user-supplied feedback and loose guidance to optimize performance. Combining these techniques and applying them to enterprise production environments means real-time insight across massive and evolving data sets. Additionally, it eliminates the human bottleneck.
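To make the "normal from abnormal" idea concrete, here is a minimal sketch of one very simple unsupervised technique: a rolling z-score. It is only an illustration, not the method any particular AIOps product uses. The function name `rolling_anomalies`, the window size, the threshold, and the sample latency series are all assumptions made up for this example.

```python
from collections import deque
from statistics import mean, stdev


def rolling_anomalies(values, window=20, threshold=3.0):
    """Return the indices of points that sit far outside the recent baseline.

    No labels or static thresholds are supplied: "normal" is learned from
    the last `window` observations, so the baseline moves as the data moves.
    """
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip perfectly flat windows to avoid dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        # Every point, anomalous or not, becomes part of the next baseline.
        history.append(value)
    return anomalies


# Hypothetical latency samples (ms) with a single spike at index 30.
latency_ms = [100 + (i % 5) for i in range(40)]
latency_ms[30] = 400
print(rolling_anomalies(latency_ms))  # prints [30]
```

Because the baseline is recomputed from recent data, a permanent shift (say, after a migration) is flagged at first and then absorbed as the new normal, which is the change tolerance a static threshold lacks.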

Let’s take DevOps as a use case. How many data points get generated every time code changes? Probably a lot, requiring deep analysis through many tools by different teams. But what DevOps needs to know at the end of the day is two-fold: Is something happening to my app? And if so, is it us, or is it someone else?

The truth is that there is enough information in just configuration tools, live test APM tools, and app logs for Algorithmic IT Operations (AIOps) tools, like Moogsoft, to answer those questions in real time. You can see a visualization of this specific example in the screenshot below of a Moogsoft Situation (a cluster of related alerts).
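As a rough illustration of what "a cluster of related alerts" can mean, the sketch below groups alerts that hit the same service within a short time of each other, then checks whether a configuration change is part of the group. This is a deliberately naive heuristic and not Moogsoft's algorithm. The `Alert` fields, the `max_gap` value, the source labels, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: float  # seconds since some reference point
    source: str       # kind of tool that emitted the alert (hypothetical labels)
    service: str      # affected service
    message: str


def group_alerts(alerts, max_gap=120.0):
    """Group alerts for the same service that arrive within max_gap seconds
    of the previous alert in that group."""
    open_groups = {}  # service -> most recent group for that service
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_groups.get(alert.service)
        if current and alert.timestamp - current[-1].timestamp <= max_gap:
            current.append(alert)
        else:
            current = [alert]
            open_groups[alert.service] = current
            groups.append(current)
    return groups


def likely_self_inflicted(group):
    """Crude answer to "is it us?": did a config change land in this group?"""
    return any(alert.source == "config" for alert in group)


alerts = [
    Alert(0, "config", "checkout", "new build deployed"),
    Alert(45, "apm", "checkout", "response time above baseline"),
    Alert(70, "logs", "checkout", "exception rate spike"),
    Alert(900, "apm", "search", "error rate elevated"),
]
groups = group_alerts(alerts)
for group in groups:
    print(group[0].service, len(group), likely_self_inflicted(group))
```

In this toy data set, the three checkout alerts collapse into one group that includes a deployment, while the later search alert stands alone with no change of ours attached.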

With open APIs, you can easily push new data sources to AIOps platforms as they are introduced to your environment. Is this an appropriate place to use the buzzword ‘Change-Tolerance,’ or shall I say ‘Future-Proof’?
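One way to picture onboarding a new data source is as a small mapping step in front of whatever ingestion API your platform exposes. The sketch below assumes a made-up common event shape (`host`, `severity`, `description`, `source`) and made-up raw field names for each tool. None of it reflects a real vendor schema or endpoint, and the actual HTTP call is left out on purpose.

```python
import json

# Hypothetical per-source field mappings: common field -> raw field name.
FIELD_MAPS = {
    "nagios": {"host": "host_name", "severity": "state", "description": "output"},
    "kubernetes": {"host": "node", "severity": "type", "description": "message"},
}


def normalize(source, raw):
    """Translate one raw event from `source` into the common event shape."""
    mapping = FIELD_MAPS[source]
    event = {field: raw.get(key) for field, key in mapping.items()}
    event["source"] = source
    return event


def to_payload(events):
    """Serialize normalized events as a JSON body ready to send to an API."""
    return json.dumps({"events": events}, sort_keys=True)


raw_k8s = {
    "node": "node-7",
    "type": "Warning",
    "message": "Back-off restarting failed container",
}
print(to_payload([normalize("kubernetes", raw_k8s)]))
```

With this design, adding a tool that arrives next year means adding one entry to `FIELD_MAPS` rather than rewriting anything downstream.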

In summary, the war on change will never come to an end. In fact, it will get worse each year as demand for innovation intensifies. The good news is that better algorithms lead to better information, and companies like Moogsoft have those algorithms readily available today.

Moogsoft is a pioneer and leading provider of AIOps solutions that help IT teams work faster and smarter. With patented AI analyzing billions of events daily across the world’s most complex IT environments, the Moogsoft AIOps Platform helps the world’s top enterprises avoid outages, automate service assurance, and accelerate digital transformation initiatives.

About the author

Sahil Khanna

Sahil Khanna is a Sr. Product Marketing Manager at Moogsoft, where he focuses on the emergence of Algorithmic IT Operations. In his free time, Sahil enjoys banging on drums and participating in high-stakes bets.
