I mostly enjoy attending industry events and conferences because I get to meet IT leaders from Fortune 1000 companies and ask my favorite question: “What tools are you guys using?”
It’s a fascinating question because the number of tools that large IT organizations use today is breathtaking, and each person I speak with has instrumented their environment differently.
Even though most are using Splunk, and perhaps AppDynamics or New Relic, the way in which they use those tools to support their business services is unique, probably because each of their business services, and the types of issues that occur, are entirely unique.
The image below is a sample of the tools I’ve heard mentioned after asking this question hundreds of times:
My second favorite question is: “How are you tying together all of the information from your ecosystem of monitoring tools?”
The answers I’ve received, over and over again, point to two consistent themes, each indicative of a serious underlying problem that explains why customers are identifying issues before monitoring tools do. The themes I’m referring to relate to the monitoring setups of Traditional Enterprises vs. Digital Enterprises.
Traditional Enterprises are Struggling with their Legacy MoM Investment
By “traditional,” I am referring to organizations that fall into the categories of Financial Services, MSPs, Manufacturing, Telcos and Federal. While these organizations are likely going through a serious digital transformation as you read this, their IT environments, processes, and many of their tools are very much “traditional.” Their tooling is heavily reflective of investments that they made in the 1990s and early 2000s, when MoM (Manager of Managers) vendors like IBM, HP, BMC and CA put together enterprise-class suites for operations management.
While these suites were brilliant conceptually, they were really just hodgepodges of acquired tools with varying levels of integration. They perform rudimentary noise reduction and event correlation, but through a rules-based approach—meaning that you need to anticipate an issue and model it before it occurs.
You can imagine the serious implications of that requirement. Furthermore, these suites were difficult to configure and manage, and only after millions of dollars and hundreds of man-hours did they work at all.
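To make that limitation concrete, here is a minimal sketch of what a rules-based approach to event correlation looks like. The rule syntax and event fields here are hypothetical, not any vendor’s actual configuration language, but the core weakness is the same: a human must anticipate each failure mode and write a rule for it in advance, and any event that matches no rule slips through.

```python
# A minimal sketch of rules-based event correlation, as used by legacy MoMs.
# Every rule below must be written by someone who anticipated the failure
# mode in advance; an event that matches no rule is never correlated.
RULES = [
    # (condition on the event, incident label to group it under)
    (lambda e: e["source"] == "db01" and "timeout" in e["message"], "db01-outage"),
    (lambda e: e["source"].startswith("web") and e["severity"] >= 4, "web-tier-degraded"),
]

def correlate(events):
    """Group events into incidents using hand-written rules."""
    incidents = {}
    for event in events:
        for condition, label in RULES:
            if condition(event):
                incidents.setdefault(label, []).append(event)
                break
        else:
            # No rule matched: the event falls through, unseen by operators.
            incidents.setdefault("uncorrelated", []).append(event)
    return incidents

events = [
    {"source": "db01", "severity": 5, "message": "connection timeout"},
    {"source": "web03", "severity": 4, "message": "502 upstream error"},
    {"source": "cache02", "severity": 5, "message": "evictions spiking"},  # nobody wrote a rule for this
]
print(correlate(events)["uncorrelated"])
```

The cache event is real and severe, but because no one modeled it beforehand, it lands in the uncorrelated pile. Multiply that by a changing infrastructure and you can see why these rule sets decay as fast as they are written.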
Fast forward to 2016 and these tools are still the core IT management layer at traditional enterprises, due to the genius business strategy known as “vendor lock-in.” Traditional enterprises have realized the value in next-generation monitoring tools and have invested heavily in obtaining a best-of-breed toolset to improve their service quality.
However, when I speak with IT Operations teams from these companies, it’s clear that they aren’t fully leveraging their diverse toolset. I’ve found that they are sending only a fraction of their event stream to their IBM Netcool or CA Spectrum, for example, for operations teams to view.
They are forced to consume such a small fraction of their events, and limit their visibility, because their legacy MoMs are unable to:
1) Scale to handle modern event volumes
2) Integrate with new tools (no standard APIs)
3) Automatically adjust to infrastructure changes (rules must be built and maintained manually)
Here’s an example of the monitoring landscape from a large traditional enterprise that was shared with me recently. They had 40+ monitoring tools, 1000+ applications, and were generating ~200,000 events/day. They were using CA Spectrum as their Manager of Managers, and, due to scalability and integration restrictions, Spectrum was ingesting only ~30,000 events/day, from just Splunk, Keynote, and SolarWinds.
That’s only 15% event coverage!
So how was this setup working for them? Well, they shared that 7-9% of their incidents were detected by their tools, and the rest were detected by customers. You can imagine the issues they faced with SLA violations, revenue loss, and increasing IT costs.
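The coverage figure above is simple arithmetic on the numbers this enterprise shared:

```python
# Event coverage for the traditional enterprise described above:
# ~200,000 events/day generated, but only ~30,000/day reach the MoM.
generated = 200_000
ingested = 30_000
coverage = ingested / generated
print(f"Event coverage: {coverage:.0%}")  # prints "Event coverage: 15%"
```

Put differently, 85% of the signals their tools were producing never reached the operations team’s console.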
For a deeper dive on this subject and how to fix it, I’d recommend reading Fed Up with Legacy Monitoring Tools? It’s Time for Composable IT Monitoring, by Jason Bloomberg of Intellyx.
Digital Enterprises are Struggling with the Lack of a Management Layer
‘Digital’ to me means SaaS, Media, eCommerce, Retail, Online, ISVs, etc. The key difference here is that most of these companies were born digital and never invested in the legacy MoM solutions in the first place.
This quality makes them far more flexible and agile by nature; however, there are serious drawbacks.
The digital enterprises I have spoken with have typically built out a substantial best-of-breed monitoring ecosystem to meet their unique needs. However, they have no management layer to tie it all together, and they are beginning to feel the pain of:
- No deduplication capabilities. If you don’t think deduplication is important, consider that even a modest 25% reduction in event volume (minor compared to the 99% reduction Moogsoft is able to offer) means operations teams have 25% less to look at. That’s a huge workload reduction and productivity boost!
- No correlation capabilities. Without a tool to automatically tell you that two or more events are actually related to the same issue, you’re going to have different teams independently investigating the same issues, and thus wasting precious time.
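As an illustration of the first pain point, here is a minimal deduplication sketch. The event shape and the choice of identity fields are assumptions for the example, not Moogsoft’s actual algorithm; real tools key on richer signatures and time windows.

```python
# A minimal deduplication sketch: repeated events from the same source with
# the same message collapse into one record carrying an occurrence count.
# (Hypothetical event fields; production tools use richer signatures.)
from collections import Counter

def deduplicate(events):
    """Collapse duplicate (source, message) events, keeping a count."""
    counts = Counter((e["source"], e["message"]) for e in events)
    return [
        {"source": src, "message": msg, "count": n}
        for (src, msg), n in counts.items()
    ]

raw = [
    {"source": "db01", "message": "connection timeout"},
    {"source": "db01", "message": "connection timeout"},
    {"source": "db01", "message": "connection timeout"},
    {"source": "web03", "message": "502 upstream error"},
]
deduped = deduplicate(raw)
print(f"{len(raw)} raw events -> {len(deduped)} after dedup")
```

Even this toy version halves the volume an operator has to read, without discarding any information: the count preserves how often each event fired.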
Even though legacy tools from IBM, CA, BMC and HP require heavy manual effort to deliver these capabilities, and don’t compare to the value provided by modern tools like Moogsoft, digital enterprises without any management layer are missing out on these benefits altogether.
From what I’ve seen, email is the go-to management console. Below is the monitoring landscape shared with me by a digital enterprise that was using email as its central event management console. Due to the small size of their support team, they decided to send only 500 events/day, from their SiteConfidence Synthetics tool, and ignore everything else.
This digital enterprise was generating ~40,000 events/day, meaning they were looking at only ~1% of them. The support team would read through the most critical events, manually deduplicate and correlate events/alerts, and then dig into other tools when appropriate. It’s an incredibly manual and inefficient process, not to mention the lack of visibility it leaves across their IT environment.
When I asked how their monitoring was working for them, they shared that ‘most’ incidents were detected by customers as opposed to their tools.
In certain cases, an organization has built a homegrown management solution, but rarely have I heard the people behind it say that they have a strong grasp of their service quality (Netflix is an exception).
Modern Event Management Tools Enable Composable Monitoring
Modern event management tools, like Moogsoft AIOps, enable large enterprises and service providers to plug in their best-of-breed monitoring ecosystem and consume billions of events per day, providing complete visibility across their IT environment. Furthermore, these tools apply natural-language-processing and machine-learning algorithms to automatically reduce noise and correlate events from across the application, network, and infrastructure layers.
Essentially, having a modern event management tool allows you to leverage your monitoring ecosystem as a cohesive unit and address a reduced and enriched subset of data to ensure optimal service quality and performance.
About the author Sahil Khanna
Sahil Khanna is a Sr. Product Marketing Manager at Moogsoft, where he focuses on the emergence of Algorithmic IT Operations. In his free time, Sahil enjoys banging on drums and participating in high-stakes bets.