I recently had the opportunity to speak with a new member of the Moogsoft herd, a leading eCommerce platform for automotive trading. This company is one of the largest of its kind, with tens of millions of monthly visits to its web properties, so consistently delivering new features, maintaining the highest quality of service, and ensuring constant availability are absolutely crucial to the business's success.
I spoke with the Sr. Manager of their Operations Center, who shared some valuable insight on their transformation. In this post, I’ll share some of the challenges that this organization was facing just months ago, and how they were able to leverage Moogsoft’s Algorithmic IT Operations solution, Moogsoft AIOps, to improve their service quality and customer experience.
This organization’s IT operations team monitors and manages its applications and infrastructure using 17 disparate monitoring tools that provide visibility across the physical and virtual production stack. Its core tools include Nagios, Dynatrace APM, Dynatrace Synthetic, Keynote, Solarwinds, SCOM, Oracle OEM, New Relic, and others. The organization had been using IBM Netcool as its event management system, but according to the Operations Center Sr. Manager, “Netcool was too admin-heavy and required too much training. We just didn’t have the resources or the budget to spend on IBM contractors, so we dropped Netcool.”
With a small team working across 17 different toolsets, Level 1 operators were completely overwhelmed by alert storms and a lack of context. Those operators had to manually review more than 6,000 emails per month, 1,000 of which were turned into tickets in ServiceNow.
“66% of our tickets turned out to be false [closed without any action]. Furthermore, we were facing 2-3 outages per week.”
– Operations Center, Sr. Manager
In summary, this customer had too many toolsets to look across and too much data to analyze manually, and it was overly reliant on admins who understood the environment and knew exactly what should and should not be sent. The Operations Center Sr. Manager concluded, “Our process was broken. We needed better visibility across our tools, a reduction in the number of tickets generated, and a reduction in the overall effort and speed to detect and resolve incidents.”
Although she had looked at tools such as ServiceNow Event Manager and BigPanda to solve this problem, she said, “I didn’t evaluate either of them because Moogsoft was the only vendor that could truly correlate our events across multiple tools and event sources, out-of-the-box.”
The Moogsoft Solution
As part of the evaluation, this customer sent events from Solarwinds, Nagios, and Dynatrace to Incident.MOOG. Incident.MOOG ingested the event feeds and reduced the raw events to unique alerts, cutting their volume by 93%. Moogsoft then correlated those unique alerts into individual Situations, allowing operators to quickly understand the context of an incident, visualize how it unfolded, and collaborate with team members through the Situation Room.
Today, this client is using Incident.MOOG to ingest events from Solarwinds, Nagios, Dynatrace APM, Dynatrace Synthetic (Gomez), Keynote, SCOM, Oracle OEM, New Relic, Pingdom and vSphere, reducing noise and correlating events into actionable Situations.
In just the first few weeks after deploying Incident.MOOG, the client ingested 17,000 events and correlated them into 34 actionable Situations for Level 1 operations. That is a 99.8% reduction in workload and a 500x increase in operator productivity!
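The headline figures follow from simple arithmetic on the two counts. The sketch below assumes, for illustration only, that each raw event and each Situation represents one unit of operator work; the function name `workload_reduction` is hypothetical and not part of any Moogsoft product.

```python
def workload_reduction(events: int, situations: int) -> tuple[float, float]:
    """Return (percent reduction, productivity multiplier).

    Assumes one unit of operator work per raw event before correlation
    and one unit per Situation after correlation (illustrative only).
    """
    reduction_pct = (1 - situations / events) * 100  # share of work removed
    multiplier = events / situations                  # raw events per Situation
    return reduction_pct, multiplier


reduction, multiplier = workload_reduction(17_000, 34)
print(f"{reduction:.1f}% less triage work, {multiplier:.0f}x fewer items per operator")
```

With 17,000 events and 34 Situations, the ratio is exactly 500, and the reduction works out to 1 - 34/17,000 = 0.998, or 99.8%.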
Hopefully this story sheds light on how to overcome alert storms and the lack of visibility across multiple toolsets for anyone facing those challenges right now. More stories to come!
About the author
Sahil Khanna
Sahil Khanna is a Sr. Product Marketing Manager at Moogsoft, where he focuses on the emergence of Algorithmic IT Operations. In his free time, Sahil enjoys banging on drums and participating in high-stakes bets.