Automotive Digital Enterprise Attains Continuous Service Assurance using AIOps from Moogsoft
The organization reduced alerts by 99% and vastly increased the productivity of its IT operators
This large automotive digital company gets tens of millions of monthly visits to its web properties, so consistently delivering new features and high service quality is crucial for the business – but it was struggling to do so.
“Moogsoft was the only solution that could truly correlate our events across multiple tools and event sources out-of-the-box.”
– Operations Center
The organization was monitoring and managing its applications and infrastructure using 17 disparate tools. It had previously used IBM Netcool as its event manager, but Netcool became difficult to administer and required too much training.
“We just didn’t have the skillset or budget to spend on IBM contractors, so we dropped Netcool,” said the senior manager of the operations center.
The small team of Level 1 operators was overwhelmed by 6,000 emails per month, 1,000 of which were turned into ServiceNow tickets. About 66% of those tickets were false positives and were closed without any action taken. The organization also suffered two to three outages per week.
“Our process was broken. We needed better visibility across our tools, a reduction in the number of tickets generated, and a reduction in the overall effort and speed to detect and resolve incidents,” he said.
While they were looking at solutions, he concluded that “Moogsoft was the only solution that could truly correlate our events across multiple tools and event sources out-of-the-box.”
Today, Moogsoft ingests event feeds from 10 different tools to reduce noise and correlate events into actionable incidents.
In just the first few weeks, Moogsoft ingested 17,000 events and correlated them into 34 actionable incidents for Level 1 operators, delivering a 99.8% reduction in workload and a 500x increase in operator productivity.
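The two headline ratios follow directly from the event and incident counts quoted above. The short Python sketch below reproduces that arithmetic; the function name noise_reduction is a hypothetical helper introduced here for illustration and is not part of any Moogsoft product. With 17,000 events and 34 incidents it yields a reduction of about 99.8% and a 500-to-1 compression ratio.

```python
def noise_reduction(events: int, incidents: int) -> tuple[float, float]:
    """Return (percent reduction, compression ratio) for events grouped into incidents.

    Hypothetical helper for illustration only.
    """
    if events <= 0 or incidents <= 0:
        raise ValueError("events and incidents must be positive")
    # Share of items operators no longer have to look at individually.
    reduction_pct = (1 - incidents / events) * 100
    # How many raw events each incident stands for, on average.
    compression = events / incidents
    return reduction_pct, compression


# Figures quoted in the case study: 17,000 events, 34 incidents.
reduction_pct, compression = noise_reduction(17_000, 34)
print(f"{reduction_pct:.1f}% fewer items, {compression:.0f}x compression")
```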
Moogsoft uses machine learning algorithms to automatically analyze, reduce, and correlate this customer’s alert feeds in real time. Level 1 operators can therefore be notified of anomalies within seconds, long before problems escalate into application-wide incidents in production.
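To make the idea of correlating events into incidents concrete, here is a deliberately simplified Python sketch. It is not Moogsoft's algorithm and uses no machine learning: it applies a plain rule that groups events for the same service when they arrive within a fixed time window of each other. The Event and Incident classes, the field names, the tool and service names, and the 300-second window are all assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    source: str       # monitoring tool that emitted the event (hypothetical)
    service: str      # affected service, used here as the grouping key
    timestamp: float  # seconds since an arbitrary start
    message: str


@dataclass
class Incident:
    service: str
    events: list = field(default_factory=list)


def correlate(events, window_seconds=300.0):
    """Group events per service when consecutive events fall within the window."""
    incidents = []
    latest_by_service = {}  # most recent incident seen for each service
    for ev in sorted(events, key=lambda e: e.timestamp):
        inc = latest_by_service.get(ev.service)
        if inc is not None and ev.timestamp - inc.events[-1].timestamp <= window_seconds:
            # Close enough in time to the previous event: same incident.
            inc.events.append(ev)
        else:
            # First event for this service, or the gap is too long: new incident.
            inc = Incident(service=ev.service, events=[ev])
            incidents.append(inc)
            latest_by_service[ev.service] = inc
    return incidents


# Made-up sample data: three tools reporting on two services.
sample = [
    Event("tool-a", "checkout", 0.0, "latency high"),
    Event("tool-b", "checkout", 40.0, "5xx rate up"),
    Event("tool-c", "search", 60.0, "cpu high"),
    Event("tool-a", "checkout", 1000.0, "latency high"),
]
incidents = correlate(sample)
print(len(sample), "events ->", len(incidents), "incidents")
```

In this sample, the two early checkout events from different tools collapse into one incident, while the search event and the much later checkout event each open their own. A production system would correlate on far richer signals than a single service name and a fixed window.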