Verizon Media Group: From Alert Fatigue to Actionable Operational Insights
Moogsoft helps Verizon Media Group distill millions of alerts every day into the situational insights that matter
With such popular names as Yahoo, HuffPost, TechCrunch, AOL, Tumblr, and MapQuest, chances are high that you rely on a site from Verizon Media Group to communicate, be entertained, or get briefed on breaking news. Verizon Media, formerly Oath, is built on Verizon's acquisitions of AOL and Yahoo, for $4.4 billion and $4.5 billion respectively.
As might be expected after the acquisition of two substantial organizations, Verizon Media found that delivering its hundreds of media services depended on an extraordinarily complex and highly heterogeneous technology environment, one made up of disparate legacy systems, cloud systems, and various types of infrastructure. Verizon Media sought ways to streamline it.
Those efforts began with a substantial shift to Amazon Web Services, as well as to public cloud services from Google and Microsoft. To streamline application development and management, Verizon Media also increasingly embraced microservices, continuous delivery, and DevOps. These moves helped teams deliver application enhancements more rapidly, automate testing, and ship their software with greater agility.
Still, throughout the transition, when it came to effective operations management, the IT operations team faced significant challenges they would need help to overcome.
The Need to Move from Event Alert Overload to Situational Context
Verizon Media's infrastructure and the applications it supports remained enormously complex and interdependent. The combination of the underlying application infrastructure and loosely coupled, microservice-based applications meant that a breakdown anywhere in the service chain could kick off thousands of alerts and cause multiple application or service failures. The environment was so complex that the operations teams' traditional operations management toolsets could not consume the vast number of events, and the teams were overwhelmed with alerts. As a result, they were unable to identify the root cause of potential system issues and service interruptions.
Consider this: the Verizon Media infrastructure and supporting systems that power its 424 media services generate roughly 2 million event alerts a day. The team needed to find the signal in all of that noise and identify the alerts and situations that could have a real impact on application availability and performance.
According to Devan Franchini, production operations software engineer at Verizon Media, operations teams were overwhelmed with alerts and could not see the full context of the events behind them. "Engineers would get an alert and move to resolve the situation. They'd then find a host or some other asset was not available. They'd create an incident ticket, but that failure already had an incident ticket because it was part of a larger outage underway," Franchini says.
This meant teams wasted an excessive amount of time triaging individual symptomatic incidents because they couldn't see the entire situation clustered into a context that made sense. "People couldn't see the entire scope of impact," he adds.
There was also a broader business challenge: seeing operational event context across the Verizon, Yahoo, and AOL business units, especially for services, such as email, that span those business units.
Getting to the Signal Needed to Proactively Detect Events Earlier, and Swiftly Fix Problems
Verizon found only one IT operations platform that could provide everything it needed: Moogsoft AIOps. Powered by purpose-built machine learning algorithms, Moogsoft AIOps is the pioneering AI platform for IT operations. It reduces alert noise to the point that IT operations teams can see the actionable situations that need immediate attention and trace them to the root cause of underlying problems. Moogsoft AIOps achieves this by removing the alerts that don't matter and then correlating similar alerts into a clustered situation. The platform then provides a root cause suggestion and enables multiple teams to collaborate and remediate incidents more rapidly. "What Moogsoft offers in terms of its technology goes far beyond what other vendors make available," Franchini says.
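The filter, deduplicate, correlate, and suggest-a-root-cause flow described above can be illustrated with a deliberately simplified sketch. This is not Moogsoft's algorithm, which relies on machine learning. Instead, the sketch swaps in a fixed severity threshold and a per-service time window. The Event and Alert fields, the severity scale, the threshold, and the 300-second window are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical raw event shape; real monitoring feeds differ.
    source: str       # host or asset that emitted the event
    check: str        # what was measured, e.g. "http_5xx"
    service: str      # business service the source belongs to
    severity: int     # 0 = info ... 5 = critical (assumed scale)
    timestamp: float  # seconds


@dataclass
class Alert:
    source: str
    check: str
    service: str
    severity: int
    first_seen: float
    last_seen: float
    count: int = 1


def deduplicate(events, min_severity=2):
    """Drop events below min_severity and collapse repeats of the same
    (source, check) pair into a single alert with a running count."""
    alerts = {}
    for e in sorted(events, key=lambda ev: ev.timestamp):
        if e.severity < min_severity:
            continue  # treated as noise in this sketch
        key = (e.source, e.check)
        if key in alerts:
            a = alerts[key]
            a.count += 1
            a.last_seen = e.timestamp
            a.severity = max(a.severity, e.severity)
        else:
            alerts[key] = Alert(e.source, e.check, e.service, e.severity,
                                e.timestamp, e.timestamp)
    return list(alerts.values())


def cluster(alerts, window=300.0):
    """Group alerts on the same service into situations: an alert joins the
    current situation if it starts within `window` seconds of that
    situation's first alert; otherwise it opens a new situation."""
    by_service = defaultdict(list)
    for a in alerts:
        by_service[a.service].append(a)
    situations = []
    for service, group in by_service.items():
        group.sort(key=lambda a: a.first_seen)
        current = [group[0]]
        for a in group[1:]:
            if a.first_seen - current[0].first_seen <= window:
                current.append(a)
            else:
                situations.append((service, current))
                current = [a]
        situations.append((service, current))
    return situations


def suggest_root_cause(situation_alerts):
    """Naive heuristic: the earliest alert wins, ties broken by severity."""
    return min(situation_alerts, key=lambda a: (a.first_seen, -a.severity))
```

With these rules, repeated disk alerts from one host collapse into a single alert, and alerts on the same service that begin within five minutes of each other land in one situation whose earliest alert is offered as the likely cause. A learned model would replace the fixed service-and-time rule with correlations learned from the data.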
Verizon Media used Moogsoft's direct Datadog REST adapter integration and also integrated Moogsoft with ServiceNow: the operations team receives ServiceNow webhook information and uses API calls to automate certain functions, such as ticket association.
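As a rough illustration of what ticket-association automation can look like, the sketch below handles a ServiceNow webhook, links the incident to a Moogsoft situation, and flags a second incident raised for a situation that already has one, which is the duplicate-ticket problem Franchini describes. Everything here is an assumption: the payload is taken to be JSON carrying the incident number and the situation ID in a correlation_id field, and annotate_situation is a hypothetical callback standing in for whatever API call writes the ticket back to Moogsoft. It is not a documented Moogsoft or ServiceNow endpoint.

```python
import json
from typing import Callable, Dict, Optional


class TicketAssociator:
    """Hypothetical sketch: link ServiceNow incidents to Moogsoft situations.

    Assumes the webhook body is JSON with the incident "number" and the
    situation ID in "correlation_id"; the real payload depends on how the
    outbound webhook is configured.
    """

    def __init__(self, annotate_situation: Callable[[int, str], None]):
        # Stand-in for the API call that records the ticket on the situation.
        self.annotate_situation = annotate_situation
        self.ticket_for_situation: Dict[int, str] = {}

    def handle_webhook(self, body: str) -> Optional[str]:
        """Return the existing primary ticket if this incident duplicates
        one already linked to the same situation; otherwise link it and
        return None."""
        payload = json.loads(body)
        number = payload["number"]
        situation_id = int(payload["correlation_id"])
        primary = self.ticket_for_situation.get(situation_id)
        if primary is not None and primary != number:
            return primary  # situation is already tracked by another ticket
        self.ticket_for_situation[situation_id] = number
        self.annotate_situation(situation_id, number)
        return None
```

The mapping lives in memory only to keep the sketch short; a production integration would persist it and validate incoming payloads.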
The Verizon Media operations team deployed Moogsoft AIOps in both its production and test environments. "Our test environment has assisted us in fine-tuning our situation clustering logic before onboarding it to production," he says.
In the production environment, Moogsoft AIOps is helping the operations team monitor all 424 unique business services, as well as Verizon Media's internal infrastructure. Moogsoft AIOps ingests two million raw events a day, collected by six monitoring agents from 60,168 individual data sources, and distills them down to 10,000 alerts and close to 4,000 situations. That is a 99% reduction in noise that no longer reaches the IT operations teams.
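The 99% figure follows directly from the daily volumes. A quick check, assuming the 10,000 alerts and roughly 4,000 situations are both measured against the same 2 million daily raw events (the text does not state the comparison base explicitly):

```python
def noise_reduction_pct(raw: int, remaining: int) -> float:
    """Share of the raw volume that no longer reaches operators, in percent."""
    return 100 * (raw - remaining) / raw


RAW_EVENTS = 2_000_000  # daily raw events, from the text
ALERTS = 10_000         # daily alerts, from the text
SITUATIONS = 4_000      # text says "close to 4,000"; treated as exact here

if __name__ == "__main__":
    print(f"events -> alerts:     {noise_reduction_pct(RAW_EVENTS, ALERTS):.1f}%")
    print(f"events -> situations: {noise_reduction_pct(RAW_EVENTS, SITUATIONS):.1f}%")
    print(f"raw events per situation: {RAW_EVENTS // SITUATIONS}")
```

Running it prints 99.5% for events to alerts, 99.8% for events to situations, and 500 raw events per situation, so the quoted 99% is a rounded-down figure.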
Verizon Media's IT operations team uses Moogsoft AIOps to get a comprehensive view of services that span the entire AOL and Yahoo portfolio of web services. "We've done everything we could to make sure that everyone is monitoring the same things. If there is an alert, it is handled through Moogsoft," Franchini says.
To date, Moogsoft AIOps has helped Verizon Media Group avoid several costly outages. “Our operations center engineers have been able to identify and assist in remediation of financially impacting outages, saving us a great deal of money,” he says.
Whether it's a business service that starts to falter or an entire data center, Moogsoft AIOps helps ensure that the right teams for the affected services are notified and can quickly remedy the situation. "Moogsoft provides us the ability to see how situations will be clustered. How Moogsoft clusters situations is very helpful to us and saves us time. As we start incorporating more AI functionality, it will save us even more time as well as improve the dynamics we have with some of our supporting teams," he says.
Beyond the Moogsoft AIOps platform itself, Franchini and the team value the company behind the technology. "I love the nature of Moogsoft. They are willing to work very closely with us, and the company is very flexible and constantly improving, not stagnant like so many other vendors. We don't have to worry about having to move to another platform because Moogsoft is constantly growing and improving. That's also very important to us," he says.