Whether you’re dealing with network performance issues, failed application transactions, slow SQL database queries, poor storage utilization, or VM resource contention, Solarwinds has a solution for you. Solarwinds’ comprehensive collection of products can intelligently inform you about abnormalities and errors occurring across your production stack.
However, what are you supposed to do when your tools fire off hundreds of alerts in response to an incident? Your organization can’t afford the time it takes to manually sift through every alert to separate the signal from the noise. You can try filtering out alerts, but then you narrow your view of your infrastructure and raise the chance of missing valuable information.
Furthermore, does a storm of network alerts necessarily mean that the root cause lies in the network? In today’s agile and virtualized world, it does not: alerts from your applications, databases, storage, and other layers may be just as relevant. IT incidents are incredibly complex, and often there is no single root cause to uncover.
IT organizations need a way to use their Solarwinds and other vendors’ monitoring tools holistically to gain full situational awareness of their applications, networks, and infrastructure.
Moogsoft turns Alerts into Actionable Situations
Leading enterprises and managed service providers are now using Moogsoft to maximize the value of their IT monitoring investments. Moogsoft ingests ALL of their IT operational data, structured or unstructured, and applies natural language processing and various machine learning algorithms to perform real-time noise reduction, event correlation, and anomaly detection. This often means reducing tens of thousands of events to a few hundred manageable events while preserving all the context related to a particular incident. Some Moogsoft customers have attributed a 10x increase in operator productivity to this capability alone.
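To make the idea of noise reduction a little more concrete, here is a minimal sketch of one simple technique: deduplication. It is not Moogsoft’s actual algorithm, which the post describes only as natural language processing and machine learning. The sketch collapses raw events that share a source and a normalized description into a single alert with a count, while keeping every original event attached so no context is lost. The event fields (`source`, `description`, `timestamp`) and the normalization rule (lowercase the text and mask every run of digits) are illustrative assumptions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Event:
    source: str       # hypothetical field: host or device that emitted the event
    description: str  # hypothetical field: free-text message from the monitoring tool
    timestamp: float  # seconds since epoch


@dataclass
class Alert:
    source: str
    signature: str
    events: list = field(default_factory=list)  # every raw event is kept for context

    @property
    def count(self) -> int:
        return len(self.events)


def signature(description: str) -> str:
    """Normalize free text so near-identical messages match.
    Assumed rule: lowercase and replace every run of digits with <n>."""
    return re.sub(r"\d+", "<n>", description.lower()).strip()


def deduplicate(events: list[Event]) -> list[Alert]:
    """Collapse events with the same source and normalized text into one alert,
    in order of first occurrence."""
    alerts: dict[tuple[str, str], Alert] = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.source, signature(ev.description))
        if key not in alerts:
            alerts[key] = Alert(source=ev.source, signature=key[1])
        alerts[key].events.append(ev)
    return list(alerts.values())


if __name__ == "__main__":
    raw = [
        Event("db01", "CPU at 97% on core 3", 1.0),
        Event("db01", "CPU at 99% on core 1", 2.0),
        Event("sw-core", "Interface Gi0/1 utilization 91%", 2.5),
        Event("db01", "CPU at 98% on core 3", 3.0),
    ]
    for alert in deduplicate(raw):
        print(alert.source, alert.signature, alert.count)
```

Here four raw events become two alerts: three CPU events from `db01` share the signature `cpu at <n>% on core <n>`, and the interface event from `sw-core` stands alone. Real systems use far richer similarity measures, but the principle of shrinking volume without discarding the underlying events is the same.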
Alongside noise reduction, Moogsoft uses unsupervised machine learning techniques to identify complex relationships between events and alerts and to cluster them into ‘Situations’ in real time. This correlation lets your organization move from alert-based to Situation-based incident management: IT operations teams can see incidents as they unfold and address them proactively, before they affect a business service.
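The post doesn’t detail which unsupervised techniques Moogsoft uses, so the following is only a stand-in that shows the general shape of alert clustering. It applies single-linkage clustering with a union-find structure: two alerts join the same Situation if they occur within a time window and touch related infrastructure, and links are transitive. The `WINDOW` value, the `RELATED` topology map, and the alert fields are all hypothetical assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    id: str
    host: str    # hypothetical field: host the alert refers to
    time: float  # seconds since epoch


# Hypothetical topology: pairs of hosts that are directly connected.
RELATED = {
    ("app01", "db01"),
    ("db01", "sw-core"),
}

WINDOW = 120.0  # assumed: alerts more than 2 minutes apart are never linked


def connected(a: str, b: str) -> bool:
    return a == b or (a, b) in RELATED or (b, a) in RELATED


def cluster(alerts: list[Alert]) -> list[set[str]]:
    """Single-linkage clustering via union-find: alerts close in time on
    related hosts end up in the same group (a candidate 'Situation')."""
    parent = {a.id: a.id for a in alerts}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Compare every pair once; O(n^2) is fine for a sketch.
    for i, a in enumerate(alerts):
        for b in alerts[i + 1:]:
            if abs(a.time - b.time) <= WINDOW and connected(a.host, b.host):
                parent[find(a.id)] = find(b.id)

    groups: dict[str, set[str]] = {}
    for a in alerts:
        groups.setdefault(find(a.id), set()).add(a.id)
    return list(groups.values())
```

For example, with two slow-transaction alerts on `app01`, a bandwidth alert and a CPU alert on `db01`, and an unrelated alert on `mail01`, all within a minute, the first four land in one group and the `mail01` alert stays on its own. A production system would learn relationships from data rather than rely on a hand-written topology map.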
The Situation Room
When Moogsoft clusters and correlates a group of related alerts into a Situation, a ‘Situation Room’ (virtual war room) is created. Based on the cluster of alerts, Moogsoft infers which domains are involved in any particular incident and automatically invites the appropriate domain experts to join the Situation Room. Within the Situation Room, each user has access to all Situation context. This includes all alerts clustered in the Situation, a discussion thread, a timeline visualizing how the incident unfolded, a knowledge base of similar Situations from the past, as well as access to all monitoring tools needed to determine root cause.
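As a rough illustration of how domain inference and expert invitations could work, the sketch below maps each alert’s class to an owning domain, counts how strongly each domain is represented in the Situation, and returns the experts to invite, most-represented domain first. The alert class names, the `DOMAIN_BY_CLASS` mapping, and the `EXPERTS` roster are hypothetical; this is not Moogsoft’s implementation.

```python
from collections import Counter

# Hypothetical mapping from alert class to owning domain.
DOMAIN_BY_CLASS = {
    "transaction_latency": "application",
    "interface_bandwidth": "network",
    "db_cpu": "database",
    "db_disk_read": "database",
}

# Hypothetical on-call roster per domain.
EXPERTS = {
    "application": ["app-oncall"],
    "network": ["net-oncall"],
    "database": ["dba-oncall"],
}


def invitees(alert_classes: list[str]) -> dict[str, list[str]]:
    """Infer the domains involved in a Situation and return the experts to
    invite, ordered by how many alerts each domain contributed.
    Alert classes with no known domain are ignored."""
    counts = Counter(
        DOMAIN_BY_CLASS[c] for c in alert_classes if c in DOMAIN_BY_CLASS
    )
    return {domain: EXPERTS.get(domain, []) for domain, _ in counts.most_common()}
```

Ordering by alert count is just one simple heuristic; it puts the domain with the most evidence at the top of the invitation list while still pulling in every domain that appears.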
The discussion thread is where participants communicate and collaborate to resolve the incident. Moogsoft also has native ChatOps capabilities built into discussion threads, so users don’t have to keep switching between multiple user interfaces.
As a basic example, say a Situation contains mostly application-related alerts. Based on that cluster of alerts, it appears that multiple end-user transactions for a given application are slow. At first glance, this looks like an application-specific issue. However, you then notice a few Solarwinds network alerts clustered into the same Situation. These alerts indicate that one of the database servers connected to the application is experiencing high network bandwidth consumption. In addition, Solarwinds Database Performance Analyzer has contributed multiple alerts for the database instance, highlighting high CPU and disk read activity for a user session tied to someone running a marketing report. Because all these alerts are clustered into the same Situation (and narrative), it’s straightforward for anyone in IT operations to understand what happened without having to troubleshoot the application or the network. This level of situational awareness saves time, resources, and money.
Using Moogsoft alongside your Solarwinds monitoring tools lets operations teams stay situationally aware and detect and resolve IT incidents efficiently.
About the author Sahil Khanna
Sahil Khanna is a Sr. Product Marketing Manager at Moogsoft, where he focuses on the emergence of Algorithmic IT Operations. In his free time, Sahil enjoys banging on drums and participating in high-stakes bets.