Incident.MOOG Architecture

Next Generation Event Management Today.

Moogsoft has a modern software architecture that is open, scalable and flexible to meet the needs of today’s challenging and complex enterprise environments.

Architecture Overview

A simplified view of the data flow through Incident.MOOG is depicted in the figure below. Within the CLEAN phase, Incident.MOOG applies sophisticated machine learning to solve the growing problem of event noise reduction.

Incident.MOOG’s algorithms are optimized to capture events as close to the event source as possible. The data ingestion system looks for three things: a source, a timestamp and a message. Incident.MOOG’s machine learning capabilities can pull out the important artifacts from those three fields without preset rules or filters. Severity levels assigned by equipment manufacturers are ignored by default to make sure nothing important is missed; an event is processed if the system deems it statistically significant.

Algorithms Architecture (Clean, Contextualize, Collaborate)

Next, Incident.MOOG de-duplicates the event stream, computes significance rankings, and rolls up alerts onto a high-performance, multi-path, real-time message bus.
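The de-duplication step can be illustrated with a minimal Python sketch. It assumes, hypothetically, that two events are duplicates when they share the same source and message text; the real matching criteria are not described here, and the names Event, Alert and deduplicate are illustrative only, not part of Incident.MOOG.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    # The three fields the ingestion system looks for.
    source: str
    timestamp: float
    message: str


@dataclass
class Alert:
    source: str
    message: str
    first_seen: float
    last_seen: float
    count: int = 0
    events: list = field(default_factory=list)


def deduplicate(events):
    """Fold repeated events into alerts keyed by (source, message).

    The key is an assumption made for this sketch only.
    """
    alerts = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.source, ev.message)
        alert = alerts.get(key)
        if alert is None:
            alert = Alert(ev.source, ev.message, ev.timestamp, ev.timestamp)
            alerts[key] = alert
        alert.last_seen = ev.timestamp
        alert.count += 1
        alert.events.append(ev)  # keep the event-to-alert relationship
    return list(alerts.values())
```

Each alert keeps the events that produced it, so the stream of alerts is shorter than the stream of events while the underlying detail remains available.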

The relationship between events and alerts is held in the Incident.MOOG datastore.

Incident.MOOG creates clusters of related alerts during the CONTEXTUALIZE phase. This is performed by Sigalisers, detailed below. Once a cluster becomes statistically significant, a Situation is created. There is a near one-to-one relationship between Situations and trouble tickets normally raised by an operator. In contrast, in a typical operations center, many alerts are manually cleared before they are escalated to the help desk.

The number of Situations on the Situation bus is greatly reduced compared to the number of alerts on the Alerts bus. Situations themselves are persisted in the Incident.MOOG datastore – they are significant to the fundamental purpose of the product, so they warrant long-term storage.

When Incident.MOOG creates a Situation, it has the ability to add more alerts into the Situation over time, giving operations a dynamic, real-time view into how an incident is unfolding across the IT ecosystem.

During the COLLABORATE phase, stakeholders are engaged to take action on Situations. Situation Rooms are the primary UI for appropriate stakeholders to work together to resolve the Situation, as detailed below.

Incident.MOOG allows you to view all of the events underpinning an alert, as every event stores a reference to the alert it is associated with. Incident.MOOG never deletes alerts because they may have been grouped into Situations.

Link Access Modules (LAMs)

LAMs ingest underlying agent data and output JSON objects, normalizing and enriching the events fed into Incident.MOOG. Rather than defining each alert through rules, Incident.MOOG defines feeds. This immediately reduces the amount of labor involved in deploying and maintaining Incident.MOOG.

A LAM receives a data source fed into Incident.MOOG. Event data feeds (using formats such as SNMP, syslog, log4j, IMAP, SMTP, etc.) may be generated by application software, cloud services, automation tools, application and system monitors, IT infrastructure, even customer sentiment on social media.

Incident.MOOG also supports RESTful and SOAP interfaces. Each data source type has a corresponding LAM that is configured to do the basic parsing, translating, mapping, filtering and enriching of data. LAMs can read from any Unix file or socket descriptor, so there is no limit on the kinds of sources from which you can ingest text data.
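As a rough illustration of the parsing and mapping a LAM performs, the Python sketch below turns one line of text into a JSON object with the three fields the ingestion system looks for. The line layout (timestamp, host, then free text), the regular expression and the function name parse_line are assumptions made for this example; they do not describe a real LAM configuration.

```python
import json
import re

# Hypothetical input layout: "<timestamp> <source> <message text>"
LINE_RE = re.compile(r"^(?P<timestamp>\S+)\s+(?P<source>\S+)\s+(?P<message>.+)$")


def parse_line(line):
    """Map one raw text line to a JSON string, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # a filtering step could drop or log such lines
    return json.dumps(match.groupdict(), sort_keys=True)


if __name__ == "__main__":
    print(parse_line("2015-03-01T12:00:00Z db01 disk usage at 95%"))
```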

Incident.MOOG uses LAMs to ingest data from your ecosystem of tools, such as:
– Application Performance Management (APM) tools: AppDynamics, New Relic, Compuware and others;
– Network Performance Monitoring (NPM) and Diagnostics tools: JDSU, SolarWinds, Riverbed, Fluke Networks and others;
– Event Managers: IBM Tivoli Netcool, BMC Event Manager, CA Spectrum, EMC Smarts, Microsoft System Center and others;
– Log files: Splunk, Sumo Logic, Logstash and others;
– DevOps tools: Chef, Puppet, Nagios and others.

Sigalisers

Sigalisers use machine learning to convert event and alert streams into Situations. Incident.MOOG supports several different types of Sigalisers; each one performs clustering based on a specific set of criteria.

The Time Sigaliser takes every occurrence of an event that is in an alert stream, and uses matrix factorization algorithms (pure unsupervised machine learning) to identify clusters of alerts that are temporally correlated, identifying underlying service outages or Situations. In other words, the Time Sigaliser spots unusual patterns in the timestamps of events which may indicate that these events are related.

The algorithms run in near real time and can be triggered on a fixed polling interval.
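The idea of temporal correlation can be shown with a much simpler stand-in than matrix factorization. The Python sketch below buckets each alert's event timestamps into fixed windows and reports pairs of alerts whose active windows overlap strongly (Jaccard overlap). This is not the Time Sigaliser's algorithm; the window size, the threshold and all names are assumptions chosen for illustration.

```python
from itertools import combinations


def active_windows(timestamps, window):
    """Return the set of fixed-size time windows in which an alert fired."""
    return {int(t // window) for t in timestamps}


def temporally_correlated(alert_times, window=60, threshold=0.5):
    """Return pairs of alert ids whose active windows overlap strongly.

    alert_times maps an alert id to a list of event timestamps in seconds.
    """
    windows = {aid: active_windows(ts, window) for aid, ts in alert_times.items()}
    pairs = []
    for a, b in combinations(sorted(windows), 2):
        union = windows[a] | windows[b]
        if union and len(windows[a] & windows[b]) / len(union) >= threshold:
            pairs.append((a, b))
    return pairs
```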

The Linguistic Sigaliser detects linguistic relationships in events. It uses another type of unsupervised machine learning for grouping alerts according to the similarity of linguistic attributes.
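One simple way to picture grouping by linguistic similarity is token overlap. The Python sketch below compares alert messages by the Jaccard similarity of their word sets and greedily places each message in the first group it resembles. The source does not say which algorithm the Linguistic Sigaliser uses, so this technique, the 0.5 threshold and the function names are assumptions for illustration only.

```python
def tokens(text):
    """Lower-case a message and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_similarity(messages, threshold=0.5):
    """Greedily group messages that resemble the first member of a group."""
    groups = []
    for msg in messages:
        for group in groups:
            if jaccard(tokens(msg), tokens(group[0])) >= threshold:
                group.append(msg)
                break
        else:
            groups.append([msg])  # no similar group found; start a new one
    return groups
```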

The Topology Sigaliser uses unsupervised machine learning to cluster events based on connection proximity.
• Proximity is measured by network hops
• The Topology Sigaliser identifies events from a similar location as being potentially related
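Proximity measured in network hops can be sketched as a breadth-first search over an adjacency map. The Python example below is an assumption-laden illustration: the graph format, the max_hops cut-off and the function names are hypothetical and are not taken from Incident.MOOG's topology database.

```python
from collections import deque


def hop_distance(graph, start, goal):
    """Return the number of hops between two nodes, or None if unreachable.

    graph maps a node name to an iterable of directly connected nodes.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def are_proximate(graph, a, b, max_hops=2):
    """Treat two event sources as potentially related if within max_hops."""
    dist = hop_distance(graph, a, b)
    return dist is not None and dist <= max_hops
```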

The Defined-Template Sigaliser allows Operations teams to create templates from a discovered Situation, which can then be used to compare against future Situations. Operations can then use these templates to either reject future situations as noise, or trigger specific remediation scripts/processes using MooBots.

The Machine-Learned-Feedback Sigaliser learns what an Operations team or user previously did in response to a Situation and re-applies those actions via MooBots.

The Cookbook Sigaliser provides an option that gives you complete control over which alerts get clustered into Situations. It allows you to create Situations according to a pre-defined Recipe (streaming SQL filters trigger the application of selected algorithms to events).

The Cookbook Sigaliser gives you the power to create Situations in a fully deterministic fashion.
• You can include or exclude alerts from clusters using filtering criteria such as number of occurrences of an event
• You can partition alerts into Situations using textual similarity-based comparison
• You can also interrogate the topology database

It is designed for scenarios where you are confident of system behavior.
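A deterministic recipe of this kind can be pictured as a filter followed by a partition. The Python sketch below keeps alerts that meet a minimum occurrence count and groups the survivors by a chosen attribute. It uses a plain Python predicate rather than streaming SQL, and the alert fields (count, service) and the function name apply_recipe are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def apply_recipe(alerts, min_count, partition_key):
    """Filter alerts by occurrence count, then partition them by one field.

    alerts is a list of dicts; every dict must contain "count" and
    the field named by partition_key.
    """
    situations = defaultdict(list)
    for alert in alerts:
        if alert["count"] >= min_count:  # include/exclude criterion
            situations[alert[partition_key]].append(alert)
    return dict(situations)


if __name__ == "__main__":
    sample = [
        {"id": 1, "count": 5, "service": "billing"},
        {"id": 2, "count": 1, "service": "billing"},
        {"id": 3, "count": 7, "service": "search"},
    ]
    print(apply_recipe(sample, min_count=3, partition_key="service"))
```

Given the same alerts and the same recipe, the output is always the same, which is the sense in which this style of clustering is deterministic.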

Once Sigalisers group the alert clusters into a Situation, a Situation Room is created in the Incident.MOOG database, and Operators are notified through the Situation Queue in the User Interface.

Situation Room

The Situation Room is a collaborative discussion where the right people are immediately assembled to solve the problem at hand.

Incident.MOOG creates a Situation Room for each Situation and then automatically invites the appropriate stakeholders into the Situation Room based on how their profiles of expertise match up with the content of the alert cluster pertaining to that Situation.
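Matching stakeholder expertise against the content of an alert cluster could be approximated by keyword overlap. The Python sketch below scores each profile by how many of its expertise keywords appear in the cluster's messages and invites the best matches. The profile format, the scoring rule, the top_n limit and the function name are all assumptions; the source does not describe how Incident.MOOG actually performs the match.

```python
def select_invitees(profiles, alert_messages, top_n=2):
    """Rank users by overlap between expertise keywords and alert text.

    profiles maps a user name to a list of expertise keywords.
    Users with no matching keyword are never invited.
    """
    words = set()
    for message in alert_messages:
        words |= set(message.lower().split())

    scored = []
    for name, skills in profiles.items():
        score = len(words & {skill.lower() for skill in skills})
        if score > 0:
            scored.append((score, name))

    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [name for _, name in ranked[:top_n]]
```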

Situations can be grouped into a Story, which encapsulates the evolution of a Situation or a set of Situations over time. Situations evolve by being merged or superseded, either through the algorithms or manually through the User Interface.

Grouping Situations as a Story allows you to unify all of your input for merged/superseded Situations in one place so you can see the relationship between Situations as they evolve. In this way, Incident.MOOG documents how you solved problems in the past, creating a contextual knowledge base to help you solve today’s problems faster.

MooBots

MooBots allow Incident.MOOG to integrate, communicate and execute actions within your ecosystem of monitoring and management tools. They are JavaScript programs that can be defined and invoked as part of a Situation workflow.

For example, a database server may throw events when disk space is running low. A MooBot can wrap and execute a cleanup script that frees up disk space on the server, automating the remediation of the incident.

Real-Time Message Bus

The backbone of the Incident.MOOG architecture is our real-time message bus. It allows us to process and analyze event feeds and detect anomalies instantly using our natural language processing and machine learning Sigalisers.

Optimized for latency, reliability and throughput, this component allows Incident.MOOG to scale in some of the largest enterprise and managed service provider environments.

Architecture Deployment Options

On-Premise

SaaS Hosted

Private Cloud

The Incident.MOOG architecture can be deployed as on-premise software in your data center, as SaaS via our MOOG Cloud, or in your own private cloud, such as Amazon EC2.