Attendees at a Moogsoft webinar posed probing questions about the combination of AIOps analysis with detailed observability data.
A fireside chat to discuss use cases and deployment tips for AIOps with observability generated a stream of compelling questions from attendees, which the Moogsoft hosts answered with depth and expertise.
Combining AIOps analysis with detailed observability data is key for DevOps and SRE teams to attain continuous service assurance, so Moogsoft just published a new ebook about this topic titled “Observability with AIOps For Dummies.”
During the webinar, hosts John Haley, Moogsoft Product Marketing VP, and Adam Frank, the book’s author and Moogsoft VP of Product & Design, explained how to unlock true operational visibility and eliminate CI/CD complexity by applying AI to events, metrics, traces and logs.
They outlined actionable tips for a successful deployment, including picking target apps and services; identifying data sources; and enriching your data. They also discussed use cases, including enhancing collaboration; streamlining incident management; and reducing costs.
Throughout the webinar, they fielded questions from the audience. Here’s an edited transcript of their answers.
Is AIOps a way for our IT environments to look like a self-healing system?
Adam Frank: Yes, there’s a lot of automation that will occur within all these different processes. The automation provides the context that you need to leverage to automatically optimize the environment. The context in effect starts to heal the environment as things occur, or even might occur, through forecasting and predicting what behaviors certain apps and services might take on, to then optimize those ahead of time. That way, you can continue to provide the experience and reliability your customers expect.
John Haley: There’s definitely a maturity level that customers go through, but I believe that many of our customers have adopted an AIOps methodology and that, over time, they’ll get as close to a self-healing system as possible. That’s the goal that we, as a software vendor, are trying to achieve. It’s definitely top of mind for many folks.
Do you have a successful use case for observability?
Adam Frank: Yes, we’ve got customers that are having success across a breadth of use cases. One customer is looking at a lot of periodicity data. There are many daytime highs and nighttime lows within their data, from their concurrent users connecting to their platform. It’s key to be able to look into this data and understand what the normal behavior is.
If concurrent users drop off or aren’t as high as they should be, we generate an anomaly. That anomaly is correlated with the rest of the observability and monitoring data from their infrastructure and applications. These anomalies are an early indication — a key metric — that this customer looks at to understand what’s going on in the environment. They’ll get our anomalies before the infrastructure and applications start to generate alerts.
Another customer — a cloud-first company — monitors many ephemeral nodes and containers with logs and other tools. The metrics data from the infrastructure around EC2 instances and containers are leading indicators that provide them with warnings before the application starts to log or generate alerts.
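To make the first customer example concrete, here is a minimal Python sketch of detecting a drop in concurrent users against a time-of-day baseline. It is an illustration only, not Moogsoft's algorithm: the function names, the sample history and the choice of "k standard deviations below the hourly mean" are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_hourly_baseline(history):
    """Learn (mean, standard deviation) of concurrent users per hour of day.

    history: iterable of (hour_of_day, concurrent_users) samples.
    """
    by_hour = defaultdict(list)
    for hour, users in history:
        by_hour[hour].append(users)
    # stdev needs at least two samples, so skip hours with less data.
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def is_low_anomaly(baseline, hour, users, k=3.0):
    """Flag a reading more than k standard deviations below normal for that hour."""
    if hour not in baseline:
        return False  # no learned behavior for this hour yet
    mu, sigma = baseline[hour]
    return users < mu - k * sigma


# Hypothetical history: a daytime high near 1,000 users at 14:00
# and a nighttime low near 100 users at 03:00.
history = [
    (14, 1000), (14, 1020), (14, 980), (14, 1010),
    (3, 100), (3, 110), (3, 90), (3, 105),
]
baseline = build_hourly_baseline(history)

daytime_drop = is_low_anomaly(baseline, 14, 400)   # far below the daytime norm
nighttime_low = is_low_anomaly(baseline, 3, 100)   # normal for that hour
```

Because the baseline is kept per hour, 100 users at 03:00 is treated as normal, while 400 users at 14:00 is flagged, even though 400 is the larger number. A single fixed threshold could not make that distinction.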
Does Moogsoft provide observability, workflow management and automation in one unified platform for an all-in-one, self-healing solution capability?
Adam Frank: Observability has three main components: metrics, logs and traces. Metrics are the leading indicator of abnormalities and of current or future incidents. Moogsoft Express collects metrics data. But if you already collect it with another tool, you can send it to Express, which will learn the normal behavior of that particular metric and generate anomalies.
As we’re observing all of that data, we automatically adapt in order to form high and low thresholds around the metrics and to generate alerts. We also do automated workflows. If I assign an incident to myself, that automatically moves it to “in progress”. That’s a simple thing. But we handle many aspects of the workflow, from events coming in to outbound notifications through Slack or PagerDuty, which you can customize, and everything in between. We continue to build more on top of what we have.
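The idea of high and low thresholds that adapt as data arrives can be sketched in a few lines of Python. This is a generic rolling-window technique, not a description of how Moogsoft Express computes its thresholds; the class name `AdaptiveThreshold`, the window size and the multiplier `k` are assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveThreshold:
    """Keep a rolling window of samples and derive high/low bounds from it."""

    def __init__(self, window=20, k=3.0):
        self.samples = deque(maxlen=window)  # oldest samples fall off automatically
        self.k = k

    def bounds(self):
        """Return (low, high), or None until at least two samples are seen."""
        if len(self.samples) < 2:
            return None
        mu = mean(self.samples)
        sigma = pstdev(self.samples)
        return mu - self.k * sigma, mu + self.k * sigma

    def observe(self, value):
        """Check value against the current bounds, then fold it into the window."""
        verdict = None
        current = self.bounds()
        if current is not None:
            low, high = current
            if value > high:
                verdict = "high"
            elif value < low:
                verdict = "low"
        self.samples.append(value)
        return verdict


detector = AdaptiveThreshold()
warmup = [detector.observe(v) for v in [50, 52, 49, 51, 50, 48, 51, 50]]
spike = detector.observe(90)
```

Every value, including an anomalous one, is added to the window, so the band widens or shifts if the metric settles at a new level. Whether anomalies should influence the baseline is a design choice; excluding them keeps the band tighter but slows adaptation.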
Can we have Moogsoft Express installed on-prem? We have a strong infosec policy in place and are concerned about the business data collected being in the cloud.
Adam Frank: Right now, Moogsoft Express is a cloud-native SaaS offering. We continue to work with our customers and different user bases with strong infosec policies, so things like private clouds are definitely options. We want to make sure we continue to satisfy different needs, so we continue to have those conversations.
John Haley: It’s also worth pointing out that the Moogsoft Express Collector, which is our agent to collect time-series metrics data, is installed at the source on the particular service, whether they’re on-prem or in the cloud.
Does your product provide out-of-the-box ML models for observability, so we’d just need to divide the data sources to be used in order to train the models? Or do we have to create ML models from scratch?
Adam Frank: We don’t expect you to create any ML models. With the metrics data, we ingest it and learn the normal operating behaviors, by keeping a sample size of the data and continuing to forecast what that data should be. That comes out of the box. Other aspects of our platform certainly create models, but we don’t expect you to be a data scientist. We make it very simple for you to click buttons or move knobs, and see what that’s going to look like in real-time or near real-time, to make those adjustments.
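As a rough illustration of learning normal behavior from a bounded sample and forecasting the next value, the Python sketch below uses simple exponential smoothing. It is not Moogsoft's model. The class name `ForecastingDetector` and the parameters `alpha`, `sample_size` and `sensitivity` are hypothetical, with `sensitivity` standing in for the kind of knob a user might adjust.

```python
from collections import deque


class ForecastingDetector:
    """Forecast the next value and flag readings that miss the forecast badly."""

    def __init__(self, alpha=0.3, sample_size=50, sensitivity=3.0, min_errors=5):
        self.alpha = alpha                # weight given to the newest reading
        self.sensitivity = sensitivity    # higher means fewer anomalies
        self.min_errors = min_errors      # warm-up before anything is flagged
        self.errors = deque(maxlen=sample_size)  # recent absolute forecast errors
        self.forecast = None

    def update(self, value):
        """Return True if value is anomalous, then update the forecast."""
        if self.forecast is None:
            self.forecast = float(value)
            return False
        error = abs(value - self.forecast)
        anomalous = False
        if len(self.errors) >= self.min_errors:
            typical = sum(self.errors) / len(self.errors)
            # Guard against a zero typical error on a perfectly flat metric.
            anomalous = error > self.sensitivity * max(typical, 1e-9)
        self.errors.append(error)
        # Simple exponential smoothing: blend the new reading into the forecast.
        self.forecast = self.alpha * value + (1 - self.alpha) * self.forecast
        return anomalous


readings = [10, 11, 10, 12, 11, 10, 11, 10]

strict = ForecastingDetector(sensitivity=3.0)
for r in readings:
    strict.update(r)
strict_flag = strict.update(30)

relaxed = ForecastingDetector(sensitivity=30.0)
for r in readings:
    relaxed.update(r)
relaxed_flag = relaxed.update(30)
```

With the default sensitivity, the jump to 30 is flagged; with the knob turned up to 30.0, the same reading passes. No model is trained ahead of time: the detector only keeps a fixed-size sample of recent errors and a running forecast.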
John Haley: The transparency and visibility we give users into the correlation patterns and their results are really important to us.
Which of the following types of algorithms is best to use within AIOps: supervised, unsupervised or semi-supervised?
Adam Frank: That all depends on the area being automated. We use many different techniques. There’s no one method that works for everything — probable root cause is different from anomaly detection, and those are different from similar incidents. We look at many different aspects of ML techniques to use, depending on the use case and the goal we’re trying to achieve.
John Haley: With our 50-plus patented algorithms in our AIOps platform, we use a combination of supervised and unsupervised ML depending on the objective we’re trying to achieve.
Are Moogsoft Express and Splunk’s SignalFx similar?
Adam Frank: There’s an aspect of similarity across different products out there. It’s a very competitive marketplace. With regards to SignalFx, there are similarities, as well as major differences. Moogsoft Express is a real-time platform that collects and analyzes data, produces anomalies, does correlation, and presents to you the information you need around any type of current or potential incident or business outage. That’s different from products that create a data lake and do some ML after the fact to try to find patterns.
Have you seen any cloud-specific services being leveraged to implement AIOps? For example, Azure provides Azure Monitor with Application Insights, which enables plugging in logs, metrics and so on from different sources and surfacing them on a central monitoring dashboard. It also enables root cause analysis.
Adam Frank: Our AIOps platform is data agnostic, so whether you use Azure, AWS or Google Cloud, or any other cloud provider, including on-premises, it doesn’t matter to Moogsoft. We want to bring that data in and look at the underlying metadata, at the similarities, correlate that data for you, and determine the root cause, along with several other insights.
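Correlating alerts by the similarity of their metadata, regardless of which provider emitted them, can be sketched as follows. This is a deliberately simple greedy grouping based on Jaccard similarity and a time window, offered as an assumption-laden illustration rather than Moogsoft's correlation engine. The alert fields (`time`, `source`, `tags`), the thresholds and the sample alerts are all hypothetical.

```python
def jaccard(tags_a, tags_b):
    """Share of key/value pairs two alerts have in common (0.0 to 1.0)."""
    a, b = set(tags_a.items()), set(tags_b.items())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def correlate(alerts, min_similarity=0.5, max_gap_seconds=300):
    """Greedily group alerts that are close in time and similar in metadata."""
    incidents = []
    for alert in sorted(alerts, key=lambda a: a["time"]):
        for incident in incidents:
            last = incident[-1]
            close_in_time = alert["time"] - last["time"] <= max_gap_seconds
            similar = jaccard(alert["tags"], last["tags"]) >= min_similarity
            if close_in_time and similar:
                incident.append(alert)
                break
        else:
            # No existing incident matched, so this alert starts a new one.
            incidents.append([alert])
    return incidents


# Hypothetical alerts from different providers; only the metadata matters.
alerts = [
    {"time": 0, "source": "aws",
     "tags": {"service": "checkout", "region": "us-east-1", "host": "web-1"}},
    {"time": 60, "source": "on-prem",
     "tags": {"service": "checkout", "region": "us-east-1", "host": "web-2"}},
    {"time": 90, "source": "azure",
     "tags": {"service": "billing", "region": "eu-west-1", "host": "db-1"}},
    {"time": 1000, "source": "aws",
     "tags": {"service": "checkout", "region": "us-east-1", "host": "web-1"}},
]
incidents = correlate(alerts)
incident_sizes = [len(i) for i in incidents]
```

The `source` field is never consulted, which mirrors the data-agnostic point: the first two alerts land in the same incident because they share a service and region, not because of where they came from. The last alert starts a new incident only because it arrives outside the time window.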
Is AIOps a danger for IT operators’ jobs?
Adam Frank: I don’t think so. Our platform isn’t designed to completely take over your entire operations workforce. We free the operations staff from manual tasks by automating a lot of them, so operators can focus on higher-value tasks, like improving the customer experience.
Is it possible to have performance issues with the Moogsoft Express Collector?
Adam Frank: The collector is very lightweight. We designed it with IoT in mind. We have it running on a number of very small devices, like the Raspberry Pi. It uses very few resources, in terms of things like CPU and memory.
Do you see AIOps and observability combining forces even further?
Adam Frank: What we’ve seen is the evolution from IT Ops into DevOps and SRE, and likewise people have evolved from monitoring to observability. Observability on its own has generated an explosion of data, which you need to observe in microservice architectures and different aspects of those applications. But you still need to monitor that data, and AIOps isn’t a single “thing.” It’s about automating all these processes and data from beginning to end. So the ultimate destination isn’t necessarily observability but rather AIOps. You want to observe what your applications are doing, and should be doing, but you want AIOps to consume all that data. You’ll continue to see them walking hand in hand with a really nice relationship.
In an enterprise, legacy apps still exist, along with modern apps in the cloud. How does observability work across these discrete infrastructures and application technologies?
Adam Frank: Observability is about producing the data you need to understand what’s going on with that legacy or modern application in the cloud. As long as you’re producing that data, then you can collect it and start to visualize it. That’s what observability is. Throw some monitoring on top of that and you start to generate events and alerts when things deviate from the normal expected behavior. Throw AIOps on top of that to start to automate and correlate everything together, and that’s when you start to really see and drive the value we’re talking about. As long as you’ve got a method to look at that legacy application and understand its behavior, and you collect that data and visualize it, then you’re observing it.
Is Moogsoft supporting flow monitoring and APM?
Adam Frank: We certainly integrate with lots of APM tools and their flow-monitoring features, which produce a lot of valuable alerts. That’s where we come in: we provide complementary value by correlating it all together, tying the infrastructure data that may have caused the impact to the application data. If there’s any metrics data that’s a key indicator of your application performance, we’ll ingest that too.
Analysts say that in 2023 many large companies will use AIOps. Do you receive lots of solicitations from those organizations to set up AIOps today? Is the demand really increasing?
John Haley: Yes, we’re seeing increased demand. I agree that in 2023 many large organizations will be seeking out AIOps. Moogsoft is also looking at taking many of these AIOps capabilities and making them available not just to large companies but also to more agile development teams that are looking at their DevOps toolchain. We’re also making AIOps available to small and medium-sized enterprises that don’t have large, complex infrastructures but still need to automate and drive efficiencies within their workflow. Moogsoft wants to bring AIOps everywhere in the marketplace.
Adam Frank: Yes, whether you’re big or small, there are certain use cases you’ll have that require some level of automation. AIOps is everywhere within the technologies we use, and there are use cases we fulfill throughout everything.