Can you trust AIOps tools not to make mistakes? Is there an implementation standard for AIOps? What’s the risk of not adopting AIOps?
During the recent webinar “AIOps Predictions 2020”, attendees posed a variety of questions about AIOps, on topics like the fallibility of products, the technology’s scope, and its impact on ITIL 4.
Read on to learn how Moogsoft EMEA CTO Will Cappelli and 451 Research Senior Analyst Nancy Gohring advised audience members on these topics and more during the webinar’s Q&A session.
Do you worry that AI can make serious mistakes? Do you think human intelligence combined with AI will produce better outcomes?
GOHRING: Yes, there’s definitely room for error, and the good thing is I’m increasingly seeing tools that allow for human interaction before kicking off an action such as an auto-remediation. Human approval should be inserted as a step in the process before anything happens. That’s a really good way for humans to check whether the system is accurately predicting what the problem is and accurately choosing the appropriate resolution step.
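The approval step Gohring describes can be sketched in a few lines. This is a minimal illustration, not any vendor’s implementation; the incident fields, action names, and the `approve` callback are all invented for the example.

```python
# Minimal sketch of a human-in-the-loop gate before auto-remediation.
# All field names and remediation actions are illustrative.

def propose_remediation(incident):
    """Map a diagnosed incident to a candidate remediation action."""
    actions = {
        "disk_full": "purge_old_logs",
        "service_down": "restart_service",
    }
    return actions.get(incident["diagnosis"], "escalate_to_human")

def run_remediation(incident, approve):
    """Execute a remediation only after a human approves it.

    `approve` is a callback so the approval channel (chat message,
    ticket, CLI prompt) can vary; it returns True or False.
    """
    action = propose_remediation(incident)
    if approve(incident, action):
        return f"executing {action}"
    return "remediation rejected; routing to on-call engineer"

incident = {"id": 42, "diagnosis": "disk_full"}
print(run_remediation(incident, approve=lambda i, a: a != "restart_service"))
```

The point of the callback is that the system proposes, but a person disposes: the automation never fires without an explicit OK.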
CAPPELLI: Yes, AI systems can make errors. That raises the concern of being able to audit effectively what AI systems do. Certain types of algorithms that are very popular today are intractable: literally, a human being can’t understand how they arrived at their conclusions. As AIOps becomes more widely deployed, we’ll see a move away from these black-box algorithms and towards other types of AI algorithms that are more transparent.
What drawbacks and risks will organizations face if they don’t adopt AIOps?
GOHRING: I wouldn’t say 100% of organizations definitely need AIOps. It depends on your situation, on the technologies you have in place, the challenges you’re trying to solve. If you’ve got a sophisticated, cloud-native app that’s complex and dynamic, and you don’t upgrade from a 10-year-old monitoring system, you’re going to have terrible performance and an unreliable system, and it’s going to take you weeks to solve performance problems.
CAPPELLI: Your need for AIOps is positively correlated with the degree to which you’ve modernized and modularized your applications, and the degree to which your apps are supporting a genuinely digital business. It’s necessary if the behavioral complexity of your system is such that without the cognitive enhancement of the AI you can’t see what’s happening.
Can an AIOps tool do other correlations across systems or apps, and be the predictive analytics monitoring or healing tool?
CAPPELLI: There are three things that have been called out in this question and they are different functions. First of all, there’s the need to paint the picture of what your stack as an integrated whole is doing. You do need AI-like pattern discovery algorithms to take data from your network, from your infrastructure, from the cloud, from the app logic, from the end user experience, and weave that all together into a coherent whole.
Then you’ll need a different kind of AI to understand what patterns are unique to this moment, and what patterns are likely to recur over time and hence become predictive. So the predictive algorithms are related to and feed off of but are different from the correlational algorithms.
Finally, to do the remediation, it’s critical to be able to understand the causality if you’re going to then automate any kind of remediation. And that causal inference is yet again something different.
These different kinds of algorithms need to work together in a choreographed way.
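Cappelli’s three functions can be sketched as a staged pipeline. The function bodies below are toy stand-ins for real pattern-discovery, predictive, and causal-inference algorithms, chosen only to show how the stages feed one another; none of this comes from a specific product.

```python
# Illustrative sketch of the three algorithm families working in sequence.
from collections import Counter

def correlate(events):
    """Stage 1: weave raw events from many layers into incident groups,
    here by naively grouping on a shared service tag."""
    groups = {}
    for e in events:
        groups.setdefault(e["service"], []).append(e)
    return groups

def predict(history):
    """Stage 2: flag patterns that recur over time and are therefore
    likely to recur again (toy frequency threshold)."""
    counts = Counter(h["signature"] for h in history)
    return {sig for sig, n in counts.items() if n >= 3}

def infer_cause(group):
    """Stage 3: pick a probable root cause; the earliest event in the
    group is a crude stand-in for real causal inference."""
    return min(group, key=lambda e: e["time"])

events = [
    {"service": "checkout", "time": 3, "signature": "latency_spike"},
    {"service": "checkout", "time": 1, "signature": "db_conn_errors"},
    {"service": "search", "time": 2, "signature": "latency_spike"},
]
incidents = correlate(events)
root = infer_cause(incidents["checkout"])
print(root["signature"])  # prints "db_conn_errors"
```

The prediction stage feeds off the correlation stage’s output over time, and only the causal stage is trusted to drive remediation, which mirrors the choreography Cappelli describes.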
Which standard implementation paths do you currently see in the market? How can an organization prepare before onboarding vendor products?
GOHRING: The most important step when modernizing your approach to monitoring/incident management is to accurately identify your challenges and weak spots by studying real data. For instance, you may think a particular part of the process is painful and too time-consuming, but once you invest in measuring your process, you may discover more important issues. Once you’ve identified your most significant challenges, you can begin to research which tools will solve those challenges. Because the above process will yield unique results, I think there really aren’t standard implementation paths that will work for everyone.
How do you see, over time, the expansion of open source communities within the AIOps space?
GOHRING: I anticipate continued and growing enthusiasm for open source in monitoring tools, but not necessarily in AI/ML technologies in this sector. The growing adoption of Kubernetes has driven interest in Prometheus, given the close integration, and that should continue. The OpenTelemetry project has also driven new interest in open source in monitoring. However, so far most applications of AI and ML to monitoring (and related) data have been developed in a relatively proprietary fashion by vendors. While the vendors may employ some widely used algorithms, they often also develop their own and take unique approaches to applying them. This pattern will continue for the foreseeable future.
How do you prove that the data that the AIOps platform is ingesting is accurate?
CAPPELLI: The short answer is that you cannot. During the webinar, I argued that AI could, in fact, be defined as the sequential deployment of five distinct types of algorithms. The first one is data selection algorithms for eliminating noise and redundancy. It is that first type of algorithm which is required to take a noisy, problematic data set and transform it into something usable, or at least, to let you know you need to go back to the data well if you want to quench your analytic thirst. The sad truth is that many, if not most, AI platforms jump right to pattern discovery and deliver results based on noisy data.
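The data-selection stage Cappelli puts first can be sketched as a simple filter that drops low-severity noise and collapses redundant duplicates before any pattern discovery runs. The field names and thresholds below are invented for illustration.

```python
# Toy sketch of a data-selection pass: eliminate noise and redundancy
# before downstream algorithms see the data.

def select_data(events, min_severity=2, dedup_window=60):
    """Drop low-severity noise and collapse duplicate events that
    repeat within a short time window (seconds)."""
    kept, last_seen = [], {}
    for e in sorted(events, key=lambda e: e["time"]):
        if e["severity"] < min_severity:
            continue  # noise: below the severity floor
        key = (e["source"], e["message"])
        if key in last_seen and e["time"] - last_seen[key] < dedup_window:
            continue  # redundancy: same event seen moments ago
        last_seen[key] = e["time"]
        kept.append(e)
    return kept

raw = [
    {"time": 0, "source": "db1", "message": "conn refused", "severity": 3},
    {"time": 5, "source": "db1", "message": "conn refused", "severity": 3},
    {"time": 10, "source": "web1", "message": "debug ping", "severity": 1},
    {"time": 90, "source": "db1", "message": "conn refused", "severity": 3},
]
print(len(select_data(raw)))  # prints 2: one duplicate and one noisy event dropped
```

A platform that skips this pass hands its pattern-discovery algorithms the full four-event stream, duplicates and debug chatter included, which is exactly the failure mode Cappelli warns about.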
Can you expand on how you see AIOps fitting into the ITIL 4 framework?
CAPPELLI: ITIL 4 has replaced segregated, centralized, and deterministic processes with flexible, holistic, problem solving communities. These communities, in order to function effectively, will require a holistic, continually evolving perspective on the digital infrastructure. It is precisely AIOps that is charged with delivering that perspective.
Can you describe what “automation skills” companies should be aware of?
GOHRING: I’m hearing from enterprises that they lack expertise in developing automations, specifically for remediation. This can involve both expertise with particular automation tools and scripting skills. These are the skills I most often hear are in demand in operations organizations looking to better develop their automated processes.
Which monitoring tool do you consider to be of high quality to provide input for the ML algorithms?
CAPPELLI: There is no single monitoring tool. Instead, effective AIOps requires input from an array of unified performance monitoring platforms, time series databases, log management platforms, event management systems, and change management systems. Each of these ingestion technologies provides a distinct and essential perspective on the behaviors and misbehaviors of an enterprise digital infrastructure. The five algorithmic dimensions of AIOps can, together, synthesize them so faults are diagnosed and incidents anticipated.
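One way to picture the synthesis Cappelli describes is a normalization layer that maps each source’s records into a common event schema and merges them into one timeline. The source formats below are invented; real metrics, logs, and change records are far richer.

```python
# Sketch of unifying heterogeneous monitoring inputs into one event
# schema so cross-source algorithms can run over a single timeline.

def normalize_metric(m):
    return {"kind": "metric", "when": m["ts"], "where": m["host"],
            "what": f'{m["name"]}={m["value"]}'}

def normalize_log(line):
    ts, host, msg = line.split(" ", 2)
    return {"kind": "log", "when": int(ts), "where": host, "what": msg}

def normalize_change(c):
    return {"kind": "change", "when": c["applied_at"],
            "where": c["target"], "what": c["summary"]}

def unify(metrics, logs, changes):
    events = ([normalize_metric(m) for m in metrics]
              + [normalize_log(l) for l in logs]
              + [normalize_change(c) for c in changes])
    return sorted(events, key=lambda e: e["when"])

timeline = unify(
    metrics=[{"ts": 120, "host": "db1", "name": "cpu", "value": 97}],
    logs=["110 db1 connection pool exhausted"],
    changes=[{"applied_at": 100, "target": "db1",
              "summary": "deployed schema migration"}],
)
print([e["kind"] for e in timeline])  # prints ['change', 'log', 'metric']
```

Once every source speaks the same schema, a single pass can see that a change preceded the log errors, which preceded the metric spike; that cross-source ordering is what no single monitoring tool provides on its own.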
For a summary of the webinar’s highlights, read the blog post “2020 Is the Year of AIOps”.