When evaluating the financial benefits of an AIOps platform, it’s essential to look beyond its ability to reduce costs.
Taking economics into account
Most enterprises consider economics when deciding which AIOps platform to purchase. Often, their conception of economics is narrow, reduced to the resolution of three issues: 1) the cost of the technology; 2) its ability to replace human labor; and 3) its ability to displace deployed products and, hence, defray future maintenance and subscription charges. In other words, AIOps economics becomes almost entirely a matter of cost. The benefit side – both direct benefits and the technology’s future impact on enhancing flexibility and reducing risk – is ignored.
The failure to address ‘Total Economic Impact’
A holistic financial evaluation of a technology that includes not only costs but also benefits yields what Forrester Research calls Total Economic Impact. Why don’t most enterprises make this assessment? Two reasons:
- First, up until the financial crisis of 2007-2008, IT was, for most businesses, overwhelmingly concerned with back office operations. It was impossible to correlate IT system events with revenue-generating business events. As a consequence, IT was, like most capital and operational expenditure associated with internal business operations, treated purely in terms of cost.
- Second, even in those few cases where some correlation between IT events and business events could be established, those IT components devoted to managing other IT systems lay deep in the ‘capital structure’ of the system delivering business value, i.e., these components did not contribute business value directly. Instead, they supported other components which supported other components which ultimately delivered business value.
As it turns out, a methodology – Real Options Pricing – had evolved which could model the value contributed by ‘deep capital structure’ investments. Unfortunately, the mathematics involved was complex and ill-understood, leading to widespread skepticism about these models. (Interestingly, the models often showed that most reasonable purchases in systems management technology yielded massive returns, which only made people more skeptical about them.)
Economic assessment must begin now
Today, the situation has changed dramatically. With the trend towards digitalization – a trend deepened and accelerated by the impact of the COVID-19 pandemic – the overwhelming majority of revenue-generating business events correspond directly to IT system state changes. IT is no longer primarily about providing support to internal business processes. It has become identical with the business. Recent research suggests IT and its related processes generate the lion’s share of the GDP of most developing and industrialized economies. In this scenario, systems management technology, including AIOps, is still not a direct generator of business value, but it lies much closer to the surface. It’s a component that supports the components that, in most cases, generate business value. Consequently, assessing the Real Options Pricing value of an AIOps investment is no longer the intricate mathematical task it was even 10 years ago.
In what follows, I won’t delve into the mathematical task, but I’ll instead lay out how an AIOps technology investment delivers value in the current business environment. In a future note, I will show how these intuitive steps can be translated into a quantitative model.
Thinking through the economics of discovery and resolution
The first step in assessing an AIOps investment’s value is acknowledging that certain events taking place within an IT system can lead to other events that harm service quality or trigger an outage. Note that, until the chain of events actually causes a problem, the business feels no actual negative economic impact — only potential costs.
In general, monitoring technology works in one of two ways. It either watches a subset of all IT system events and reports whether the observed behavior is good or bad, or it detects and reports service degradations. With the first approach, costs grow linearly with the duration of the monitoring process, even if the system is well-behaved. With the second approach, a modern digital business begins to lose revenue as soon as an episode of service quality decline or outage begins. Furthermore, in both cases, the rate of revenue loss increases exponentially with time. For the monitoring style that reports actual service quality declines or outages, the trick is to ensure that the alert gets to the right people as quickly as possible and that they have the tools to resolve the issue.
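To make the two cost shapes concrete, here is a minimal sketch. The function names, the parameters, and the specific functional form (a loss rate of `initial_loss_rate * exp(growth * t)`) are illustrative assumptions, not a calibrated model of any real business.

```python
import math


def monitoring_cost(hours: float, cost_per_hour: float) -> float:
    """Linear cost of watching the system, paid whether or not anything is wrong."""
    return cost_per_hour * hours


def cumulative_revenue_loss(hours: float, initial_loss_rate: float, growth: float) -> float:
    """Total revenue lost after `hours` of an incident.

    Assumes the loss rate at time t is initial_loss_rate * exp(growth * t),
    so the total is the integral of that rate from 0 to `hours`.
    """
    if growth == 0:
        # No acceleration: the loss accumulates at a constant rate.
        return initial_loss_rate * hours
    return initial_loss_rate / growth * (math.exp(growth * hours) - 1.0)


if __name__ == "__main__":
    # Hypothetical figures: 10 per hour to monitor, 100 per hour lost initially.
    for hours in (1.0, 2.0, 4.0):
        print(hours, monitoring_cost(hours, 10.0), cumulative_revenue_loss(hours, 100.0, 0.5))
```

Under these assumptions, doubling the length of an incident more than doubles the loss, which is why every minute shaved off the early stages of an incident matters.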
When an AIOps investment’s economics are analyzed, the focus is usually on the technology’s ability to reduce MTTD (mean time to detection) or, if some automation is involved, MTTR (mean time to resolution). The idea here is that if a tool shortens the time to discover or to resolve an issue, a quantitative benefit can be shown.
It is, however, incorrect to look at discovery and resolution as a single integrated process. In fact, discovery is composed of three distinct processes: determining 1) whether an incident is underway; 2) the nature of the incident; and 3) the causes of the incident. Resolution adds two further processes: communicating the incident’s nature and causes to the appropriate response agents, and executing the response. From an economic perspective, it’s critical to treat the time taken by each of these processes as independent of the others. Put another way, reducing the time to determine the nature of the incident doesn’t guarantee a reduction in the time to determine, e.g., that an incident is underway. The only way to effectively throttle MTTD or MTTR is to address the latency associated with each of the five processes in succession.
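The decomposition can be sketched in a few lines. The stage names and the minute values are hypothetical, and the sketch assumes the five processes run strictly one after another, so that their latencies simply add up.

```python
# Hypothetical labels for the five processes, in the order they occur.
DISCOVERY_STAGES = ("detect", "classify", "diagnose")
RESOLUTION_STAGES = ("communicate", "execute")


def time_to_detect(latencies: dict) -> float:
    """Minutes spent on the three discovery processes."""
    return sum(latencies[stage] for stage in DISCOVERY_STAGES)


def time_to_resolve(latencies: dict) -> float:
    """Minutes spent on all five processes, discovery included."""
    return time_to_detect(latencies) + sum(latencies[stage] for stage in RESOLUTION_STAGES)


if __name__ == "__main__":
    # Made-up latencies, in minutes, for a single incident.
    incident = {"detect": 30, "classify": 20, "diagnose": 60, "communicate": 15, "execute": 45}
    faster_classification = dict(incident, classify=10)  # a tool halves one stage only
    print(time_to_resolve(incident))               # 170
    print(time_to_resolve(faster_classification))  # 160
```

In this made-up incident, halving the time to classify the problem trims only 10 of 170 minutes; the other four latencies are untouched and still dominate the total.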
Why averages are meaningless
Note also that averages in this domain are highly suspect. Modern IT systems are highly modular, distributed, dynamic, and, at a component level, ephemeral. As a result, system properties vary significantly from one moment to the next with no necessary periodicity. Consequently, any average measures are likely to possess huge variances or, looked at differently, are summaries of history as opposed to descriptors of underlying properties.
Now, AIOps technologies that address only one or two of the five processes outlined above can demonstrate a positive economic impact only by including in the assessment averages over which they have no direct control. For example, suppose an enterprise is assessing an AIOps tool that does a particularly good job at determining the nature of an incident whose occurrence has already been detected but does not touch any of the other four processes involved in discovery and resolution. Suppose further that a trial has demonstrated a reduction in MTTR. Unfortunately, given the high variance of latency measures associated with the other four processes, that trial reduction is, for all intents and purposes, worthless as a predictor of the future.
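A small simulation illustrates the point. Everything here is an assumption made for illustration: the log-normal distribution for the four uncontrolled latencies, its parameters, the trial size, and a tool whose true saving is always exactly 10 minutes on the one stage it controls.

```python
import random
import statistics

UNCONTROLLED_STAGES = 4  # the four processes the hypothetical tool does not touch


def mean_resolution_minutes(rng, n_incidents, controlled_minutes, sigma):
    """Mean end-to-end time over n_incidents simulated incidents."""
    totals = []
    for _ in range(n_incidents):
        # Each uncontrolled stage draws an independent log-normal latency.
        uncontrolled = sum(rng.lognormvariate(3.0, sigma) for _ in range(UNCONTROLLED_STAGES))
        totals.append(controlled_minutes + uncontrolled)
    return statistics.mean(totals)


def observed_reductions(seed, n_trials, n_incidents, sigma):
    """Repeat a before/after trial; the controlled stage always drops from 20 to 10 minutes."""
    rng = random.Random(seed)
    reductions = []
    for _ in range(n_trials):
        before = mean_resolution_minutes(rng, n_incidents, 20.0, sigma)
        after = mean_resolution_minutes(rng, n_incidents, 10.0, sigma)
        reductions.append(before - after)
    return reductions


if __name__ == "__main__":
    for sigma in (0.1, 1.5):
        r = observed_reductions(seed=7, n_trials=200, n_incidents=20, sigma=sigma)
        print(f"sigma={sigma}: min={min(r):.1f} max={max(r):.1f} stdev={statistics.pstdev(r):.1f}")
```

When the uncontrolled stages are nearly constant (small `sigma`), every trial reports something close to the true 10-minute saving. As their variance grows, the measured reduction in the mean is driven mostly by noise in stages the tool never touched, so a single trial says little about what will happen next.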
A real world example of financial benefits of an AIOps deployment
A large financial services institution cut its MTTR (mean time to resolution) by a whopping 85%, down to 100 minutes, and slashed its Level 1/2 tickets by 75%, its Level 3 tickets by 15%, and its Level 4 tickets by 50%. The financial benefits to the business beyond simple cost reduction: Tens of millions of dollars.
This was achieved via a multi-pronged strategy encompassing several key use cases, including:
- A dramatic improvement in the clustering of alerts around incidents. The company went from a very limited, inefficient process to an AIOps-driven ingestion and correlation process that consolidated alerts into unique, contextually rich incidents and massively reduced the number of tickets.
- An integration with the ITSM / CMDB system. This drastically simplified and accelerated ticketing, leading to faster, more effective routing, prioritizing, handling and resolution of incidents.
- Automated knowledge capture and recycling. The company went from not being able to access historical knowledge on how prior incidents were resolved to having the knowledge capture and recycling process totally automated. Now, operators are automatically notified of resolved past incidents that are similar to the one being worked on, and they have all the past resolution documentation at their fingertips, leading to much faster resolution.
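As a quick check on the headline figure in this example, the arithmetic below recovers the starting point implied by "cut by 85%, down to 100 minutes". The function name is hypothetical; the two inputs are the only numbers taken from the case.

```python
def implied_baseline_minutes(after_minutes: float, reduction_fraction: float) -> float:
    """Starting value implied by an end value and a fractional reduction."""
    if not 0.0 <= reduction_fraction < 1.0:
        raise ValueError("reduction_fraction must be in [0, 1)")
    return after_minutes / (1.0 - reduction_fraction)


if __name__ == "__main__":
    baseline = implied_baseline_minutes(100.0, 0.85)
    print(f"{baseline:.0f} minutes, about {baseline / 60:.1f} hours")
```

In other words, the institution's MTTR before the deployment was roughly 667 minutes, or about 11 hours.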
In summary, then, the only way in which a genuine total economic impact may be assessed – let alone a positive impact be proven – is if the AIOps technology in question is capable of altering the latency and costs associated with all five of the processes that together constitute discovery and resolution in a modern digital environment.
About the author
Will studied math and philosophy at university, has been involved in the IT industry for over 30 years, and for most of his professional life has focused on both AI and IT operations management technology and practices. As an analyst at Gartner, he is widely credited with having been the first to define the AIOps market, and he has recently joined Moogsoft as CTO, EMEA and VP of Product Strategy. In his spare time, he dabbles in ancient languages.