How AIOps can complement ITSM, and help operators stop chasing tickets and start focusing on what matters to the business
ITSM used to be the way to manage IT services, but there are more and more claims that, in the age of DevOps and Agile, ITSM is now obsolete and no longer fit for purpose.
There is some truth to this attitude. ITSM is built on assumptions that no longer hold, or at least not in quite the same way that they used to. It has a built-in expectation that a single incident relates to a single event: one thing that went wrong. That thing has an owner who is responsible for it, so the incident can be assigned to that person to investigate.
The Old Days of Ticketing
In the last few years, and even since the last major review of ITIL in 2011, infrastructures have become more complex and are now changing with increasing speed. This means that it is no longer true (if it ever was) that a single event describes an entire incident. Incidents are no longer triggered by single failures; they are caused by multiple different issues occurring together in some unexpected combination.
This, in turn, means that there is no single owner of the resulting ticket. In fact, part of the problem is that multiple different incident tickets may be created and routed to different teams, who each receive only a single part of the puzzle. It can take a long time to figure out how these separate records relate to each other, what the overall impact is, and who is ultimately responsible for fixing the problem. An acronym I recently learned is MTTI, the Mean Time To Innocence, which describes how long it takes to determine whether something is a cause or a symptom.
I talked about many of these issues in a recent video interview with Claire Agutter of ITSM.tools.
All of these issues make for a bad experience. IT professionals are caught up in the “catch and dispatch” process of trying to deal with a constant torrent of incidents, only some of which are real and their responsibility. The experience is just as bad for the users of the business services that are the reason for all of this activity in the first place.
A New Approach to Tickets
New approaches have been emerging to adapt ITSM to these new realities. One of the most interesting is “swarming,” in which virtual teams come together on the fly across organizational and technological boundaries to deal with incidents that lie across all of their areas of responsibility.
AIOps is another part of the same process: it uses algorithms to filter the event storm and build a “meta-ticket” that encapsulates all of the different aspects of a particular incident.
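To make the idea of a meta-ticket more concrete, here is a minimal Python sketch. It is not how any particular AIOps product works: the Event and MetaTicket types, the correlate function, the 300-second window, and the choice to group events by business service are all assumptions made for illustration. Real systems use far richer signals (topology, text similarity, learned patterns), but the shape is the same: drop the repeats, then gather related events from different sources into one record.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    timestamp: float  # seconds since an arbitrary start point
    source: str       # host or component that emitted the event
    service: str      # business service affected (hypothetical field)
    message: str


@dataclass
class MetaTicket:
    service: str
    events: list = field(default_factory=list)

    @property
    def sources(self):
        # Distinct sources involved: a hint at which teams need to be in the room.
        return sorted({e.source for e in self.events})


def correlate(events, window=300.0):
    """Group events for the same service that arrive within `window`
    seconds of the previous event for that service (illustrative rule only)."""
    tickets = []
    latest = {}  # service -> (open ticket, timestamp of last event seen)
    for ev in sorted(events, key=lambda e: e.timestamp):
        entry = latest.get(ev.service)
        if entry is not None and ev.timestamp - entry[1] <= window:
            ticket = entry[0]
        else:
            ticket = MetaTicket(service=ev.service)
            tickets.append(ticket)
        latest[ev.service] = (ticket, ev.timestamp)
        # Filter the storm: keep one copy of each (source, message) pair per ticket.
        duplicate = any(
            (e.source, e.message) == (ev.source, ev.message) for e in ticket.events
        )
        if not duplicate:
            ticket.events.append(ev)
    return tickets


events = [
    Event(0, "db01", "checkout", "disk latency high"),
    Event(10, "db01", "checkout", "disk latency high"),  # repeat, filtered out
    Event(40, "web03", "checkout", "HTTP 500 rate up"),
    Event(60, "lb01", "search", "backend timeout"),
    Event(1000, "web03", "checkout", "HTTP 500 rate up"),  # outside the window
]
tickets = correlate(events)
for t in tickets:
    print(t.service, t.sources, len(t.events))
```

In this toy data the database and web server events for the checkout service end up in the same meta-ticket, so both teams see the whole picture instead of receiving one ticket each.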
There is also the growing interest in two-tier models, with systems of engagement to complement the ITSM system of record. The system of engagement is more agile, flexible, and self-organizing, and is therefore better able to deal with rapidly evolving situations, while the system of record is where everything gets documented, and where other processes can pick up on that documentation.
It’s time to re-evaluate how ITSM can evolve so that it can continue to support business requirements in the future, moving beyond the notion of ticket-driven incident management with a single owner and a sequential process. By extending that model and complementing it with other tools and techniques, we can deliver better quality of IT service, lower cost of IT Ops, and less disruption to Ops staff’s personal lives from being constantly on call for incidents that all too often turn out not to be their problem.
About the author
Dominic Wellington is the Director of Strategic Architecture at Moogsoft. He has been involved in IT operations for a number of years, working in fields as diverse as SecOps, cloud computing, and data center automation.