Arguably, the entire point of the IT industry is automation, and yet it can still easily go wrong. Why does that happen, and how can we avoid it?
There is no question that the main purpose of automation is to accelerate processes and make the results more consistent. All the way back to Gutenberg’s press or the spinning jenny, the idea was to produce more of something, do it faster, and make the end products predictable and interchangeable.
These goals continue to hold true well into the 21st century. However, there is a catch. If we revisit the purpose stated above, we are talking about accelerating processes and making results consistent. As is often the case, there are a couple of hidden assumptions lurking in there.
First Ask: “What Are We Automating?”
The first assumption is that there is a process, that it is known, fixed, well-understood, and well-documented. If the process is new, jumping straight into automation may not be the best idea. A couple of hours of quality planning time in front of a whiteboard can save days or weeks of pain later on.
Automation can also hide problems by addressing only their symptoms, so that root causes go unfixed and fester in the background until they blow up into a much bigger problem.
As I have had occasion to write before, “Without that holistic view, automation is just plastering more bandaids onto a sucking chest wound faster and faster.”
A simplistic approach of “automate all the things” won’t cut it, and can even backfire if the automation gets triggered incorrectly: the wrong action, or the right action in the wrong place or time.
It’s important to automate the right things, at the right points. IT operations management (ITOM) teams need to be able to see important information in order to put together good strategic plans, and then start to automate them.
The discipline of IT operations analytics (ITOA) is all about gathering useful information and assembling it together into a complete picture of what is going on. A new generation of AI tools is helping ITOps professionals do that, even with the huge volumes of data generated by modern IT infrastructures.
Next Ask: “Are We Automating The Right Thing?”
The other assumption around automation is that the results of the process being automated are broadly similar — that is, there are few exceptions. Automated systems are great at stamping out large numbers of identical copies of a single template or blueprint. Where things get complicated is when the results are variable.
Tesla CEO Elon Musk found this out the hard way, directly blaming some of Tesla's recent production delays on excessive and premature automation:
“Yes, excessive automation at Tesla was a mistake. To be precise, my mistake. Humans are underrated.”
Humans are able to apply common sense to edge cases, and make spur-of-the-moment decisions about how to deal with unexpected combinations of factors. Automated systems do not have these capabilities, and so when options are introduced into an automated process, not only must each possible option be tested, but so must each possible combination of those options.
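To make the combinatorial growth concrete, here is a minimal Python sketch; the option names and values are invented for illustration, not taken from any real system:

```python
from itertools import product

# Hypothetical options exposed by an automated provisioning process.
options = {
    "os": ["linux", "windows"],
    "region": ["us-east", "eu-west", "ap-south"],
    "tier": ["small", "medium", "large"],
}

# Each combination of options is a distinct case the automation must
# handle correctly, and therefore a distinct case to test.
combinations = list(product(*options.values()))
print(len(combinations))  # 2 * 3 * 3 = 18 cases from just three options
```

Adding a fourth option with three values would triple that count again, which is why every new option carries a real testing cost.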
This is not to say that there cannot be any options, but it’s a question of proportion. If the result is different every single time a process is run, it’s probably not a good candidate for automation.
On the other hand, if there are a few options, but the total amount of product is large, it is worthwhile putting the design work in up front to deal with that.
I discussed some of these topics in a recent webinar with Moogsoft partners Resolve Systems, developers of run-book automation software for incident response (among other things). In particular, we discussed how the growing complexity of IT drives the demand for more automation, but also how that automation can avoid burning out human IT specialists, or worse, driving them out of the company. The recording of our talk is available here.
The end state of all this analysis should be declarative, desired-state automation — that is, you think about what you want the end state to be, and declare that in minute detail.
If you struggle to do that, it’s a sign that you do not understand the process itself in sufficient detail.
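As an illustration, desired-state automation can be sketched as a reconciliation loop: compare the declared end state with what is actually running, and compute only the actions needed to close the gap. The service names and replica counts below are invented for the example:

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Compute the actions needed to move the observed state toward the
    declared desired state, instead of scripting steps imperatively."""
    actions = []
    for service, want in desired.items():
        have = actual.get(service, 0)
        if have < want:
            actions.append(("start", service, want - have))
        elif have > want:
            actions.append(("stop", service, have - want))
    # Anything running that was never declared should be removed.
    for service, have in actual.items():
        if service not in desired:
            actions.append(("remove", service, have))
    return actions

# Declared end state vs. observed state (hypothetical services).
desired = {"web": 3, "worker": 2}
actual = {"web": 2, "queue": 1}
print(reconcile(desired, actual))
# [('start', 'web', 1), ('start', 'worker', 2), ('remove', 'queue', 1)]
```

The point of the sketch is that the declaration (`desired`) is the source of truth; if you cannot write it down in this much detail, you do not yet understand the process well enough to automate it.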
The objective is not 100% automation, the fabled “lights-out datacenter.” There will always be exceptions, corner cases, and situations where the automation itself breaks down. This is where the human expertise comes in. The biggest goal of automation is to get humans out of the business of repetitive manual tasks. Their expertise is much better applied performing strategic analysis, and dealing with the edge cases that automation doesn’t quite cover. A subsidiary benefit is that, because they are working at this proactive level instead of being in constant fire-fighting reactive tactical mode, they are better able to identify what is really important and needs doing right away.
Triggering automation incorrectly is not a theoretical risk. Who among us doesn't have a story of a script running in production that was only supposed to run in dev or test, with hilarious (or tragic) consequences?
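One cheap safeguard is to have scripts verify where they are running before doing anything destructive. A minimal sketch, assuming a `DEPLOY_ENV` variable that is our own invented convention rather than any standard:

```python
import os
import sys

# Environments this script may touch. DEPLOY_ENV is an assumed
# convention for this sketch; adapt it to however your shop tags
# its environments.
ALLOWED_ENVS = {"dev", "test"}

def safe_to_run(env: str) -> bool:
    """Return True only when the given environment is on the allow list."""
    return env.lower() in ALLOWED_ENVS

def guard() -> None:
    """Abort immediately unless DEPLOY_ENV names an allowed environment."""
    env = os.environ.get("DEPLOY_ENV", "unknown")
    if not safe_to_run(env):
        sys.exit(f"Refusing to run: DEPLOY_ENV={env!r} is not in "
                 f"{sorted(ALLOWED_ENVS)}")

# A destructive script would call guard() as its very first step:
# guard()
# ... proceed with cleanup ...
```

It is a blunt instrument, but a one-line guard at the top of a script is far cheaper than the incident review afterwards.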
This is what AIOps is all about: applying AI techniques to route information effectively, whether to humans or to automated processes. That way, humans can focus on what is actually important (and more fun and rewarding to work on), part of which is, yes, developing new automation. ITOps becomes a programming discipline in its own right; not that coding wasn't always part of sysadmin duties, of course!
Ultimately, these are all tools to facilitate the end goal of operations, which is to keep the business up and running, and performing at the level that end users expect. Everything else is just details.
About the author
Dominic Wellington is the Director of Strategic Architecture at Moogsoft. He has been involved in IT operations for a number of years, working in fields as diverse as SecOps, cloud computing, and data center automation.