Machine learning is figuring out root causes of IT problems. It’s learning, with human input, which alerts and remediation steps matter.
AIOps uses machine learning and other large-scale data analysis tools to identify what has gone wrong in IT, suggest remediation actions, and learn from the choices administrators make to resolve problems. While AIOps does involve handing some of what administrators do over to machines, it predominantly involves administrators working closely with machines to boss around other machines.
Luckily, you don’t have to learn to think like a machine to make real-world use of AIOps.
The basic principles behind AIOps shouldn’t be shocking to anyone. In many areas of modern life humans turn to computers for advice, and — spoiler alert — it’s already harder than you think to separate human and machine into neat boxes.
How many of us would struggle to answer basic questions, recall important facts, navigate our vehicles to an unknown destination, and so on, without a computer? We have, both individually and as a culture, become utterly dependent on Google searches, GPS, and even digital note taking.
Computers have become an adjunct to the human brain. Nowadays we are expected to learn so much, so fast, that very few of us can store all of the day’s knowledge in the measly six or seven hours of sleep we get. Instead of memorizing the overwhelming amount of knowledge our jobs require, we end up remembering pointers to where that information lives.
The information, of course, lives in a computer somewhere. Our smartphones, our desktops, Wikipedia, and even the corporate knowledge base all function as extensions of our own memories. In so many cases, we don’t feel we need to actually know a given piece of information. We just have to know that the information exists, and have a rough idea of what to punch into Google, or say to Siri or Alexa, to find it.
Eventually, we taught computers to analyze this data automatically, to analyze how we react to it, and then to recommend courses of action. This isn’t revolution; it’s evolution. We have been headed down this path for some time.
Computers Aren’t the Enemy
Computers aren’t going to be able to do everything in the data center any time soon. Even with all the IT automation you can think of, humans will be needed in the data center for the foreseeable future.
Humans solve problems that are currently out of reach of AI solutions. AI automates the tedious, mundane work that humans struggle to motivate themselves to do.
Humans are terrible at documenting their solutions sensibly, or at remembering how they solved a similar problem, whether yesterday or 11 months ago. A computer, on the other hand, can see the bigger picture: how IT solutions come together to create what the business actually needs. A computer can analyze usage patterns and predict how many servers, with what capacity, will be needed and when. Humans are still required to take that information and combine it with other considerations (business, technical, and political) in order to make comprehensive plans for the future.
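To make the capacity-planning idea concrete, here is a minimal Python sketch, assuming demand grows roughly linearly and that every server offers the same capacity. It fits an ordinary least-squares trend to past usage, extrapolates it a few periods ahead, and converts the forecast into a server count. The function names, the 20% headroom, and the usage figures are hypothetical illustrations, not the method of any particular AIOps product, which would typically model seasonality and much more.

```python
import math


def fit_linear_trend(usage):
    """Ordinary least-squares fit of usage[t] = slope * t + intercept for t = 0..n-1."""
    n = len(usage)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_u = sum(usage) / n
    cov = sum((t - mean_t) * (u - mean_u) for t, u in enumerate(usage))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return slope, mean_u - slope * mean_t


def forecast_servers(usage, per_server_capacity, periods_ahead, headroom=0.2):
    """Extrapolate the trend periods_ahead past the last observation, then size the fleet.

    headroom is a hypothetical safety margin (0.2 means 20% spare capacity).
    """
    slope, intercept = fit_linear_trend(usage)
    t_future = len(usage) - 1 + periods_ahead
    demand = slope * t_future + intercept
    servers = math.ceil(demand * (1 + headroom) / per_server_capacity)
    return demand, servers


# Hypothetical monthly peak vCPU demand for one application
monthly_vcpus = [100, 110, 120, 130]
demand, servers = forecast_servers(monthly_vcpus, per_server_capacity=64, periods_ahead=6)
print(f"forecast demand: {demand:.0f} vCPUs -> {servers} servers")
```

The computer produces the number; deciding whether that purchase fits the budget, the roadmap, or office politics is still the human’s job.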
Humans aren’t great at prioritizing under stress. Humans get emotional, bored, and can sometimes get caught in bizarre little loops chasing their own tails. It’s here that computers can really help.
Machines are great at documentation, as long as you feed them a sensible algorithm to start with. They don’t forget because they are tired or hungry, or because the bike is on fire and they’re in hell with only a hand-held fire extinguisher.
Practical Computer Advantage
For a practical example of why AIOps is useful, consider Security Information and Event Management (SIEM). One of the purposes of SIEM solutions is to hoover up all the logging, event, and performance data possible from every workload and device. In principle, this data can then be analyzed to produce alerts.
For a long time, SIEM solutions relied on human-defined alerting systems that were largely based on static thresholds. These solutions were, and still are, a pain because they alert us to every single irrelevant abnormality.
Humans must extract usable signal from the noise, and that noise can become overwhelming during a real incident, because a single outage can trigger a flood of alerts. The most important part of responding to an event is determining its root cause. That’s difficult when the alerting system in use can’t correlate events, prioritize them, or otherwise make sense of the alerts for sleep-deprived systems administrators.
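The gap between threshold alerting and correlation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the thresholds, host names, and readings are hypothetical, and the correlation step is a simple time-gap heuristic rather than machine learning. It shows how one database slowdown fans out into many threshold alerts, and how even crude grouping collapses them into a single candidate incident whose earliest alert is a first guess at the root cause.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    timestamp: int  # seconds since epoch
    host: str
    metric: str
    value: float


# Hypothetical static thresholds, in the style of traditional rule-based alerting
THRESHOLDS = {"cpu_pct": 90.0, "latency_ms": 500.0, "error_rate": 0.05}


def threshold_alerts(readings):
    """Fire one alert per reading that exceeds its metric's static threshold."""
    return [r for r in readings if r.value > THRESHOLDS.get(r.metric, float("inf"))]


def group_into_incidents(alerts, gap_s=120):
    """Chain alerts into candidate incidents: an alert joins the current incident
    if it arrives within gap_s seconds of the previous alert, else it starts a new one."""
    incidents = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if incidents and alert.timestamp - incidents[-1][-1].timestamp <= gap_s:
            incidents[-1].append(alert)
        else:
            incidents.append([alert])
    return incidents


# Hypothetical data: a database slowdown at t=1000 cascades to the app and web tiers
readings = [
    Reading(1000, "db01", "latency_ms", 2300.0),
    Reading(1005, "app01", "error_rate", 0.40),
    Reading(1007, "app02", "error_rate", 0.35),
    Reading(1010, "app01", "latency_ms", 3100.0),
    Reading(1012, "app02", "latency_ms", 2900.0),
    Reading(1030, "web01", "cpu_pct", 97.0),
    Reading(1031, "web02", "cpu_pct", 40.0),  # under threshold: no alert
    Reading(5000, "backup01", "cpu_pct", 95.0),  # unrelated, over an hour later
]

alerts = threshold_alerts(readings)
incidents = group_into_incidents(alerts)
print(len(alerts), "alerts ->", len(incidents), "incidents")
for incident in incidents:
    # Naive heuristic: the earliest alert is the first root-cause candidate
    first = incident[0]
    print(f"  {len(incident)} alerts, earliest: {first.host}/{first.metric}")
```

A fixed time window is brittle (two unrelated failures that happen close together get merged), which is exactly the kind of judgment that learned correlation aims to improve on.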
Thanks to machine learning, modern solutions are much better at dealing with alerts. These AIOps solutions are learning from humans which alerts matter, and what remediation steps are taken. And because AIOps is about machines watching humans to learn, the infinite attention span of machines comes in handy.
Machines become smarter the more data we feed them. Every time anyone using an AIOps solution solves a problem, the system learns. Any time someone dismisses an alert, the system learns. All actions from all administrators help the solution learn, and this makes the advice it provides more accurate, more of the time.
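As the simplest possible stand-in for “the system learns every time someone dismisses an alert,” the Python sketch below keeps per-alert counts of how often administrators acted on an alert versus dismissed it, and uses a Laplace-smoothed ratio to decide whether the alert is worth paging a human. The class name, alert signatures, and the 0.3 paging threshold are hypothetical; real AIOps products use far richer models and features, but the feedback loop has the same shape.

```python
class AlertFeedback:
    """Tracks, per alert signature, how often administrators act on an alert versus dismiss it."""

    def __init__(self):
        self.acted = {}
        self.dismissed = {}

    def record(self, signature, acted_on):
        counts = self.acted if acted_on else self.dismissed
        counts[signature] = counts.get(signature, 0) + 1

    def relevance(self, signature):
        """Laplace-smoothed estimate of the probability an alert is acted on.
        A signature with no history scores 0.5."""
        a = self.acted.get(signature, 0)
        d = self.dismissed.get(signature, 0)
        return (a + 1) / (a + d + 2)

    def should_page(self, signature, threshold=0.3):
        """Page a human only if past behaviour suggests the alert matters."""
        return self.relevance(signature) >= threshold


feedback = AlertFeedback()
# Hypothetical history: one alert is almost always dismissed, another almost always acted on
for _ in range(18):
    feedback.record("disk_temp_warning", acted_on=False)
feedback.record("disk_temp_warning", acted_on=True)
for _ in range(9):
    feedback.record("db_replication_lag", acted_on=True)
feedback.record("db_replication_lag", acted_on=False)

for sig in ("disk_temp_warning", "db_replication_lag", "never_seen_before"):
    print(f"{sig}: relevance={feedback.relevance(sig):.2f}, page={feedback.should_page(sig)}")
```

Smoothing means an alert nobody has seen before still reaches a human, while one that administrators have dismissed many times stops waking anyone up.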
We’re not at the machines replacing humans stage. They’re just not good enough yet, and they won’t be for a while. What machines can do is automate.
Imagine being able to execute a remediation for an event with a single command or button click. Every time there’s an event, the computer offers one or more options for resolving the problem. If the fix you think is right happens to be one of the pre-canned, automated “push button, receive bacon” options, you save time. If none of the recommended solutions solves the problem, you still have to fix it manually, but you haven’t lost anything by taking a couple of seconds to read the computer’s suggestions.
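One way such suggestions could be ranked is sketched below in Python, assuming the system records which remediation an administrator ran for each alert signature and whether it resolved the problem. The class, the action names, and the outcome counts are all hypothetical; the ranking is a plain smoothed success rate, not any vendor’s algorithm. Wiring the top suggestion to an automation runbook is what would turn it into a one-click fix.

```python
class RemediationAdvisor:
    """Ranks remediation actions for an alert signature by their past success rate."""

    def __init__(self):
        # (signature, action) -> [times it resolved the problem, times it was tried]
        self.history = {}

    def record(self, signature, action, resolved):
        stats = self.history.setdefault((signature, action), [0, 0])
        stats[1] += 1
        if resolved:
            stats[0] += 1

    def suggest(self, signature, top_n=3):
        """Return up to top_n actions, best first, scored with a Laplace-smoothed success rate."""
        scored = [
            ((ok + 1) / (tries + 2), action)
            for (sig, action), (ok, tries) in self.history.items()
            if sig == signature
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [action for _, action in scored[:top_n]]


advisor = RemediationAdvisor()
# Hypothetical outcomes of actions administrators chose for the same alert
outcomes = {
    "restart_app_service": (8, 10),  # resolved 8 of 10 times
    "flush_cache": (2, 6),
    "fail_over_database": (3, 3),
}
for action, (resolved_count, attempts) in outcomes.items():
    for i in range(attempts):
        advisor.record("app_error_rate_high", action, resolved=i < resolved_count)

print(advisor.suggest("app_error_rate_high"))
print(advisor.suggest("unknown_alert"))
```

An alert with no history yields an empty list, which matches the article’s point: when the machine has nothing useful to offer, the administrator simply does the work manually, having lost only a glance.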
AIOps is the evolution of traditional SIEM systems into something that has finally gotten to the point where it may be a useful piece of a machine-human partnership. AIOps is about both human and machine doing only the parts that they are good at — with the caveat that the machine is slowly, but persistently, getting better at everything as it learns from humans.
About the author
Trevor Pott is a full-time nerd from Edmonton, Alberta, Canada. He is cofounder of eGeek Consulting Ltd. and splits his time between systems administration, consulting, and technology writing. As a consultant he helps Silicon Valley startups better understand systems administrators.