Getting behind the buzzwords: The true meanings of AI, machine learning, and deep learning, and understanding how they relate to each other.
Algorithmic IT Operations (AIOps) is a new category created by Gartner, primarily to deal with the challenges associated with operating the next generation of infrastructure. AIOps is quickly making its way into enterprise initiatives; Gartner even estimates that half of all global enterprises will actively be using AIOps by 2020.
The core appeal of AIOps is the “algorithmics.” This implies the use of machine learning to automate tasks and processes that have traditionally required human intervention. Real machine learning for IT Incident Management is readily available today; however, it does not exist in every vendor solution that claims to offer AIOps.
In this upcoming series of blog posts, I will demystify machine learning in the context of IT Incident Management.
Part 1: Behind the Buzzwords
Two of the biggest buzzwords that have crossed from the world of computer science and technology startups to the mainstream media over recent years are “Machine Learning,” and “Artificial Intelligence” (AI). Throw in “Deep Learning,” and we’ve got the start of a great game of buzzword bingo. These terms are closely linked and are often used interchangeably, but they aren’t the same thing. So what’s the difference?
In many fields, definitions are not always as clear as we’d like them to be: they have fuzzy boundaries, and the definition can change over time as our understanding of the field and the capabilities within that field develop. AI falls into that category. AI, machine learning and deep learning are nested, each a specialisation within the one before. AI covers the broadest range of technologies, machine learning is a set of technologies within AI, and deep learning is a specialisation within machine learning.
AI: More Artificial or Intelligent?
One of the most general definitions of AI, taken from the Merriam-Webster dictionary, is “The capability of a machine to imitate intelligent human behaviour.” The term “machine” is important, because AI does not have to be restricted to computers.
A truly AI-enabled machine requires multiple technologies from a wide range of subjects including areas such as speech recognition and Natural Language Processing, computer vision, robotics, sensor technologies, and of course one of our other buzzwords, “Machine Learning.” In many cases, machine learning is a tool used by these other technologies.
In its very earliest days, AI relied upon prescriptive expert systems to work out what actions to take, an “if this happens, then do that” approach. And while prescriptive expert systems still have a place in some sectors, their influence is much diminished, and that function has largely been replaced by machine learning. Most observers would agree that machine learning is the biggest single enabler for high-performance AI systems today.
A prime example of modern AI is autonomous vehicles. They rely on many different technologies working in harmony, some of which depend heavily on machine learning, particularly those that allow the car to detect and understand its surroundings. The now commonplace voice assistants such as Siri, Cortana and Alexa all employ a variety of technologies that allow them to “hear” a human voice, to understand which sounds correspond to which words and phrases, to infer meaning from the series of words they have heard, and to formulate an answer and respond accordingly. All of these are systems that require multiple technologies, including machine learning.
What is Machine Learning, Then?
So, machine learning is a field within computer science that has applications under the wider umbrella of AI. One of my preferred definitions is one quoted in Stanford University’s excellent machine learning course: “Machine learning is the science of getting computers to act without being explicitly programmed.” So rather than programming a system using an “if this, then that” approach, in the world of machine learning, the decisions that the system makes are derived from the data that has been presented to it. Some describe it as a “learn by example” approach, but there is more to it than that.
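To make the contrast concrete, here is a minimal Python sketch, not a production technique. The CPU readings, the incident labels and the function names are all hypothetical. The first function is the “if this, then that” style, where a person picks the threshold; the second derives the threshold from labelled historical data instead.

```python
# Hypothetical labelled history: (cpu_percent, was_incident)
history = [(35, False), (50, False), (62, False), (71, False),
           (78, True), (85, True), (91, True), (97, True)]

def rule_based(cpu_percent):
    """'If this, then that': the threshold of 90 was chosen by a person."""
    return cpu_percent > 90

def learn_threshold(examples):
    """Pick the cut-off that misclassifies the fewest labelled examples."""
    candidates = sorted(cpu for cpu, _ in examples)

    def errors(threshold):
        return sum((cpu >= threshold) != label for cpu, label in examples)

    return min(candidates, key=errors)

threshold = learn_threshold(history)

def learned(cpu_percent):
    """The decision now comes from the data, not from a hand-written rule."""
    return cpu_percent >= threshold

print(threshold, rule_based(85), learned(85))
```

With this toy data the hand-written rule misses the incident at 85% CPU, while the learned cut-off catches it. If the data changes, the learned threshold changes with it and nobody has to edit the rule.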
Machine learning is now so common in the world around us that there are countless applications where we may not even realise it plays a part. Automatic mail sorting and speed limit enforcement systems rely upon incredibly accurate implementations of what is known as “Optical Character Recognition” (OCR), i.e. identifying text in images. This technology allows us to identify addresses on envelopes and parcels, or the licence plate of a vehicle as it passes through a red light or travels too fast outside a school. OCR would not exist without machine learning (unfortunately speeding tickets still would).
The same is true of the “did you mean” and “similar searches” functionality in search engines, as well as spam filters, facial recognition systems, and recommendation systems on e-commerce, video and music streaming services. The list is endless, and not all of the applications are of the headline-grabbing variety.
Supervised and Unsupervised
As we will cover in more detail in posts later in this series, machine learning contains many, many different fields, which brings us to two further additions to our collection of buzzwords: “Supervised Machine Learning” and “Unsupervised Machine Learning.” Although the names are similar, the underlying algorithms and their applications are very different. Unsupervised techniques are generally simpler and try to find patterns within a set of given observations, patterns that you did not previously know existed. Recommender systems rely heavily on these techniques.
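As a rough illustration of finding a pattern without being told what to look for, the sketch below runs a tiny one-dimensional k-means (k = 2) over some hypothetical response times. No labels are supplied; the algorithm simply discovers that the values fall into two groups. The data and the function name are assumptions for illustration, and the code assumes the input contains at least two distinct values.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two clusters with a tiny k-means (k=2)."""
    low, high = min(values), max(values)  # deterministic starting centres
    for _ in range(iterations):
        # Assign every value to whichever centre is closer.
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the mean of the values assigned to it.
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return near_low, near_high

# Hypothetical response times in milliseconds, with no labels attached.
response_ms = [102, 98, 110, 95, 480, 105, 510, 495]
normal, slow = two_means(response_ms)
print(normal)
print(slow)
```

Nobody told the code what “slow” means. The split between the group around 100 ms and the group around 500 ms comes entirely from the observations.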
In contrast, supervised learning is the “learn by example” approach. Supervised learning systems need to be given examples of what is “good” and what is “bad”: this email is spam, this email isn’t. In the field of OCR, the system would be provided with multiple images of different letters and told which letter each image represents. As a system is provided with more and more examples, it “learns” how to distinguish between a spam email and one that isn’t, or the different arrangements of pixels that can represent the same letters and numbers. As a consequence, when a new example is presented to the system, specifically one it hasn’t seen before, it can then correctly identify whether or not the email is spam, or the address that the letter needs to go to, or the licence plate of the speeding car.
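The spam example can be sketched in a few lines of Python. This is a deliberately small naive Bayes classifier, one of many possible supervised techniques and not necessarily what any real spam filter uses. The six training emails are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical labelled training examples: (text, is_spam)
training = [
    ("win a free prize now", True),
    ("free money claim your prize", True),
    ("cheap pills free offer", True),
    ("meeting moved to monday", False),
    ("please review the incident report", False),
    ("lunch on monday", False),
]

def train(examples):
    """Count how often each word appears in spam and in non-spam messages."""
    counts = {True: Counter(), False: Counter()}
    totals = Counter()
    for text, label in examples:
        counts[label].update(text.split())
        totals[label] += 1
    return counts, totals

def is_spam(text, counts, totals):
    """Naive Bayes: compare the log-probabilities of the two labels."""
    vocab = set(counts[True]) | set(counts[False])
    scores = {}
    for label in (True, False):
        n_words = sum(counts[label].values())
        score = math.log(totals[label] / sum(totals.values()))
        for word in text.split():
            # Add-one smoothing so unseen words do not zero the probability.
            score += math.log((counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return scores[True] > scores[False]

counts, totals = train(training)
print(is_spam("claim your free prize", counts, totals))
print(is_spam("incident review on monday", counts, totals))
```

Neither test message appears in the training set. The classifier labels them using only the word statistics it gathered from the labelled examples.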
Within the field of supervised learning there are numerous techniques, one of which is “neural networks.” Neural networks are software systems that try to mimic, albeit very crudely, the way a human brain works. The concept of the neural network has been around for decades, but it is only relatively recently that its true power has been realised. A neural network is made up of artificial neurons, with each neuron connected to other neurons. As different training examples are presented to the network (e.g. an image or an email) along with the expected output of the system (e.g. the letter in the image, or whether or not the email is spam), the network works out which neurons it needs to activate in order to achieve the desired output under different circumstances.
The network configures itself so that the neurons activated when a spam email is presented differ from those activated by a non-spam email, and the rest of the system can then make a decision on how to handle that email.
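To show what “working out which neurons to activate” can look like at the smallest possible scale, here is a sketch of a single artificial neuron trained with the classic perceptron rule. The task (fire only when both inputs are 1), the learning rate and the number of passes are assumptions chosen for illustration. Real networks have many neurons and use more sophisticated training methods.

```python
def step(x):
    """Activation: the neuron fires (1) only if its input is positive."""
    return 1 if x > 0 else 0

def train_neuron(examples, epochs=20, rate=1):
    """Adjust the weights a little every time the neuron gives a wrong answer."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in examples:
            output = step(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = target - output  # 0 when the neuron was right
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Training examples with their expected outputs: fire only when both inputs are 1.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_neuron(examples)

def predict(inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

print([predict(inputs) for inputs, _ in examples])
```

The weights start at zero and are never set by hand. They settle into values that produce the expected output purely as a result of the training examples.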
We now get to our final buzzword (for the time being at least) — “Deep Learning.” Deep learning is a very specific and phenomenally exciting area within neural networks.
The easiest way to think of a deep network is as a larger and more complex network with more complex and sophisticated interactions between the individual nodes. The term “layers” is often referenced in the area of neural networks, and astonishing results can be achieved with networks that have only a single layer. Deep learning employs multiple “layers” with complex interactions within each layer and between layers. Consequently the patterns it can identify, and the problems it can be applied to, are more complex as well.
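The idea of layers can be sketched with a toy network that is far too small to be called deep, but that shows why stacking helps. The pattern “one input or the other, but not both” (XOR) cannot be represented by a single step-activation neuron, yet two layers handle it. The weights below are hand-chosen for illustration rather than learned, and all names are hypothetical.

```python
def step(x):
    """Activation: the neuron fires (1) only if its input is positive."""
    return 1 if x > 0 else 0

def layer(inputs, weights, biases):
    """One fully connected layer: each neuron sees a weighted sum of all inputs."""
    return [step(sum(w * i for w, i in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Hand-chosen (not learned) weights, for illustration only.
HIDDEN_W, HIDDEN_B = [[1, 1], [1, 1]], [-0.5, -1.5]  # neuron 1 acts as OR, neuron 2 as AND
OUTPUT_W, OUTPUT_B = [[1, -1]], [-0.5]               # fires for "OR but not AND"

def xor_net(a, b):
    hidden = layer([a, b], HIDDEN_W, HIDDEN_B)    # first layer finds simple patterns
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]   # second layer combines them

print([xor_net(a, b) for a in (0, 1) for b in (0, 1)])
```

Each layer builds on the patterns found by the one before it. Deep networks repeat this across many layers, with weights that are learned from data rather than written by hand.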
Deep learning is at the leading edge of machine learning research, and some of the advances in it have resulted in technologies such as automatic translation, automatic caption generation for images, and automatic text generation (e.g. automatically generating text in the style of Shakespeare). And in the same way that machine learning is the main enabler of AI, deep learning, right now, is the main enabler of advances in machine learning.
Coming Up Next
In the next post, I will give an introduction to the different machine learning techniques, APIs and frameworks that are available today for IT Incident Management.
About the author
Rob Harper is Chief Scientist at Moogsoft. Previously, Rob was founder and CTO of Jabbit and has held development and architect positions at RiverSoft and Njini.