The Role of AIOps in Troubleshooting Microservices
Will Cappelli | August 20, 2019

AIOps can help in troubleshooting microservices performance, while microservices underscore the necessity of AIOps.

Are Microservices a Legitimate IT Trend?

Historically, IT system components were physically installed and maintained for years. Today, computing resources may be spun up and down at will on virtual machines, clouds, containers and, increasingly, serverless environments, existing for only a very short time. Many workloads may have a lifespan of only microseconds or minutes.

Undoubtedly microservices are a part of this legitimate trend. They are a specific example of how IT systems are being made more modular and independent. As a consequence of modularization, IT is miniaturizing components – in essence, building applications with less code and less functionality. So if we look at the bigger picture, the deployment of microservices is very much in line with a wider trend in IT.

However, IT system complexity is complicating the process of troubleshooting microservices should performance issues or critical incidents arise.

Don’t Microservices Lead to Greater Complexity?

The short answer is “Yes”, but there are a number of reasons why enterprises have moved to a microservices environment. Ultimately it’s about trading the simplicity of monolithic applications for more agility and flexibility in IT development and infrastructure.

To explain, let’s first examine why IT is driving towards modularity and microservices in the first place. It’s because the more modular the system is, the more easily DevOps teams can adapt the system to evolving business requirements. Ultimately, a team need only make local changes, as required, without having to worry too much about what’s occurring in the rest of the system. From a performance and execution perspective, the fact that components are modular makes it easier to fit them into all kinds of architectures.

For example, let’s say your business decides to move a lot of infrastructure to the cloud. You’ll have a lot more freedom when deciding which components remain in-house and which are taken to the cloud, or how you want to distribute those components over various cloud architectures. In managing a relatively monolithic system, on the other hand, you’ll have far more constraints.

However, there is no free lunch. The more agility you build into the system, the more complex it becomes. The easier you make it to develop the system, change the system or distribute it architecturally, the more complex it becomes.

“Complexity” in this context means something quite precise. It means an increase in the entropy of the system design in a very real sense. One strict definition of “entropy” is a lack of predictability.

What Are the Consequences of IT System Entropy?

The move to modularization increases the entropy of the IT system. In a high entropy system, every data point contains a lot of information. In a low entropy system, many of the data points give you very little information. So greater agility has come at the cost of easy maintenance. The work of managing IT systems has become much more complex.
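To make the entropy claim concrete, here is a minimal sketch using Shannon's definition of entropy, measured in bits. The two sets of observations are hypothetical: one stands in for a monolith whose status probe almost always reports the same thing, the other for a microservices estate that can be in many equally likely states. The function and variable names are illustrative assumptions, not part of any AIOps product.

```python
import math
from collections import Counter


def shannon_entropy(observations):
    """Return the Shannon entropy, in bits, of a sequence of observed states."""
    counts = Counter(observations)
    total = len(observations)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical monolith: 15 of 16 probes report the same state.
monolith_states = ["ok"] * 15 + ["degraded"]

# Hypothetical microservices estate: 16 distinct, equally likely states.
microservice_states = [f"state-{i}" for i in range(16)]

print(shannon_entropy(monolith_states))       # well under 1 bit per observation
print(shannon_entropy(microservice_states))   # 4 bits per observation
```

In the low entropy case, most observations only confirm what you already expected, so a handful of probes is enough. In the high entropy case, every observation carries several bits of information you could not have predicted, which is why so many more components have to be watched.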

With a system built out of a few monolithic parts, it’s possible to infer the state of the system as a whole from just a few vantage points. With a complex system composed of lots of independent parts working in sync, but loosely coupled together, it becomes harder to predict the state of the system from a few snapshots. IT has to monitor almost all of the components to be able to see what’s happening end-to-end and acquire an accurate picture of system state.

This has led to a mismatch between monitoring tools and system architectures. Many traditional IT monitoring tools presupposed a low entropy world, so they are not equipped to deal with high entropy systems. Monitoring and troubleshooting microservices is therefore also complicated.

Without a doubt, there is a lot of complexity within a microservices architecture. Nobody claims that microservices make things simpler for development and operations teams. However, their inherent complexity is manageable.

What Role Does AIOps Play?

Today’s enterprise IT employs big data platforms to gather all data points. This is fine from a storage standpoint. Unfortunately, many enterprises think that’s the end of the story. “We’ve got all the data and can access the data, our job is done.”

Of course all they have really done is assemble a bunch of data into a big haystack. They still need to start looking for the needle. This is where AIOps can act as a very effective magnet.

The basic truth underlying AIOps is that there are patterns and events which disrupt the normal end-to-end behavior of an IT system. Because of the complexity and high entropy of today’s systems, spotting those patterns and then analyzing them simply exceeds the capability of human operators.

Even if we could spot those patterns, we would still need to figure out the root cause of any disruption before being able to fix whatever is ailing the system. There may be a mathematical curve to describe what’s going on, but it’s so complex that the human brain isn’t able to come up with the equation to make sense of it. Hence it is very difficult for us as humans to figure out how to deal with it.

AIOps enables enterprises to work with IT data that is being collected in large databases, to identify that a curve exists, and then to come up with the equation that describes the curve. AIOps processes data with the capacity to see these patterns. It then provides an analytical solution that human operators can use to solve IT problems quickly and efficiently.
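As a deliberately simplified illustration of "finding the equation that describes the curve", the sketch below fits a straight line to a handful of samples by ordinary least squares and then flags observations that stray from it. Everything here is an assumption made for illustration: the load and latency figures are invented, the 10 ms tolerance is arbitrary, and a straight line is only the simplest possible stand-in for the far richer models an AIOps platform would need.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical samples: requests per second vs. observed latency in ms.
load = [100, 200, 300, 400, 500]
latency_ms = [22.0, 41.0, 62.0, 79.0, 101.0]

intercept, slope = fit_line(load, latency_ms)


def is_anomalous(x, y, tolerance_ms=10.0):
    """Flag a sample whose latency is further from the fitted line than the tolerance."""
    return abs(y - (intercept + slope * x)) > tolerance_ms


print(is_anomalous(300, 63.0))   # close to the fitted line
print(is_anomalous(300, 120.0))  # far above the fitted line
```

The point is the division of labor: the machine derives the equation from the collected data, and the human operator gets a short list of observations that do not fit it.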

Can Troubleshooting Microservices be Handled by AIOps?

Of course, and here’s how…

With microservices, things work a lot better if you have automated orchestration. The configuration of any orchestration engine should be in response to a specific business or technical requirement. Ideally, AIOps technology allows IT to rapidly identify problems, do some analysis, come up with solutions, and then feed them to the orchestration engine. AIOps operates in support of the same requirements as microservice orchestration.

In a high entropy IT environment, applications are running and orchestration engines are manipulating the stack while applications are executing. There is a lot of activity, which in turn results in massive complexity. An AIOps solution needs to view the IT environment in its entirety, which includes the impact of the orchestration engine. In this context, troubleshooting microservices issues simply becomes another function fulfilled by AIOps.

How Might Microservices Help AIOps Evolve?

Microservices comprise a whole series of components that have all sorts of complex changing connections. A mystery still to unravel is the path of causality from one microservice to another, when something goes wrong. What are the connections between microservices? To power effective analysis when troubleshooting microservices, AIOps makes heavy demands on topology or graphs to understand causality.
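A minimal sketch of why topology matters for causality: given a dependency graph and a set of services that are currently alerting, keep only those alerting services that have no alerting service beneath them. The service names, the graph and the heuristic itself are hypothetical and chosen for illustration; this is not Moogsoft's algorithm or any vendor's.

```python
# Hypothetical call graph: each service maps to the services it depends on.
DEPENDS_ON = {
    "web": ["checkout", "catalog"],
    "checkout": ["payments", "inventory"],
    "catalog": ["inventory"],
    "payments": [],
    "inventory": ["database"],
    "database": [],
}


def reachable_dependencies(service, graph):
    """Return every service that `service` depends on, directly or transitively."""
    seen = set()
    stack = list(graph.get(service, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen


def probable_root_causes(alerting, graph):
    """Keep only the alerting services that have no alerting dependency."""
    alerting = set(alerting)
    return sorted(
        s for s in alerting
        if not (reachable_dependencies(s, graph) & alerting)
    )


alerts = ["web", "checkout", "catalog", "inventory", "database"]
print(probable_root_causes(alerts, DEPENDS_ON))  # ['database']
```

Five alerts collapse to a single candidate because the graph explains the other four as downstream symptoms. Without the topology, all five look equally suspicious.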

We will see topological, graph-based analytics become one of the central pieces of AIOps. Today’s AIOps takes into account topological analysis, and vendors like Moogsoft have developed topology algorithms, but more work needs to be done. In the future, topology will move to center stage in order to cope with the particular kind of complexity that microservices bring to the table.

Microservices underscore the necessity of deploying AIOps. In turn, AIOps can help in troubleshooting microservices to assure peak performance. This symbiosis will help microservices transition from a leading-edge technology to a mainstream solution. For the broader market, microservices will help legitimize AIOps.

Moogsoft is a pioneer and leading provider of AIOps solutions that help IT teams work faster and smarter. With patented AI analyzing billions of events daily across the world’s most complex IT environments, the Moogsoft AIOps Platform helps the world’s top enterprises avoid outages, automate service assurance, and accelerate digital transformation initiatives.
About the author


Will Cappelli

Will studied math and philosophy at university, has been involved in the IT industry for over 30 years, and for most of his professional life has focused on both AI and IT operations management technology and practices. He is widely credited with having been the first to define the AIOps market, as an analyst at Gartner, before joining Moogsoft as Field CTO. In his spare time, he dabbles in ancient languages.
