Modern IT Systems Have Outgrown Traditional Monitoring
Will Cappelli | October 6, 2020

What’s needed is a combination of observability with AI

Legacy monitoring tools fall short for SRE teams and DevOps pros tasked with maintaining the uptime of key applications in modern, cloud-based IT systems. To gain visibility into and control over these environments, these teams must collect and analyze more granular, underlying system information, known as observability data. This article explains why the only way for SRE teams and DevOps pros to extract the necessary insights from this data is to apply AI capabilities.

Why traditional monitoring can’t help SRE teams

Modern cloud-based IT systems are highly modular, distributed, dynamic and ephemeral. As a result, they change too rapidly for traditional monitoring and event management technologies that impose pre-defined rules, models, and event record structures on data streams in order to illuminate system behaviour.

Systems must become observable

To get to the information fidelity required, SREs and DevOps practitioners have increasingly sought to clear away all of this predefined structure and go straight to the underlying data. Their general consensus is that systems must be made observable and that the important data streams to achieve this consist of logs, metrics, and traces. By directly accessing these streams, SREs and DevOps practitioners hope to be able to observe and analyze incidents and state changes, and take immediate remedial action should circumstances require it.

Observability data is low-level, redundant, and noisy

There is a big problem with this approach, however. Metrics, logs, and even traces are extremely primitive, and the information they provide about incidents and state changes is very low level, highly redundant, and extremely noisy. Furthermore, because systems change so rapidly, SREs and DevOps pros don’t have the time to cut through the noise and redundancy, and then build up the analytical insight required to interpret these data streams even when they are cleaned up.

Even when the data is cleaned up, it is not actionable

There is a further problem. Metrics and logs, in and of themselves, contain no causal information. Without causality, the data is not actionable. To make causal inferences based upon metric and log data sets would require a complex, multi-layered AI capability which no vendor has brought to the market to date. This is a key reason why traces are invoked as the third member of the trinity. They appear to provide the causal clues that logs and metrics lack. Unfortunately, traces, unlike logs and metrics, are ill-defined and would require significant intervention at the code level to even begin yielding up those causal clues.

Three levels of AI are required to deliver on the promise of Observability – there is no Observability without AI

In other words, the reasons that SREs and DevOps practitioners want to replace monitoring with observability systems are valid. Monitoring does not work in modern environments. However, observability alone will not do the job. It needs supplementation by an AI capability that will be able to:

  • cut through noise and redundancy
  • build up correlations out of low level data
  • infer causal relationships from the correlations it has established
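The three levels listed above can be sketched in a few lines of code. What follows is a minimal illustration under stated assumptions, not a description of any vendor's implementation: the `Event` record, the time windows, and the "first reporter" heuristic are all hypothetical, and temporal precedence is only a crude stand-in for real causal inference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical normalized event; the field names are assumptions."""
    timestamp: float  # seconds since an arbitrary start
    source: str       # emitting service
    message: str


def deduplicate(events, window=60.0):
    """Level 1: drop repeats of the same (source, message) that arrive
    within `window` seconds of the previous occurrence."""
    last_seen = {}
    kept = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.source, ev.message)
        prev = last_seen.get(key)
        if prev is None or ev.timestamp - prev > window:
            kept.append(ev)
        last_seen[key] = ev.timestamp
    return kept


def correlate(events, gap=30.0):
    """Level 2: group events into clusters; a new cluster starts when more
    than `gap` seconds pass since the previous event."""
    clusters = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if clusters and ev.timestamp - clusters[-1][-1].timestamp <= gap:
            clusters[-1].append(ev)
        else:
            clusters.append([ev])
    return clusters


def earliest_source(cluster):
    """Level 3 (crude heuristic only): nominate the source that reported
    first as a candidate origin. This is a hint, not a proof of causality."""
    return min(cluster, key=lambda e: e.timestamp).source


# Made-up sample data for illustration.
events = [
    Event(0.0, "db", "connection pool exhausted"),
    Event(5.0, "db", "connection pool exhausted"),
    Event(8.0, "api", "upstream timeout"),
    Event(12.0, "api", "upstream timeout"),
    Event(15.0, "web", "HTTP 502"),
    Event(600.0, "batch", "disk usage high"),
]

kept = deduplicate(events)
clusters = correlate(kept)
for cluster in clusters:
    print(earliest_source(cluster), [ev.message for ev in cluster])
```

On this sample data, the six raw events collapse to four, which fall into two clusters. A production system would need far richer signals (topology, trace context, learned baselines) before calling any of this a causal claim.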

Observability was originally defined in control theory, where it denotes the property of a system whose self-descriptive output data is sufficient to determine the causal relationships among the system’s states. It is precisely the application of AI that will make systems generating logs, metrics, and traces observable.
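For reference, the textbook control-theory formulation, which the article does not spell out, is usually stated for a linear time-invariant system. The symbols below (state x, input u, output y, matrices A, B, C, and state dimension n) are standard notation introduced here for illustration, not taken from the article. The system is observable when its internal state can be reconstructed from its outputs, which holds exactly when the observability matrix has full rank:

```latex
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t), \qquad x(t) \in \mathbb{R}^{n}

\mathcal{O} =
\begin{bmatrix}
C \\ CA \\ CA^{2} \\ \vdots \\ CA^{n-1}
\end{bmatrix},
\qquad
\text{the system is observable} \iff \operatorname{rank}(\mathcal{O}) = n
```

By analogy only, logs, metrics, and traces play the role of the outputs y, and the question is whether they carry enough information to recover what the system is actually doing.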

For DevOps and SRE pros, AI-based insights into observability data would make it possible to automate the detection, diagnosis, and remediation of problems with the speed and accuracy that modern, cloud-based IT environments require.

This vision for AI-driven observability is what Moogsoft delivers to help SRE teams and DevOps professionals move beyond legacy monitoring and practice smarter DevOps. By applying AI insight to observability data in a rapid-deployment, self-service model, our cloud native observability platform unlocks an unprecedented level of agility and the ability to improve DevOps processes both quickly and at your own pace. But don’t take our word for it. Try AI-driven observability yourself by signing up for a free trial of Moogsoft!

About the author


Will Cappelli

Will studied math and philosophy at university, has been involved in the IT industry for over 30 years, and for most of his professional life has focused on both AI and IT operations management technology and practices. As an analyst at Gartner, he is widely credited with having been the first to define the AIOps market before joining Moogsoft as Field CTO. In his spare time, he dabbles in ancient languages.
