AIOps for AWS CloudWatch
Itai Njanji | March 14, 2019

How AWS Customers Can Reduce Incidents and Increase Service Quality using Moogsoft AIOps

Customers migrating workloads to Amazon Web Services (AWS) can choose from a wide variety of tools to monitor their infrastructure. These monitoring tools generate valuable insights from the events and alarms raised by services such as Amazon CloudWatch and AWS Config. Customers can derive further insights by applying machine learning techniques to this data. By using advanced machine learning techniques, customers can reduce operational incidents and increase their service quality.

When Traditional Ops Fail

Modern applications that are being re-platformed or re-architected on AWS require modern techniques to operate them. Traditionally, the size of a support team was scaled more or less in direct proportion to the number of tickets generated. This approach does not scale well with the distributed nature of the cloud or with the rate of innovation that cloud computing enables. Simply put, the number of incidents surfaced by Amazon CloudWatch, and by all of the other tools already in use, can be very hard to predict, which makes expected incident volume a much less dependable metric for hiring and planning in IT Operations.

One traditional approach to managing ticket volume is extensive manual curation of alarms. For instance, a common practice is to disable alarms that users do not expect to need on a regular basis. The risk is that rare but important alerts or insights are missed precisely because the conditions in which they occur are unexpected. The machine learning techniques being gathered under the umbrella of Artificial Intelligence for IT Operations, or AIOps, offer a new way to reduce the number of tickets and to provide remediation advice to busy operators.

Automating anomaly detection in this way helps operations teams separate signal from noise, surfacing significant events together with the context required to accelerate root cause analysis and incident resolution. Combining CloudWatch logs and metrics with data from existing monitoring tools and custom metrics developed in-house gives operations teams full visibility into the true extent of technical issues and their business impact. By analysing very large volumes of data, AIOps can detect early warning signs of incidents and route them to the right specialists, helping to avoid incidents and minimize impact to end users.
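
As a rough illustration of separating signal from noise, the sketch below flags points in a metric series that deviate sharply from their recent history, using a rolling z-score. This is a deliberately simple statistical stand-in and not the technique Moogsoft AIOps uses; the function name find_anomalies, the window size, the threshold, and the sample CPU values are all assumptions made for the example.

```python
from statistics import mean, stdev

def find_anomalies(values, window=5, threshold=3.0):
    """Return indices of points that deviate strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation
            if values[i] != mu:
                anomalies.append(i)
            continue
        # z-score of the current point relative to its recent history
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical CPU utilization samples (percent), for example one per minute
cpu = [41.0, 43.0, 42.0, 44.0, 42.0, 43.0, 41.0, 95.0, 44.0, 42.0]
print(find_anomalies(cpu))
```

In practice the input would come from real metric data rather than a hard-coded list, and a production system would need to account for seasonality and for the way a single spike inflates the baseline of the points that follow it.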

What Happens to ITIL?

A question that often comes up at this point is: what happens to ITIL when AIOps is incorporated into wider IT Operations processes? There is no simple answer to this question, as customers implement ITIL differently. Generally speaking, Amazon CloudWatch customers have found that utilizing AIOps improves the quality of their tickets, making them more actionable, which in turn improves their ITIL processes and users’ satisfaction with those processes. For example, a reduction in the volume of tickets means service desk members have more time to diagnose issues with AIOps insights, helping them achieve lower MTTR (Mean Time to Resolution). The time freed up can then be dedicated to more strategic work such as ITSM hygiene, for instance making sure that the CMDB is up to date, or analysing recurring issues and best practices.

What About DevOps?

IT is no longer just about Operations. New technical architectures and development methodologies are coming together to blur the distinction between the previously separate roles of Development and Operations, an approach commonly called DevOps. AIOps helps teams working according to DevOps methodologies to deliver continuous service assurance as they accelerate their digital transformation drives.

The key ways in which AIOps can enhance DevOps and increase the return on companies’ investments are as follows:

  • Increasing CI/CD frequency: continuous assurance without the need for time-consuming, manual changes to infrastructure or extensive and intrusive instrumentation of applications
  • Improving service quality: automated early detection and diagnosis of issues ensures uptime and mitigates impact to the business
  • Reducing ticket volume: issues that require manual handling can be reduced by 40 percent on average, and the right teams are engaged automatically, which reduces escalations
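
To make the ticket-volume point concrete, the sketch below groups alarms for the same service that arrive close together in time into a single incident, so that several related alarms produce one ticket instead of many. It is a minimal time-window heuristic under stated assumptions, not Moogsoft's correlation algorithm; the Alarm class, the group_alarms function, the 300-second window, and the sample alarms are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Alarm:
    timestamp: int   # seconds since epoch
    service: str     # hypothetical service name attached to the alarm
    message: str

def group_alarms(alarms, window_seconds=300):
    """Group alarms per service when each arrives within window_seconds of the previous one."""
    incidents = []
    open_incident = {}  # service -> most recent incident (a list of alarms)
    for alarm in sorted(alarms, key=lambda a: a.timestamp):
        current = open_incident.get(alarm.service)
        if current and alarm.timestamp - current[-1].timestamp <= window_seconds:
            # Close enough to the previous alarm for this service: same incident
            current.append(alarm)
        else:
            # Otherwise start a new incident for this service
            current = [alarm]
            incidents.append(current)
            open_incident[alarm.service] = current
    return incidents

alarms = [
    Alarm(1000, "checkout", "CPUUtilization high"),
    Alarm(1060, "checkout", "Latency high"),
    Alarm(1090, "search", "5XXError rate high"),
    Alarm(1200, "checkout", "5XXError rate high"),
    Alarm(5000, "checkout", "CPUUtilization high"),
]
incidents = group_alarms(alarms)
print(f"{len(alarms)} alarms -> {len(incidents)} incidents")
```

Each resulting incident could then be routed to the team that owns the service, which is the kind of automatic engagement the bullet above describes.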

How Do I Start?

Operations leaders have to balance building their Operations Data Science capabilities with keeping the lights on for their end users. Moogsoft AIOps offers Amazon CloudWatch customers an easy path to full operational AIOps capability, thanks to its Cloud Management Tools Competency certification. Integration between Moogsoft AIOps and other AWS tools such as CloudFormation helps to deliver a complete ML-enabled IT Ops toolchain to AWS customers, ensuring continuous assurance of applications and services. Moogsoft AIOps is available directly from the AWS Marketplace for customers to evaluate and purchase.

Disclaimer: This post is my own opinion, and not the opinion of Amazon Web Services or any organization I am associated with professionally. The goal of the article is to trigger intellectual and thought leadership ideas.

About the author

Itai Njanji

Itai David Njanji is a Seattle-based Solutions Lead in the Operations Integrations practice of AWS Professional Services. As a certified AWS Solutions Architect, Itai leads the team in building consulting solutions that help customers operate in AWS, including integration strategies with third-party operations tools. In his free time, Itai enjoys being outdoors and staying active.
