It’s Time for a Little Chat About ChatOps
Sahil Khanna | June 23, 2015

If you’re currently following (or implementing) DevOps concepts for your organization, you’ve likely come across the growing notion of ChatOps. For those unfamiliar, ChatOps is a term coined by GitHub that refers to making tools accessible through chat clients. Most of the discussion has focused on ChatOps as a way to automate code deployment in a DevOps environment. At Moogsoft, however, we view ChatOps much more broadly: it’s also about simplifying the lives of everyone involved in managing IT incidents, regardless of how far their organization has adopted DevOps.

As explained in this FierceDevOps article, enterprise IT teams still generally work in silos, and situational awareness is limited because knowledge is scattered. At the same time, the daily volume of machine data (e.g. log entries, events, alerts, alarms) generated by modern enterprise IT environments is growing exponentially. So when something starts to go wrong, it takes a long time to separate the signal from the noise and to understand where the problem originates. Furthermore, amid the flurry of parallel troubleshooting activities, conversations and actions go unrecorded. The wider team cannot learn from the past and must start again from mile-marker zero when a similar incident recurs.

We live in the Software Economy, where IT service quality directly affects customer experience, putting immense pressure on Ops and DevOps teams to deliver for the business. With the rise of virtualization, cloud abstraction layers and software-defined everything, IT failures are becoming more frequent and are more nuanced and complex than ever before. The common response has been to instrument and monitor the sh_t out of everything. While this plethora of tooling has certainly done more good than harm, the number of screens to scan in search of some situational awareness has become overwhelming and time-consuming.

There has to be a better way.

What is Moogsoft Doing with ChatOps?

Moogsoft is pioneering the use of ChatOps in incident management and is determined to make the jobs of Ops and DevOps professionals easier. To radically improve incident remediation, Moogsoft AIOps software introduced the Situation Room: essentially a virtual war room where teams can view and share information on the cluster of related events and alerts that make up each service-affecting Situation.
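This post doesn’t describe how related events and alerts are grouped into a Situation. Purely as a rough illustration, the Python sketch below groups alerts by time proximity alone: an alert joins the current group if it fires within an assumed 300-second gap of the previous alert, and otherwise starts a new group. The Alert class, the cluster_by_time function and the threshold are all hypothetical, not Moogsoft’s algorithm, and a real system would likely also weigh topology and alert text, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record: firing time (epoch seconds), source and text."""

    timestamp: float
    source: str
    description: str


def cluster_by_time(alerts: list[Alert], max_gap: float = 300.0) -> list[list[Alert]]:
    """Group alerts into candidate Situations by time proximity.

    Alerts are sorted by timestamp; a new group starts whenever the gap since
    the previous alert exceeds max_gap seconds (an assumed threshold).
    """
    clusters: list[list[Alert]] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if clusters and alert.timestamp - clusters[-1][-1].timestamp <= max_gap:
            clusters[-1].append(alert)  # close enough: same candidate Situation
        else:
            clusters.append([alert])  # gap too large: start a new candidate
    return clusters


alerts = [
    Alert(0.0, "array-1", "Disk Space Full"),
    Alert(60.0, "db-1", "Write Latency High"),
    Alert(5000.0, "web-1", "CPU Load High"),
]
for group in cluster_by_time(alerts):
    print([a.description for a in group])
```

Here the two storage-related alerts, fired a minute apart, land in one candidate Situation, while the CPU alert more than an hour later starts its own.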

When a Situation is identified, the relevant domain experts are invited into a Situation Room to collaborate. In essence, a Situation Room acts as a chat client tailored specifically for communication around incident management. In addition, everyone participating in the Situation Room gets a detailed view of the relationships among the clustered events and alerts within the Situation, a clear view of the processes and sub-systems affected, a detailed archive of all discussions and actions taken, and a list of past Situations that AIOps has automatically identified as similar.
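The post doesn’t say how AIOps decides that past Situations are similar. One simple stand-in technique is Jaccard similarity: the number of alerts two Situations share, divided by the number of distinct alerts across both. The Python sketch below ranks a hypothetical history of Situations that way; the Situation class, the 0.3 threshold and the five-result limit are assumptions for illustration, not Moogsoft’s method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    """Hypothetical record of a past Situation and the alerts it contained."""

    situation_id: str
    alerts: frozenset[str]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Shared alerts divided by distinct alerts across both sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_situations(
    current: frozenset[str],
    history: list[Situation],
    threshold: float = 0.3,  # assumed cut-off for "similar"
    limit: int = 5,  # assumed number of suggestions to show
) -> list[tuple[str, float]]:
    """Return (situation_id, score) pairs for past Situations, best match first."""
    scored = [(s.situation_id, jaccard(current, s.alerts)) for s in history]
    matches = [pair for pair in scored if pair[1] >= threshold]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches[:limit]


history = [
    Situation("SIT-101", frozenset({"Disk Space Full", "Write Latency High"})),
    Situation("SIT-102", frozenset({"CPU Load High"})),
    Situation("SIT-103", frozenset({"Disk Space Full", "Write Latency High", "Backup Failed"})),
]
current = frozenset({"Disk Space Full", "Write Latency High"})
print(similar_situations(current, history))
```

With these made-up records, the current Situation scores 1.0 against SIT-101 (identical alerts), about 0.67 against SIT-103 (two of three distinct alerts shared), and 0.0 against SIT-102, which falls below the threshold and is not suggested.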

Moogsoft AIOps’ Situation Room also includes a Tools Workbench, through which practically any third-party diagnostic or support tool can be commandeered to help centralize remediation efforts, all under a single pane of glass. Moogsoft has now added a ChatOps command-line utility to the Situation Room’s chat client, which lets operators interact with these tools from that one screen.

So instead of leaving the Situation Room to reach diagnostic and support tools or resolution scripts, operators can issue the same commands in the chat client, as if they were working with those tools directly. Added up across the full lifecycle of a service failure, the time savings are significant. Moreover, communications and actions are documented together and archived within each Situation Room for later review. Again, this saves time, facilitates training and, combined with incident post-mortem reviews, supports process improvement.

As a very simple example of this new feature set, let’s say you are invited into a Situation Room to investigate an incident involving failed data storage. In the discussion, someone points to an alert from the storage array that reads “Disk Space Full”. Within the same chat discussion, you issue the command “@moog ls /tmp” to list the contents of the directory. You spot a very large file that isn’t needed, so you issue “@moog rm -f filename” to remove it, and the results of the command are returned to the chat. You then tell the rest of the stakeholders that the incident has been resolved, and all the communications and actions related to this Situation are captured and archived in the chat log of the now-closed Situation.
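To make the “@moog” mechanics concrete, here is a minimal Python sketch of how a chat message addressed to a bot might be parsed, checked against an allowlist, executed, and archived alongside the conversation. Every name in it (SituationRoom, BOT_PREFIX, ALLOWED_COMMANDS, run_locally) is hypothetical and not Moogsoft’s API. The default runner executes commands on the local host through subprocess without a shell; the runner is injectable, so commands could instead be routed to a remote tool, or to a stand-in as in the usage lines below.

```python
import shlex
import subprocess
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

BOT_PREFIX = "@moog"  # hypothetical bot handle, as in the example above
ALLOWED_COMMANDS = {"ls", "df", "rm"}  # hypothetical allowlist set by the Ops team


def run_locally(argv: list[str]) -> str:
    """Default runner: execute argv on this host without a shell, capture output."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr


@dataclass
class SituationRoom:
    """Hypothetical chat room whose log archives messages and command results together."""

    situation_id: str
    runner: Callable[[list[str]], str] = run_locally
    log: list[tuple[str, str, str]] = field(default_factory=list)

    def _record(self, author: str, text: str) -> None:
        # Each entry is (UTC timestamp, author, text), in the order it happened.
        self.log.append((datetime.now(timezone.utc).isoformat(), author, text))

    def post(self, author: str, message: str) -> None:
        """Archive a chat message; if it is addressed to the bot, run the command."""
        self._record(author, message)
        if not message.startswith(BOT_PREFIX + " "):
            return  # ordinary conversation: archived, nothing to run
        argv = shlex.split(message[len(BOT_PREFIX):])
        if not argv or argv[0] not in ALLOWED_COMMANDS:
            name = argv[0] if argv else "(empty)"
            self._record(BOT_PREFIX, f"refused: {name} is not on the allowlist")
            return
        self._record(BOT_PREFIX, self.runner(argv))


# Usage with a stand-in runner, so nothing is actually executed.
room = SituationRoom("SIT-42", runner=lambda argv: f"(would run: {' '.join(argv)})")
room.post("alice", "The storage array is reporting Disk Space Full")
room.post("bob", "@moog ls /tmp")
for stamp, author, text in room.log:
    print(stamp, author, text)
```

The allowlist is a design choice worth keeping: it limits what a chat message can trigger, which matters when destructive commands such as rm are permitted. Because every message and every command result goes into the same log, the archive of a closed Situation reads as a single timeline.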

The use cases for AIOps’ ChatOps capability are wide-ranging, and how sophisticated the actions run through Situation Rooms become is up to the Ops and DevOps teams.

Lessons Learned

Overall, AIOps’ ChatOps capability makes the lives of Ops and DevOps teams easier for three basic reasons.

First, most people don’t like the tedium of switching between different tools, UIs and menus throughout the day. A ChatOps command-line utility saves the time spent moving between whichever tools you need, and simplifies tasks by providing a single interface from which to issue any command.

Second, ChatOps reduces human error by enabling the automation of mundane tasks. Reducing the number of clicks and the amount of typing may seem minor, but this simplification cuts down on mistakes and delivers an immediate productivity gain.

Third, ChatOps logs communications and actions in a single interface, helping teams share knowledge, techniques and best practices more easily than ever before. In any Ops or DevOps role there is a steep learning curve, traditionally addressed by sitting beside a peer and observing their work. Situation Rooms and closed Situations can now serve as a valuable training tool.

Moogsoft is a pioneer and leading provider of AIOps solutions that help IT teams work faster and smarter. With patented AI analyzing billions of events daily across the world’s most complex IT environments, the Moogsoft AIOps Platform helps the world’s top enterprises avoid outages, automate service assurance, and accelerate digital transformation initiatives.

About the author

Sahil Khanna

Sahil Khanna is a Sr. Product Marketing Manager at Moogsoft, where he focuses on the emergence of Algorithmic IT Operations. In his free time, Sahil enjoys banging on drums and participating in high-stakes bets.