We have reached a point where artificial intelligence and machine learning are being successfully applied to automate traditionally manual tasks and processes in IT Operations. From anomaly detection to automated remediation, cutting-edge algorithms are now incorporated into readily available tools that allow organizations to streamline operations by freeing humans from time-consuming and error-prone processes.
What can take humans hours to accomplish can be done in seconds by machines, and with far better precision. This is being recognized by leading Fortune 500 enterprises, which are rapidly adopting these technologies, as well as by leading analyst firms like Gartner, which are increasingly focused on the subject.
The technologies introduced to the market over the past several years, in response to the increasing complexity of the enterprise undergoing digital transformation, have led to the birth of the term ‘AIOps’ (Algorithmic IT Operations), coined by Gartner Research in recent papers by analyst Colin Fletcher.
AIOps vs. ITOA
As Fletcher explains, AIOps is essentially the evolution of technologies that were previously categorized as IT Operations Analytics (ITOA). While ITOA is still very much relevant, it represents a rather broad set of functionality focused on analyzing IT operational data, including monitoring, log analysis, security, etc. Furthermore, the category includes tools from vendors like CA, EMC, SolarWinds, and Zenoss, which do not have the kind of native machine-learning capabilities in their core ITOA products that would justify inclusion in the AIOps category.
AIOps platforms analyze large volumes of IT telemetry from disparate sources and apply various forms of algorithmics. By leveraging AIOps platforms, IT organizations can automate and enhance IT operational practices and access continuous insight into the performance of their business services.
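To make "applying algorithmics to telemetry" a little more concrete, here is a minimal sketch of one very simple technique: flagging a metric sample as anomalous when it sits far from the mean of the samples just before it (a rolling z-score). This is an illustration only, not how any particular AIOps platform works; the function name, the window size, the threshold, and the sample latency values are all assumptions made for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices of samples that deviate strongly from recent history.

    A sample is flagged when it lies more than `threshold` standard
    deviations from the mean of the `window` samples preceding it.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical response-time samples (milliseconds) with one obvious spike.
latency_ms = [100, 102, 99, 101, 100, 103, 98, 250, 101, 100]
anomalies = find_anomalies(latency_ms)
print(anomalies)
```

Real platforms combine many such signals across disparate sources; the point here is only that a machine can apply a consistent statistical rule to every sample, which is impractical for a human watching dashboards.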
Is AIOps relevant?
In the paper, “Applying AIOps to Broader Datasets Will Create Unique Business Insights,” Gartner reports that AIOps worldwide spending surpassed $1.7 billion in 2015. Furthermore, they report that, by 2020, approximately 50% of enterprises will be actively using AIOps platforms to provide insight into both business execution and IT Operations, which is an increase from fewer than 10% today.
As the enterprise continues its journey through digital transformation, and experiences massive change and scale, it will be forced to either ramp up operations headcount or adopt AIOps platforms.
Which would you choose?
Key Components of an AIOps Platform
Gartner describes the logical architecture of an AIOps platform in the paper, “Innovation Insight for Algorithmic IT Operations.” We at Moogsoft fully agree with this architecture and propose a simplified perspective for understanding how the different pieces of an AIOps platform fit within your broader IT Operations instrumentation.
Based on the most successful IT organizations we have seen, it’s clear that the key components of an enterprise-level IT Operational toolchain include a Monitoring Ecosystem, a System of Engagement, a System of Record, a System of Automation and a Data Lake.
The Monitoring Ecosystem exists to provide visibility and create telemetry across the physical and virtual stack. It includes tools ranging from AppDynamics to SolarWinds. Your monitoring tools are crucial for maintaining high service quality; however, a comprehensive ecosystem inadvertently creates overwhelming volumes of noise that set IT Ops teams up for failure.
The System of Engagement reduces noise and delivers service insights to the right people in real time. It’s the first place that operations teams should look when something breaks. In fact, the System of Engagement should let you know that something is going to break while the issue is still unfolding, allowing you to avoid impact. By ingesting the full spectrum of IT telemetry and applying machine learning in real time, the System of Engagement enables early incident detection and remediation. An ideal System of Engagement is a tool like Moogsoft.
Your System of Record enables the interaction and documentation of service requests and disruptions. It essentially manages all trouble tickets and knowledge for future reference and ties back to the CMDB and Service Maps, which can be improved over time by the System of Engagement as previously unknown relationships are discovered. Systems of Record include tools like ServiceNow and Jira.
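As a rough illustration of what "reducing noise" can mean, the sketch below groups raw alerts from several monitoring tools into a smaller number of incidents, based on a shared service label and closeness in time. This is a deliberately simple rule, not Moogsoft's algorithm or any vendor's; the `Alert` and `Incident` types, the `group_alerts` function, the five-minute window, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    source: str       # hypothetical monitoring tool name
    service: str      # hypothetical service label attached to the alert
    message: str
    timestamp: float  # seconds since an arbitrary start


@dataclass
class Incident:
    service: str
    alerts: list = field(default_factory=list)

    @property
    def last_seen(self):
        return self.alerts[-1].timestamp


def group_alerts(alerts, window_seconds=300.0):
    """Cluster alerts into incidents.

    An alert joins the open incident for its service if it arrives within
    `window_seconds` of that incident's latest alert; otherwise it opens
    a new incident.
    """
    incidents = []
    open_by_service = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_by_service.get(alert.service)
        if current is not None and alert.timestamp - current.last_seen <= window_seconds:
            current.alerts.append(alert)
        else:
            current = Incident(service=alert.service, alerts=[alert])
            incidents.append(current)
            open_by_service[alert.service] = current
    return incidents


# Five raw alerts from three hypothetical tools.
alerts = [
    Alert("apm", "checkout", "latency high", 0.0),
    Alert("infra", "checkout", "cpu saturated", 40.0),
    Alert("logs", "checkout", "timeout errors", 90.0),
    Alert("infra", "search", "disk nearly full", 100.0),
    Alert("apm", "checkout", "latency high", 1000.0),
]
incidents = group_alerts(alerts)
for incident in incidents:
    print(incident.service, len(incident.alerts))
```

Here five alerts collapse into three incidents, so an operator sees one checkout problem with three supporting signals instead of three separate pages. Production systems would typically learn such relationships from the data instead of relying on a fixed label and window.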
The System of Automation is there to automatically run resolution scripts to streamline repetitive tasks from incidents that occur on a regular basis. Common actions include orchestration, runbook automation, and IT automation. Systems of Automation include tools like Ansible and Puppet.
Lastly, the Data Lake exists for forensic diagnostics, ad-hoc reporting and business dashboards. It ideally maintains all of the data that you would ever need to investigate, which typically exists in the form of logs. When you know what you need to look for (from insights provided by the System of Engagement), the Data Lake is crucial for conducting that deeper analysis. Data Lake tools include Splunk and ELK.
AIOps Platforms are the Next-Gen Solution for IT Operations
IT telemetry and complexity will continue to increase at an exponential pace, yet human capability will remain the same. This is why IT operations teams need to strategically leverage AIOps platforms to accomplish certain tasks, while acknowledging that humans must remain responsible for others. It is up to AIOps vendors to understand this balance and to give humans the greatest insight with the least workload, so that they can focus on what really matters: customer experience.
Get started today with a free trial of Incident.MOOG, a next-generation approach to IT Operations and Event Management. Driven by real-time data science, Incident.MOOG helps IT Operations and Development teams detect anomalies across their production stack of applications, infrastructure and monitoring tools, all under a single pane of glass.