Instead of going to Blockbuster to rent a movie, we stream content online from Netflix. Rather than visit dealerships researching new cars, we visit Cars.com. On occasion, instead of stepping out to pick up a bite to eat, we order that old standby meal for delivery from our favorite neighborhood spot via DoorDash or Caviar, and eat through our shame in the safe confines of our home.
The worst thing that can happen to any of these digital businesses is performance degradation. Whether it’s a failed transaction or a full-blown outage, downtime costs online businesses money, not to mention customer loyalty.
IT leaders are now looking for ways to gain insight into issues before they impact end users. The IT execs I speak with typically describe this gap as a need for predictive insights, yet I think this needs clarification.
In this post, I’ll explain predictive insight with regard to IT Monitoring, as well as proactive and prescriptive insights, which might actually be exactly what IT execs have been looking for.
Predictive Analytics & Insights: What Does It Really Mean?
Predictive analytics is the practice of making predictions about unknown future events based upon historical models. This is actively being pursued at most enterprise organizations and, based on my conversations, the results haven’t been as expected. Here’s why…
These are the steps required to accomplish predictive analysis:
1. What are you looking for?
Google is the most powerful search engine in the world, but if you don’t tell it what you are looking for, how is it going to bring back any results? For predictive analytics, the first step is to actually decide what exactly you are looking for. Whether it’s a spike in transaction times from a certain host, or alerts related to an improper failover, the project must be defined.
2. What’s a relevant data sample?
You need to collect a relevant data sample that resembles the data you would be analyzing in the future. This is key in order to accurately train your model.
3. What are the reference points?
Within your data sample, what are the reference points or identifiers that you care about? This can be something like a hostname, a keyword, or an error message.
4. What analytics should you apply in order to create a working model?
With the availability of open-source libraries (like Python’s scikit-learn), getting access to powerful machine-learning algorithms isn’t the issue. The issue is knowing how to apply them. It’s best that you get a team of resident data scientists to take care of this for you. (If access to a team of skilled data scientists isn’t realistic, I’d skip to the next section.)
5. How do you apply this model to real data for predictive insight?
Now you need to define the query, and its dimensions, that will pull the relevant live data for your model to be applied to.
6. Lastly, how can you automate this process?
The reality is, you will probably need to run a scheduled batch job, which introduces significant latency. The results of that job can feed a customized KPI dashboard that notifies you when a threshold is breached.
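To make the six steps concrete, here is a minimal sketch in Python. It is deliberately far simpler than a real machine-learning model: the "model" is just a per-host threshold of mean plus three standard deviations, and the hostnames, sample values, and function names are all hypothetical. In practice, a scheduler such as cron would run the batch step on an interval and push the results to a dashboard.

```python
from statistics import mean, stdev

# Step 1: what we are looking for: spikes in transaction time per host.
# Steps 2-3: a historical sample keyed by the reference point (hostname).
# All hostnames and numbers are made up for illustration.
HISTORY = {
    "web-01": [120, 118, 125, 130, 122, 119, 127],
    "web-02": [200, 210, 195, 205, 198, 202, 207],
}


def train_baselines(history, k=3.0):
    """Step 4: build a very simple model, a per-host threshold of mean + k * stdev."""
    return {host: mean(samples) + k * stdev(samples)
            for host, samples in history.items()}


def run_batch(thresholds, batch):
    """Step 5: apply the model to a batch of (host, transaction_ms) rows."""
    breaches = []
    for host, ms in batch:
        limit = thresholds.get(host)
        # Hosts the model has never seen are silently skipped: it has no opinion on them.
        if limit is not None and ms > limit:
            breaches.append((host, ms, round(limit, 1)))
    return breaches


# Step 6: this part would be run by a scheduler; here it runs once.
thresholds = train_baselines(HISTORY)
new_batch = [("web-01", 124), ("web-01", 310), ("web-02", 204), ("db-09", 999)]
breaches = run_batch(thresholds, new_batch)
for host, ms, limit in breaches:
    print(f"ALERT {host}: {ms} ms exceeds threshold {limit} ms")
```

Note that the `db-09` row is ignored even though its value is extreme: the model was never trained on that host, so it cannot say anything about it.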
So, while striving for predictive insight may have some benefits for detecting IT incidents and impact, it is a very heavy lift. Furthermore, it’s not real-time and it’s only as good as the model that you built at that moment in time. You need to ask yourself: How relevant is your current IT data to your future IT data? If 27% of IT incidents are recurring, that means that 73% have never been seen before, and a predictive model trained on past incidents won’t provide much value for those.
Moving Away from Predictive Analytics & Embracing Proactive Insights
Proactive insight is the concept of providing early warning of abnormality. In today’s fast-moving and ever-evolving IT infrastructures, it implies real-time (milliseconds or seconds) insight into an IT incident as that incident is unfolding. In order to achieve proactive insight, you need an Algorithmic IT Operations (AIOps) tool that can automate the analysis and correlation of event data across the entire IT infrastructure so that Ops teams have the early warning and context required to address an issue before users are impacted. The key benefit of an AIOps tool is that it provides proactive insight without being explicitly told what to look for.
Because these systems rely primarily on algorithms, both supervised and unsupervised, it’s important to note that they are trained over time by your IT data, as well as by user-supplied feedback, which continuously improves their accuracy and ultimately the proactive insight they deliver.
It works by generating a neural net, essentially a data-driven model, that automatically learns about your environment over time to be able to separate the signal from the noise. As an example, the system might learn that, if you don’t get to X & Y within Time (T), then Z might occur. Another example could be inferring the probable root-cause(s) of an incident based on a group of related alerts.
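As a rough illustration of learning a baseline from the data as it streams in, the sketch below keeps a running mean and variance for one metric (using Welford's online algorithm) and flags values that deviate sharply from what has been seen so far. This is not the neural-net approach described above, nor any vendor's algorithm; the class name, the z-score limit of 3, and the sample event rates are assumptions chosen for illustration.

```python
import math


class RunningBaseline:
    """Online mean/variance (Welford's algorithm) with a z-score early warning."""

    def __init__(self, z_limit=3.0, warmup=5):
        self.n = 0            # number of values seen so far
        self.mean = 0.0       # running mean
        self.m2 = 0.0         # running sum of squared deviations from the mean
        self.z_limit = z_limit
        self.warmup = warmup  # do not raise warnings until this many values are seen

    def observe(self, value):
        """Return True if value looks abnormal versus the baseline so far, then learn from it."""
        abnormal = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(value - self.mean) / std > self.z_limit:
                abnormal = True
        # Welford update: the baseline keeps adapting with every value, flagged or not.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return abnormal


# Hypothetical events-per-second readings for a single host.
readings = [50, 52, 49, 51, 50, 53, 48, 400, 51]
baseline = RunningBaseline()
flags = [baseline.observe(r) for r in readings]
for reading, flagged in zip(readings, flags):
    if flagged:
        print(f"Early warning: {reading} events/s is far outside the learned baseline")
```

The check runs on every incoming value, so a warning is raised as the event arrives rather than after a batch job completes. One design trade-off: because flagged values are also learned, a single extreme spike widens the baseline for a while afterwards.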
An example of a tool that offers proactive insight is Moogsoft AIOps, which has a range of patented approaches to deliver it.
Putting the Data to Use Through Prescriptive Insights
Prescriptive insight is the practice of recommending decisions or actions pertaining to an event or group of events. In the context of IT monitoring, prescriptive insight can be valuable for supporting operator decisions. An example would be automatically recommending remediation steps based on a certain group of alerts.
Prescriptive insight is made possible by capturing and learning from previously seen behavior. The quality of this insight depends heavily on the quantity and quality of user-supplied feedback. As IT operators communicate within an AIOps system and share how they resolved a particular incident, the system captures that knowledge and learns over time. When paired with proactive insight, prescriptive insight can provide real-time recommendations as an incident unfolds.
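One simple way to picture this capture-and-recommend loop is a lookup of past alert clusters by similarity. The sketch below stores each resolved cluster with the operator's resolution note and, for a new cluster, suggests the note from the most similar past one, measured by Jaccard similarity. It is an illustrative assumption, not how any particular AIOps product works; the alert signatures, resolution notes, and 0.5 similarity cutoff are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two alert sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class ResolutionMemory:
    """Remembers how past alert clusters were resolved and suggests fixes for new ones."""

    def __init__(self, min_similarity=0.5):
        self.cases = []  # list of (frozenset of alert signatures, resolution note)
        self.min_similarity = min_similarity

    def record(self, alerts, resolution):
        """Capture operator feedback: this cluster was resolved this way."""
        self.cases.append((frozenset(alerts), resolution))

    def recommend(self, alerts):
        """Return (resolution, score) for the closest past cluster, or (None, score)."""
        best_score, best_fix = 0.0, None
        for past_alerts, fix in self.cases:
            score = jaccard(alerts, past_alerts)
            if score > best_score:
                best_score, best_fix = score, fix
        if best_score >= self.min_similarity:
            return best_fix, best_score
        return None, best_score


memory = ResolutionMemory()
memory.record({"db-01:conn_timeout", "app-03:http_500", "lb-01:health_fail"},
              "Restart the connection pool on db-01")
memory.record({"web-02:disk_full", "web-02:log_write_error"},
              "Rotate logs on web-02")

# A new cluster that partially overlaps a past one gets that cluster's resolution.
fix, score = memory.recommend({"db-01:conn_timeout", "app-03:http_500"})
# A cluster with no resemblance to anything seen before gets no recommendation.
unknown_fix, unknown_score = memory.recommend({"cache-07:evictions"})
print(fix, round(score, 2), unknown_fix)
```

The recommendations are only as good as what operators have recorded, which is why the quantity and quality of feedback matter so much.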
Moogsoft AIOps, for example, is able to proactively correlate IT events in real time and present that insight to the appropriate stakeholders. As those stakeholders communicate and resolve the incidents within the system, Moogsoft AIOps learns from this behavior and can recommend resolution steps in the future when it begins to cluster events that resemble a cluster seen in the past.
The Good News: Proactive & Prescriptive Insights are Attainable Today
While predictive analytics may sound like the most bleeding-edge approach to IT incident management, the required effort and the results are not consistent with what IT vendors have marketed.
Those looking for real-time insight into potentially service-impacting incidents, along with decision support on how to approach those incidents, should investigate AIOps solutions that offer both proactive and prescriptive insight for IT Operations teams.
About the author Sahil Khanna
Sahil Khanna is a Sr. Product Marketing Manager at Moogsoft, where he focuses on the emergence of Algorithmic IT Operations. In his free time, Sahil enjoys banging on drums and participating in high-stakes bets.