We used to talk about how IT could support the business, as if those were two separate things. That separation no longer holds true; every aspect of the business relies on IT, and if the applications or infrastructure are down, or not performing, so is the business. So what’s holding back service quality as your business grows? Here’s why it’s time to move away from email as your incident management mechanism.
As Andy Kyte (of Gartner) once said, “None of you are in IT; all of you are in business.” However, some of the consequences of this statement have yet to be fully understood.
All too often, we in IT end up managing systems in isolation, focusing on their technical aspects rather than on what the users are doing – what those systems are actually for.
This gets especially difficult when it comes to managing running systems. I was talking to a new customer of Moogsoft’s recently, and asked them how they had previously managed alerts.
The answer (to nobody’s surprise) was that IT operations teams were using email as their primary alert notification mechanism. This was despite the fact that at least three tools were attempting to do some level of alert processing. The problem was that each of those tools limited its event filtering and sorting to a specific domain, making it extremely difficult to correlate events across domains. Email, however, remains the #1 way that people find out about issues with their applications or infrastructure. But as a business grows and its application/IT environment expands, email quickly starts to fail as an incident management system.
The consequences of this approach are probably familiar to everyone in IT by now:
- Lots of noise and repeating events
- No capability to identify clusters of related events across domains
- Limited capability to automate responses to known issues (runbooks)
- Information sharing with helpdesk still requires manual steps
- Customers often report issues before support teams notice them
The aforementioned customer later quipped to me, “With traditional event managers, it’s common to talk about the ‘sea of red’, alerts that burst when something goes astray, making it difficult to detect the signal from the noise. Using email as an incident management system makes it even more difficult to separate the signal from noise, which is why everyone around here started to call it the ‘sea of unread.’”
This obviously has major impacts on both the real and perceived quality of the business’s service assurance, not to mention the inbox size and stress levels of the IT ops team! Email is also not an appropriate mechanism for timely communication, especially with many participants and lots of domain-specific technical details.
The consequence of using email for incident management is that customers often report issues before IT notices them. In essence, customers become the de facto incident warning mechanism: not only do issues become visible to them, but they have to complain before IT teams even see the problem. This happens because the IT teams have their heads down in the minutiae of their own specific area, with no visibility of the business service as a whole.
A Better Approach to Managing Incidents and Alerts
Businesses that proactively recognize the limitations of email as an incident manager, and instead move to Incident.MOOG, immediately realize superior levels of application service quality.
Moving the focus from a technical domain perspective to the business service perspective requires a change in thinking. The goal is to make a business service available all the time – to always be open for business. Gene Kim has a formula for this:
“Formula I distantly remember: super cool: Availability = (MTBF)/(MTBF+MTTR); ‘perfect availability is w/no failures or instant recovery’”.
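To make the formula concrete, here is a minimal Python sketch that plugs in some numbers. The function name and the MTBF/MTTR figures are made up for illustration; they are not measurements from any customer. The only point is to show how each side of the equation moves the result.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), with both values in the same unit."""
    if mtbf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTBF and MTTR must be non-negative")
    total = mtbf_hours + mttr_hours
    if total == 0:
        raise ValueError("MTBF and MTTR cannot both be zero")
    return mtbf_hours / total


# Hypothetical baseline: one failure every 30 days (720 h), 4 h to recover.
baseline = availability(720, 4)          # about 0.9945

# Recover faster: same failure rate, but 1 h to recover.
faster_recovery = availability(720, 1)   # about 0.9986

# Fail less often: half as many failures, still 4 h to recover.
fewer_failures = availability(1440, 4)   # about 0.9972

# Instant recovery gives perfect availability, as the quote says.
instant = availability(720, 0)           # exactly 1.0
```

With these illustrative numbers, cutting recovery time from four hours to one helps more than halving the failure rate, which is why shortening MTTR deserves as much attention as preventing failures.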
Incident.MOOG is here to help act on both sides of that equation. Our patented algorithms are able to eliminate noise and identify developing situations before they escalate to the point of causing outages or significant performance degradation. We do this by breaking down the barriers between different areas of specialization and correlating alerts across many functional domains to identify patterns that are not visible when focusing only on one area.
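To give a feel for what noise reduction and cross-domain correlation mean in practice, here is a deliberately simple Python sketch. It is not the Incident.MOOG algorithm, which is not described in this post. It assumes a hypothetical `Alert` record, collapses exact repeats, and then groups alerts that touch the same business service within a time window, regardless of which technical domain raised them. The field names, domain labels, and five-minute window are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: float  # seconds since an arbitrary start point
    domain: str       # hypothetical label, e.g. "network" or "database"
    service: str      # business service the alert relates to
    message: str


def deduplicate(alerts):
    """Collapse repeats of the same (domain, service, message).

    Returns (earliest_alert, repeat_count) pairs in order of first appearance.
    """
    first_seen = {}
    counts = defaultdict(int)
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.domain, alert.service, alert.message)
        counts[key] += 1
        first_seen.setdefault(key, alert)
    return [(first_seen[key], counts[key]) for key in first_seen]


def cluster(alerts, window_seconds=300.0):
    """Group alerts for the same service across domains.

    An alert joins the current cluster if it arrives within window_seconds
    of the previous alert for that service; otherwise a new cluster starts.
    """
    by_service = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        by_service[alert.service].append(alert)

    clusters = []
    for service_alerts in by_service.values():
        current = [service_alerts[0]]
        for alert in service_alerts[1:]:
            if alert.timestamp - current[-1].timestamp <= window_seconds:
                current.append(alert)
            else:
                clusters.append(current)
                current = [alert]
        clusters.append(current)
    return clusters
```

Even this toy version shows the idea: a network alert and a database alert that arrive within minutes of each other for the same service end up in one group that one team can work on together, instead of in two unrelated inboxes.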
Once we have identified these situations, we assemble a team with the right people to work on the specific problem areas that are part of that situation, and provide the social tools that let them communicate effectively and resolve the situation quickly. Because of the effectiveness of our clustering, it also becomes possible to integrate with helpdesk, runbook automation, or other diagnostic tools, without adding to the noise.
This is what helps our customers keep their IT services available and open for business. IT and business are now one. So break out of your ‘sea of unread’, stop using email to find out about issues, and move to an incident manager that scales to your business.
About the author Dominic Wellington
Dominic Wellington is the Director of Strategic Architecture at Moogsoft. He has been involved in IT operations for a number of years, working in fields as diverse as SecOps, cloud computing, and data center automation.