Change can cause service interruption, but there's a smart way to manage that.
A maxim in our industry is that when something stops working, the first question IT support asks is “what did you change?” It’s the right question. Almost 20 years ago, industry analysts reported that the majority of outages were caused by change.
About 10 years ago, I was working on a software solution for troubleshooting an IP-PBX system. As with many systems, issues often stemmed from ill-advised changes. This led to a technique in which regular configuration “snapshots” were taken, so that when a problem surfaced, a “diff” of the current configuration against the last known good one could be presented to the engineer as part of the ticket.
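The snapshot-and-diff idea can be illustrated with a minimal Python sketch. It is not the original tool: the function name `changed_lines` and the sample settings are hypothetical, and it assumes each snapshot is plain text with one setting per line.

```python
import difflib


def changed_lines(last_known_good: str, current: str) -> list[str]:
    """Return only the added/removed lines between two config snapshots."""
    diff = list(difflib.unified_diff(
        last_known_good.splitlines(),
        current.splitlines(),
        fromfile="last_known_good",
        tofile="current",
        lineterm="",
    ))
    # The first two lines are the '---' / '+++' file headers; skip them,
    # then keep removals ('-') and additions ('+'), dropping hunk markers
    # ('@@') and unchanged context lines (leading space).
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Hypothetical snapshots, for illustration only.
good_snapshot = "codec=g711\nmax_calls=200\nsip_port=5060"
current_snapshot = "codec=g711\nmax_calls=20\nsip_port=5060"

ticket_diff = changed_lines(good_snapshot, current_snapshot)
print("\n".join(ticket_diff))
```

Attaching `ticket_diff` to the ticket gives the engineer the changed settings without having to read either full configuration.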
The technique was well received, but that was 10 years ago. Today it falls short:
- The velocity of change (and the dimensions of configuration) has increased dramatically
- The snapshot approach lacks granularity
- You may no longer be analyzing changes made by a human in a maintenance window, but changes made by an autonomic system in real time
Here’s a better approach:
- Source change notifications
- Correlate them in real time with developing situations in the infrastructure
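As a rough illustration of those two steps, the sketch below keeps a sliding window of recent change notifications and, when an alert arrives, returns the changes that touched the same target shortly beforehand. Everything here is an assumption made for the example: the `ChangeEvent`, `Alert`, and `ChangeCorrelator` names, the 15-minute window, matching on an identical target name, and events arriving in time order. It does not describe how any particular product performs correlation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    timestamp: float  # seconds since epoch
    target: str       # host or service the change touched
    summary: str


@dataclass(frozen=True)
class Alert:
    timestamp: float
    target: str
    message: str


class ChangeCorrelator:
    """Hypothetical correlator: matches alerts to recent changes by target."""

    def __init__(self, window_seconds: float = 900.0) -> None:
        self.window_seconds = window_seconds
        self._recent: deque[ChangeEvent] = deque()

    def record_change(self, change: ChangeEvent) -> None:
        # Assumes notifications arrive in timestamp order.
        self._recent.append(change)

    def candidates(self, alert: Alert) -> list[ChangeEvent]:
        # Evict changes that fell out of the window before this alert.
        cutoff = alert.timestamp - self.window_seconds
        while self._recent and self._recent[0].timestamp < cutoff:
            self._recent.popleft()
        return [
            c for c in self._recent
            if c.target == alert.target and c.timestamp <= alert.timestamp
        ]


correlator = ChangeCorrelator(window_seconds=900.0)
correlator.record_change(ChangeEvent(0.0, "db01", "rotate TLS certificate"))
correlator.record_change(ChangeEvent(1000.0, "web01", "deploy build 42"))
correlator.record_change(ChangeEvent(1500.0, "db01", "raise connection pool limit"))

suspects = correlator.candidates(Alert(1600.0, "db01", "query latency high"))
print([c.summary for c in suspects])
```

Unlike periodic snapshots, each change is recorded individually as it happens, so the granularity is that of the notification stream, not the snapshot interval.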
Now that’s 21st-century agility!
This is one of the difficult challenges that Moogsoft AIOps solves today.