Q: What’s the future of Moogsoft, and where is it going?
Moogsoft pioneered AIOps, essentially inventing the market 10 years ago. To understand where we are going, it is worth revisiting why we did that. My background is as the founder and inventor of Micromuse Netcool, and of RiverSoft’s OpenRiver technology. Those approaches were revolutionary in their day, but they were based on the idea that infrastructure was fixed, even if applications were less so. That changed radically with the advent of cloud computing and virtualization. We realized that AI was necessary to perform the advanced data analysis needed to quickly identify, diagnose and remediate the thousands of minor glitches that occur in a large business like Manulife. The rub is that, if left unresolved, minor glitches can become major outages.
Looking forward, the arms race continues as we see increasing adoption of serverless, lambdas, SDN, DevOps, CI/CD and many other technologies. In fact, the “doubling time” of change seems to be shortening. In practical terms, this means we have to broaden the scope of our product from events to metrics, traces, logs, business data, and environmental and social data, and double down on the algorithmic sophistication we use to perform our critical task of moving our customers from five nines to no nines. Today we have a platform that can handle metrics, and we have active research in all areas of complex event analysis.
Tomorrow, I envisage a single platform as the repository for all operational data, handling all availability management tasks across SecOps, DevOps, ITOps, SRE, Alerting, Problem Management and Service Desk. This will allow us to drive automation and free up the time and attention of operations teams, so they can run availability and risk as a business process, not as firefighting!
Q: How does this take our ecosystem to the next level?
There are essentially two critical outcomes:
- Availability: For example, one of our customers, Manulife, already does an excellent job of managing their error budget (total availability), targeting five nines as the availability rate. Working together, we can go after no nines, i.e. 100% availability, with business services being continuously available. We can see a time when major outages are exceedingly rare, if they occur at all, and we instead manage a business operational risk metric. This essentially transforms platform services from a cost center into a P&L center, because the consequence of opaque business operational risk is the need to hold higher reserves, which reduces the return on equity. We can target not only a better customer experience but also better financial performance!
- Operational Efficiency: Automation is the primary tool to reduce “toil”, which is essentially the time ops folks spend on repetitive and mundane tasks. These people are already overworked and overstressed (think air traffic control), so this is really about making sure they have more time for the fun side of the job, and about a net reduction in the capital and opex the firm spends on unproductive (but necessary) work.
So in short, better service levels, lower costs, more return on investment. That has to be good … right?
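For readers less familiar with the “nines” shorthand used above, the arithmetic behind an availability target can be sketched roughly as follows. This is only an illustration, not Manulife’s or Moogsoft’s actual calculation: the function name `downtime_budget_minutes` is hypothetical, and a 365-day year is assumed.

```python
# Assumption: a plain 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(nines: int) -> float:
    """Return the downtime allowed per year for an availability of `nines` nines.

    For example, nines=5 means 99.999% availability, so the
    unavailable fraction is 10^-5 of the year.
    """
    unavailable_fraction = 10 ** -nines
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # Print the yearly downtime budget for two to five nines.
    for n in range(2, 6):
        print(f"{n} nines: {downtime_budget_minutes(n):.2f} minutes per year")
```

Under these assumptions, five nines leaves a little over five minutes of downtime per year, which gives a sense of how small the remaining gap to “no nines” is, and how hard it is to close.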
About the author
Phil’s passion has been IT operational management ever since he co-founded OTT (better known as Micromuse). Having also invented Netcool and built RiverSoft to a successful IPO, Phil now leads the next big revolution in IT event management with Moogsoft, where he maintains a passionate commitment to innovation, including personally leading the company’s numerous product functions.