They say that leopards don’t change their spots and that imitation is the sincerest form of flattery. However, that doesn’t explain why new Netcool clones keep appearing – imitating a 20-year-old [great] idea that is no longer fit for purpose to assure service in modern IT infrastructures with multi-vendor support operating models.
First there was OpenNMS, then TCI and a plethora of others during the 1990s. More recently we’ve seen ex-Netcool integrators and ex-Netcool VP level folks launching their own facsimiles!
Dumping Cloud APM
Anyway, to the plot: I was defibrillated into writing this blog post after watching yet another management software company dump their existing Cloud APM in favor of a Cloud-based Manager of Managers with de-duplication and one-second Event granularity as their “new innovative unique value proposition.”
But hold on, isn’t the Manager of Managers category not only saturated, but also nascent? It’s funny really, because those Manager of Managers products that challenged Netcool in the 1990s threw in the towel or were consumed by companies who should have known better: Maxm, NetExpert, CommandPost and others. Some of these products that are still in use cannot automatically de-duplicate repeat Events; and if you have to write rules to display Events…how will you ever see the Events you do not know about…you know, the ones you really want to know about?
1993: A Big Year
1993 was a big BIG year. The European Community enabled the free movement of people, the Czechs and the Slovaks became neighbors, the Rodney King trial was held, and a deadly earthquake and tsunami hit Japan.
In the UK, British Rail was privatized and Phil Tee invented Netcool.
Phil invented Netcool to solve a specific problem: to enable fewer skilled support operators to more quickly identify whether a fault was occurring.
Happy times for network and compute folks. A time when single faults caused impact. When infrastructures were simple – and when the company operating the IT infrastructure owned that infrastructure.
Before Netcool, the Manager of Managers would only indicate an Alert if a rule existed to describe that Alert.
- Netcool allowed IT and Communications support staff to see every event without needing a Rule to describe that event.
- But more importantly, Netcool deduplicated repeat events into a single Alert so operators could quickly, through human cognitive comprehension, work out whether an Alert indicated a real fault or was noise
- Netcool allowed operators to create simple filters so that you never saw that ‘spam’ noise again
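The three points above can be sketched in a few lines of Python. This is only an illustration of the general idea (show every event, collapse repeats into one Alert with a count, let the operator filter out known noise); it is not Netcool's actual implementation, and the names `Alert`, `deduplicate`, `apply_filter`, `node`, `summary` and `tally` are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """One row on the operator's screen, standing in for many raw events."""
    node: str
    summary: str
    first_seen: float
    last_seen: float
    tally: int = 1  # how many raw events this Alert represents


def deduplicate(events):
    """Collapse repeat events into one Alert per (node, summary) key.

    `events` is an iterable of (timestamp, node, summary) tuples.
    No rule is needed for an event to appear: an unknown event
    simply creates a new Alert.
    """
    alerts = {}
    for timestamp, node, summary in events:
        key = (node, summary)  # hypothetical identity of "the same event"
        alert = alerts.get(key)
        if alert is None:
            alerts[key] = Alert(node, summary, timestamp, timestamp)
        else:
            alert.first_seen = min(alert.first_seen, timestamp)
            alert.last_seen = max(alert.last_seen, timestamp)
            alert.tally += 1
    return list(alerts.values())


def apply_filter(alerts, suppressed_summaries):
    """Hide Alerts the operator has already judged to be noise."""
    return [a for a in alerts if a.summary not in suppressed_summaries]


# Made-up sample data for illustration only.
raw_events = [
    (1.0, "router-1", "link down"),
    (2.0, "router-1", "link down"),
    (3.0, "switch-7", "fan speed warning"),
    (4.0, "router-1", "link down"),
]
alerts = deduplicate(raw_events)
visible = apply_filter(alerts, {"fan speed warning"})
```

Here four raw events become two Alerts, and the operator's filter leaves one on screen. Note that every entry in the suppression set is a manual decision someone has to make and maintain.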
The Problem Today
Now here’s the problem (which also applies to today’s Netcool Me-Toos)
- As a short discussion with any long-term Netcool user will tell you…deduplication and manual rules bring you to filtering overload
- Combined with modern ‘joint vendor’ Support operating models and multi-vendor sourced infrastructures, and technical support silos and expertise arbitrage, it has become very difficult to maintain the existing Netcool rules and to add new ones as changes to the increasingly complex service delivery infrastructures are made.
We call it “Netcool Maturity”
It’s the point when you realize that no amount of configuration of your existing service assurance tools will enable you to pre-empt customers calling to tell you their application is disrupted.
It’s where you have configured so much filtering that you are unable to efficiently add more rules, and where the First Level Support Operators (who look at Netcool [type] Alert screens) have become ‘catch and dispatchers’, unable to action Alerts where there is no corresponding knowledge article or runbook.
When you make Infrastructure or Application changes (our customers make anywhere from 100 to 10,000 infrastructure changes per day!!!), it has become impossible to maintain correlation models and you can forget about Business Service views.
Clearly this causes big problems for the customers of Netcool type systems, but it also introduces fundamental problems for those companies building and hawking Netcool clones:
Risk
Yes, mature Netcool customers are unhappy that Micromuse and its subsequent owner left them with a legacy of several different configuration languages and a behemoth of a pricing and product dependency matrix.
But…
- If I replace Netcool with a tool which offers equivalent functionality but doesn’t solve the problem that I really face today, then I’m simply introducing risk
- I incur the cost of re-creating all my Netcool rules (to get to the same state Netcool is in today) without any net return on investment or new value for my support operations or business
To all the Netcool Me-Toos I ask: “Why ask people to replace Netcool functionality with Netcool functionality and risk?”
Deduplication and Correlation Rules (whether hosted in the cloud or on premises) offer no reduction in the time to identify Incidents, no harmonizing of the support of joint vendor underpinned infrastructures, and no reduction in the number of ‘spam’ actionable tickets or escalations for DevOps and Applications support teams.
Replicating the rules and models that exist in customers’ Netcool today in a new tool that offers the equivalent functionality to Netcool just introduces risk.
Here’s the Problem We Need to Solve
The central issue for those companies trying to assure service in modern joint vendor and demand driven compute is to be able to, in real time:
- Identify that some kind of Fault or Incident is occurring as it unfolds to reduce diagnosis time and action resolution before the end-user calls
- Highlight the Service impact indicators to proactively minimize disruption and
- Finger-print the appropriate stakeholders to involve in a given Incident, informationally and collaboratively, to reduce MTTR
The requirements of Operations and Service Management today are almost the polar opposite of the needs of 1993 and 1997.
In 2014, more than 20 years after the invention of Netcool, the problem is not simply filtering out noise. It is that our application delivery infrastructures are now totally different. We no longer operate in-house owned and in-house operated homogeneous infrastructures. Instead, we operate hybrid in-house and multivendor owned infrastructure technology, and we utilize multivendor support operating models where the support partners are geographically, organizationally and informationally separated. When we add in Cloud, we even lack visibility of the underlying infrastructure (and therefore our business supporting Applications).
Call me arrogant, but those companies still hawking that basic, woefully inadequate functionality are offering little in the form of benefit to customers who depend upon their IT and Communications infrastructures to support their Business services.
Are You Netcool Mature?
…So what’s the point of my rant? For those of you still reading, I commend you. You must be Netcool Mature.
The team behind Moogsoft (including Phil Tee, Mike Silvey [me], and the leading team members mentioned in the Red Herring S1’s for both Micromuse and RiverSoft) has had the benefit of having repeatedly developed class-leading ‘in their time’ Service Assurance software solutions, underpinned by substantive intellectual property, which deliver more value than their cost of ownership.
Moogsoft was formed by Phil and me to deliver an agile solution that does not rely upon rules, topology models or recent history performance behavior.
Our mantra is Change Change Change.
Incident.MOOG is the first IT Operations innovation in 20 years that offers a tangible reduction in actionable workload for operations support staff, enables support operating models to change, caters for the real world of agile compute, and harmonizes information sharing across joint vendor support towers.
Data warehouse tools combined with Analytics force you to store Petabytes of data and then make an archeological dig, after an Incident has occurred. But the fire is already burning, the damage is already being done and even then, it’s the end-users who report the Incident. (Forrester claims 74% of Incidents are identified by the end-users and not the IT tools. We can confirm that in some of our broadband customer engagements).
Even if analytics and data warehouses can give you a clue to the causality, you still do not know who is impacted or who else is working on the Incident. There is no information sharing across ‘towers’ of IT support, whether in-house or in a multi-vendor support operating model.
Our breakthrough at Moogsoft is our ability to infer that an Incident is occurring as that Incident unfolds, without pre-knowledge of what is background noise, previous infrastructure behavior history or topology and correlation rules.
We do this through the exploitation of unsupervised machine learning adaptive algorithms.
Incident.MOOG works with any unstructured or structured textual content sources (application log messages, Events, SNMP Traps, syslog, management Alarms, etc.), so it doesn’t require complex instrumentation or thresholds to be set up.
Incident.MOOG is a passive system which can work with the customers’ existing management tools and Incident Management processes to informationally unify all the support resources without organizational change.
Incident.MOOG fingerprints the appropriate stakeholders who need to be aware of or engaged in resolving an Incident in a collaborative virtual incident room. Think “Facebook Wall” per incident. Early notification, awareness and collaboration.
Incident.MOOG works even when infrastructure change is dynamic because it is not model or previous ‘pattern’ matching based.
We say “Early warning using real status data is always better than attempting to predict with statistically insignificant datasets”. You can rely on Incident.MOOG. Its answers are grounded in fact.
Are we crazy or are we on to something? Let us know. Register a comment; send us an email at @firstname.lastname@example.org.
If you are Netcool Mature, why not give Moogsoft a try! (No need to replace your Netcool to try it or to get more value out of Netcool…)