Sometimes it gets very tiresome being the idea innovator. [groan]

When you start to demonstrate success — huge numbers of events being processed, huge numbers of devices under assurance, very large customer references, bloody noses for the legacy incumbents where you replace them, etc. — it seems everyone wants a piece of you.

You get the legacy lot, you know the ones — those that have not innovated in 20 years since the original team innovated the product in the first place (IBM Tivoli Netcool, BMC Event Manager, TruePoint, CommandPost…ahem) — and the Little Bears (those that have a great deal of trouble breeding).

The big boys tell the story that they can do what you do, and they will do it for free. They conveniently forget to tell you that it will take seven separate products (including two versions of DB2…hint: IBM Tivoli NOI), and it will create a non-integrated solution which requires weeks/months/years of training, only to find that…ahem…it doesn’t do what we do — because we patented that!

The Little Bear will tell you that their product “is in the cloud and so it works out of the box.” Ahem… One: It doesn’t do what we do. And two: you want Cloud/SaaS? We got that. You want to run our software in your Cloud/SaaS? We got that. You want to offer our product as a service to your customers from your Cloud or AWS, or some other? We got that, too.

In fact, since there are significant issues with data privacy / data protection at the scale of customers we serve, the important thing is to be able to run your service assurance software in your jurisdiction, under your control.

Whichever way you want it, we can deliver it.

But that, my friends, is not the purpose of this blog post. The purpose is to see if, like me, you believe that a “solution” does not come in a box marked “Software.” A solution is the combination of some software, some process, some integration, and some people (let’s call them Users).

Service Assurance software — the type that actually forms part of a productivity-saving, agility-increasing and service quality improving solution — is not implemented like Microsoft Word.

To glibly claim that you can implement a service assurance solution out of the box is to be naive, or at best, simply lacking in experience of the requirements.

Are we trying to solve real-world complex problems here, or simply merchandizing a “shiny toy” that is brittle when touched?

The Need for Collaborative ITSM

Twenty-five years ago, Larry Garlick and Dave Mahler left their comfortable executive management jobs at Sun Microsystems and Hewlett-Packard, respectively, and their competing products — SunNet Manager (now Oracle Enterprise Manager) and HP OpenView (now HP Network Node Manager) — and started Remedy Corporation (now BMC Remedy).

Their idea was to make the process of actioning IT requests more structured and auditable.

Those were the days when a single fault caused outages. A network failure could cause compute, and therefore application, issues. A compute failure could cause an application interruption, and so on. The Remedy ARSystem (Action Request System) was born out of an inability to maintain an effective process, and so it enabled operations support compliance and consistent behavior.

A single fault (or alert) can be transformed into an “auditable ticket,” with the process of action and service compliance documented, and service level based escalations triggered, thus ensuring that no “actionable alert” is lost in the process.
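As a rough illustration of that idea, here is a minimal Python sketch of an alert becoming a ticket with an append-only audit trail and a time-based escalation. Everything in it (the `Ticket` class, `open_ticket`, `check_escalation`, the 30-minute SLA, the router name) is hypothetical and chosen for illustration; it is not the Remedy ARSystem schema or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Ticket:
    """Hypothetical ticket record; not the schema of any real product."""
    alert: str
    opened_at: datetime
    sla: timedelta
    level: int = 1                      # current support tier
    audit: list = field(default_factory=list)

    def log(self, when: datetime, note: str) -> None:
        # Entries are only ever appended, so the trail stays auditable.
        self.audit.append((when, note))


def open_ticket(alert: str, now: datetime, sla: timedelta) -> Ticket:
    """Turn an actionable alert into a ticket and record that it was opened."""
    ticket = Ticket(alert=alert, opened_at=now, sla=sla)
    ticket.log(now, f"opened from alert: {alert}")
    return ticket


def check_escalation(ticket: Ticket, now: datetime) -> bool:
    """Escalate one level if the SLA window for the current level has elapsed."""
    deadline = ticket.opened_at + ticket.sla * ticket.level
    if now >= deadline:
        ticket.level += 1
        ticket.log(now, f"SLA breached, escalated to level {ticket.level}")
        return True
    return False


start = datetime(2016, 1, 1, 9, 0)
ticket = open_ticket("link down on core-router-1", start, sla=timedelta(minutes=30))
check_escalation(ticket, start + timedelta(minutes=10))   # within SLA, no change
check_escalation(ticket, start + timedelta(minutes=45))   # past SLA, escalates
for when, note in ticket.audit:
    print(when.isoformat(), note)
```

The point of the sketch is only that the ticket, not the alert, carries the history: who touched it, when, and why it moved up a tier.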

Well here we are, 25 years on, and the Remedy system still lives, sharing the market with clones like ServiceNow, HP Service Manager, CA Service Manager, IBM Maximo, Cherwell, Frontrange, and even’s

These tools are perfect for auditing the process of actioning an alert or ticket that relates to a fault of some kind.

Sadly, they are not ideally suited to collaborative working.

Now, the above statement is very provocative, so what do I actually mean?

There are essentially two types of tickets one might create (from an operations/support process perspective) in a product like BMC Remedy or ServiceNow, for example:

  1. A trouble ticket or service request, where an item of work needs to be actioned, either having been raised by a user, or created by a management system in the form of an event or alert;
  2. And an incident ticket, where something substantial has occurred. ITIL 2011 defines an incident as “an unplanned interruption to an IT service, or reduction in the quality of an IT service.” Failure of a configuration item that has not yet affected service is also an incident — for example, failure of one disk from a mirror set.

Let’s call both (1) and (2) above “tickets” for the purpose of this discussion.

Ultimately both tickets lead to actionable work items for the users of the ticketing system. Whether your ticketing system was established 25 years ago (in the case of Remedy, and the forerunner of HP Service Manager, Peregrine Systems) or 10 years ago (in the case of ServiceNow), they both utilize and advocate exactly the same process models and value.

A ticket goes to a specific user or silo of operations support people: network tickets go to the network team; VIF tickets go to the virtualization team; SAP tickets go to the SAP support team, etc.

Whether the ticket represents an incident or a simple service request, it goes to a specific group or team in operations support.

But the world of IT and telecommunications support has become far more complex.

Firstly, a network issue may or may not cause impact to the compute, network attached storage, middleware, database, and applications, therefore impacting those corresponding teams.

Secondly, multiple issues in different silos of technology occurring together may cause corresponding issues for downstream services — e.g. a storage and network issue, together with a sudden increase in end user demand, may cause an apparent application issue.

It is at this point that the singular, linear method of ticketing begins to show some wrinkles, and its age. The linear nature of the ticketing methodology and processes actually fosters an increase in the number of operations support people disrupted, and an increase in the time taken to diagnose and remediate the issue.

It may be the case that, for a given organization, the company does not own and operate its entire service delivery infrastructure. It may own and operate some functions, outsource the operations of some functions, and insource other functions (transmission network, Cloud, etc.). So multiple organizations are involved in the assurance of the services that the organization offers. Some call this SIAM (service integration and management), or multiple towers of operations.

Using the legacy ticketing tools means that each silo lacks Situational Awareness in relation to its neighbors’ activities. Specifically, they do not know whether “I am the causal party, or someone upstream is the causal party.” Worse, if a first responder needs to escalate to a second level responder (domain expert), these two teams lack Situational Awareness, increasing the time to diagnose.
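To make the contrast concrete, here is a toy Python sketch. It first routes alerts into per-silo queues, the way a linear ticketing process would, and then groups the same alerts into shared "situations" so every team can see who else is involved. The grouping rule (same affected service, arriving within 15 minutes of the previous alert) is an assumption made purely for illustration, as are the `Alert` fields and the function names; it is not a description of how any particular product correlates events.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    silo: str        # owning team, e.g. "network"
    service: str     # hypothetical tag: the downstream service affected
    minute: int      # minutes since start of shift, to keep the example simple


def route_by_silo(alerts):
    """Legacy model: each team sees only its own queue."""
    queues = defaultdict(list)
    for alert in alerts:
        queues[alert.silo].append(alert)
    return dict(queues)


def group_into_situations(alerts, window=15):
    """Group alerts on the same service that arrive within `window` minutes
    of the previous one, regardless of which silo raised them."""
    situations = []
    last_seen = {}   # service -> (situation list, minute of latest alert)
    for alert in sorted(alerts, key=lambda a: a.minute):
        entry = last_seen.get(alert.service)
        if entry and alert.minute - entry[1] <= window:
            situation = entry[0]
        else:
            situation = []
            situations.append(situation)
        situation.append(alert)
        last_seen[alert.service] = (situation, alert.minute)
    return situations


alerts = [
    Alert("network", "billing", 0),
    Alert("storage", "billing", 4),
    Alert("application", "billing", 9),
    Alert("network", "email", 120),
]

queues = route_by_silo(alerts)
situations = group_into_situations(alerts)
for situation in situations:
    teams = sorted({a.silo for a in situation})
    print(f"{situation[0].service}: {len(situation)} alerts, teams involved: {teams}")
```

In the per-silo view, the application team sees one alert and has no way to tell that network and storage raised related alerts minutes earlier. In the grouped view, all three teams share one situation and can ask the "am I the causal party?" question together.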

Construct Your ITSM to Enable Collaboration

In summary, collaborative ITSM takes more than an install of software and more than simply a SaaS offering. It takes sophisticated integrations. It takes bipartisan intra- and extra-organizational alignment.

But more than that, it takes a software suite designed from the ground up to enable full Situational Awareness, situation-based workflow, and multiple organizations to collaborate efficiently regardless of their location, corporate structure, etc. It also helps for the suite to include features like a collaborative virtual workspace, chat ops, knowledge capture and recycle, and automated remediation.

