Reducing Telco OPEX for Service Assurance
Rob Markovich | September 8, 2015

When it comes to the digital evolution of communications providers, it’s all about people, process and machine learning.

Communications service providers (CSPs) are undergoing rapid change in their infrastructures, and an even greater change is approaching: the era of software-defined everything. With competition increasing, leading operators are refocusing on operational expense (OPEX) reduction and recognizing that now is the time to modernize operational support systems (OSS) using Big Data and machine learning. Service Assurance is a good place to start.

Over the years, Service Assurance has become reactive: slow to detect actionable issues, slow to diagnose causality, and situationally isolated, all of which lead to resource inefficiency. The reason is that telco service delivery infrastructures have kept evolving since the packet-switch transformation. IP services are now underpinned by a complex, layered mix of legacy and modern compute platforms, comprising fixed and mobile networks and using a diverse array of technologies.

The value of a Big Data for OSS approach (Incident.MOOG for Telco Service Assurance) is explained in terms of the challenges a CSP faces in optimizing its people, process, and technology:


People

New Networks or Services (“Underlay”) are deployed with their own Service Assurance platform. These platforms may consist of multiple implementations of the same tool (for example, separate IBM Netcool instances for mobile, transmission and IP) or of multiple vendors’ tools, each specific to a technology or Service offering.

New Services (“Overlay”) are deployed seamlessly across the underlying platforms, for example spanning mobile and fixed infrastructure and IP networks. When a service interruption occurs, though, it is difficult to determine quickly whether the cause lies within the local domain or upstream, which slows diagnosis and puts service level agreements (SLAs) at risk. As a result, the operations teams within the Network and Service Operations Center (NOC/SOC) lack Situation Awareness.

Incident.MOOG, however, pushes notifications of anomalies to stakeholders, providing early warning and a single-pane-of-glass, 360° view of the issue. It unifies the activities of responders (whether they are part of the cause or collateral damage) and enables faster action and diagnosis, reducing or averting service interruption.

The proven value of Incident.MOOG’s Situation Awareness to the Telco NOC/SOC is:

  1. Detection of actionable issues more than 4 hours before the existing ‘Alert-Ticket’ process.
  2. Situational Awareness for all stakeholders of a disruption, enabling a greater than 40% reduction in support workload.


Process

Operations and Service Management techniques underpinned by ITILv3 guidelines were developed and implemented in an earlier era, when single faults caused service disruption and domains of technology could be managed and operated independently of each other. With today’s highly virtualized, data center driven and layered service delivery fabric, and with the seamless marriage of the fixed and mobile CPE experience, linear processes delineated by technology domain are no longer viable.

The traditional approach to Service Assurance processes now leaves telco operations staff unaware of their relationship to any given service disruption. For example:

(a) First responders assess a ‘sea of Red’ Alerts and attempt to work out which one of the Alerts indicates a real actionable issue.

(b) The first responder then logs into the device, runs diagnostics, and assesses the log files.

(c) If they are unable to diagnose the issue, the first responder creates a Trouble Ticket for the next tier of support.

The proven value of Incident.MOOG’s collaborative process to Telco Operations is:

  1. 75% reduction in Tier 1 Assessment work.
  2. 90% reduction in Tier 1 to Tier 2 Escalations.
  3. Resolution knowledge capture and automatic knowledge article recycling, reducing time to resolve issues.


Technology

Service Assurance tools and technologies have historically been designed to detect single root causes, and legacy tools depend on accurate models of topology and behavior. Yet modern networks are designed to tolerate single faults and failures, inventories are significantly incomplete and inaccurate, and virtualization adds constant change. Service disruptions are now typically the consequence of multiple, possibly unrelated, faults that together lead to service degradation and failure.

Incident.MOOG automatically detects anomalies by contextualizing the causal and collateral indicating Events, without the need for complete Inventory or behavior models.
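
The article does not describe how Incident.MOOG does this internally, so the following is only a generic sketch of what model-free correlation can look like: events are grouped by closeness in time and shared message vocabulary, with no inventory or topology lookup. The `Event` fields, the `cluster_events` function, and both thresholds are hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since epoch
    source: str       # hypothetical: element that emitted the event
    message: str      # free-text description


def tokens(event: Event) -> set[str]:
    """Lower-cased words of the message, used as a crude similarity signal."""
    return set(event.message.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_events(events, window=300.0, min_similarity=0.3):
    """Greedily group events that are close in time and share vocabulary.

    No inventory or behavior model is consulted. An event joins the first
    cluster whose latest event is within `window` seconds and that contains
    a sufficiently similar message; otherwise it starts a new cluster.
    """
    clusters: list[list[Event]] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        for cluster in clusters:
            close_in_time = event.timestamp - cluster[-1].timestamp <= window
            similar = any(
                jaccard(tokens(event), tokens(member)) >= min_similarity
                for member in cluster
            )
            if close_in_time and similar:
                cluster.append(event)
                break
        else:
            # No existing cluster matched, so this event opens a new one.
            clusters.append([event])
    return clusters
```

Because nothing here depends on a device inventory, adding or changing infrastructure does not require reconfiguring the grouping logic; the trade-off is that the thresholds need tuning against real event data.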

The proven value of Incident.MOOG’s model-free analytics to Telco Data Center teams is:

  1. Scaling from 2 data centers to 12 data centers without reconfiguration.
  2. Changes to infrastructure and services without management system reconfiguration.

Policy and Risk

Adopting new Service Assurance tools usually requires replacing one or more existing tools. This policy often has no bearing on the real value, or cost, of adopting new tools and techniques. Incident.MOOG’s quantifiable return on investment and total cost of ownership, however, enable retention of existing tools. This is because Incident.MOOG can sit on top of existing tools, ingesting their aggregated data feeds and applying machine-learning analytics to provide Situational Awareness.
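
To make the "sit on top of existing tools" idea concrete, here is a minimal sketch of how feeds from several tools could be normalized into one common record before any analytics run. It is an assumption-laden illustration, not Incident.MOOG's or Netcool's actual schema: the tool names, the raw field names, the severity scale, and the `ingest` function are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class NormalizedEvent:
    tool: str       # which upstream tool produced the record
    source: str     # element the event refers to
    severity: int   # hypothetical common scale: 0 (clear) to 5 (critical)
    message: str


def from_fault_manager(raw: dict) -> NormalizedEvent:
    """Adapter for a hypothetical fault manager feed."""
    return NormalizedEvent(
        tool="fault_manager",
        source=raw["node"],
        severity=int(raw["sev"]),
        message=raw["summary"],
    )


_SYSLOG_SEVERITY = {"info": 1, "warning": 3, "error": 4, "critical": 5}


def from_syslog(raw: dict) -> NormalizedEvent:
    """Adapter for a hypothetical syslog-style feed; unknown levels map to 2."""
    return NormalizedEvent(
        tool="syslog",
        source=raw["host"],
        severity=_SYSLOG_SEVERITY.get(raw["level"], 2),
        message=raw["msg"],
    )


# One adapter per existing tool; the tools themselves stay in place.
ADAPTERS: dict[str, Callable[[dict], NormalizedEvent]] = {
    "fault_manager": from_fault_manager,
    "syslog": from_syslog,
}


def ingest(feed: Iterable[tuple[str, dict]]) -> list[NormalizedEvent]:
    """Normalize (tool_name, raw_record) pairs; skip tools with no adapter."""
    events = []
    for tool, raw in feed:
        adapter = ADAPTERS.get(tool)
        if adapter is not None:
            events.append(adapter(raw))
    return events
```

In this arrangement each existing tool keeps running unchanged, and supporting another one means adding a single adapter function rather than replacing anything.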

The proven value of retaining existing toolsets for Telcos is explained in this example: a Telco has a 3.5-year non-terminable maintenance contract for IBM Netcool. Incident.MOOG’s value as an early warning system offers an economically viable proposition that enables a phased, risk-mitigated migration from Netcool.

Core Benefits of Incident.MOOG for Telecoms

  1. Reduced Mean Time to Detect Incidents
  2. Reduced Actionable Work Items
  3. Reduced Mean Time to Diagnose Cause(s)
  4. Reduced Cost of Incident Impact
  5. Increased Change Frequency and Agility
  6. Optimized and More Efficient Monitoring
  7. Reliable Knowledge Article Content Management

Moogsoft is a pioneer and leading provider of AIOps solutions that help IT teams work faster and smarter. With patented AI analyzing billions of events daily across the world’s most complex IT environments, the Moogsoft AIOps Platform helps the world’s top enterprises avoid outages, automate service assurance, and accelerate digital transformation initiatives.