Learn how customer experience insights are derived from your data, and how to best leverage them to deliver greater value
DevOps practices, and the teams that implement them, are becoming increasingly critical to the value which any company provides its customers. This was the key message throughout a recent fireside chat between DevOps Institute Chief Ambassador Helen Beal and Moogsoft VP of Product and Design Adam Frank.
Moderated by MediaOps Managing Editor Charlene O’Hanlon, this lively and interactive session peeled back the layers of insight that observability data delivers teams about the value of digital products and services, and how to increase that value by building self-service observability practices into the software development lifecycle.
Teams badly want to define value but often don't, said Beal. Changing this is important: today's incremental approach to business and software development means that value is critical to understanding a project's success or failure.
“In the software industry as a whole, a lot of companies are becoming more product-led, meaning they want to guide their users to some type of valuable outcome,” said Frank. “To do that, they need to define what that valuable outcome is. This could be minor milestones, but the days of measuring whether a project delivered in the end are out the door.”
Beal added that the incremental approach to defining and measuring value also helps motivate teams so they can feel rewarded for their accomplishments along the way. However, she concluded that defining where and when teams measure value is just as important as defining what value is, and recommended teams build value estimates into user stories.
“If we take the time to code this feature when it goes live, we think it will have this impact and produce ‘x’ amount of value, and that ‘x’ amount of users are going to buy ‘x’ amount of product, or maybe that we’re going to increase the stickiness of the product,” Beal said. “Teams have to work hard to get those numbers.”
Getting data is just the start, Beal continued. Teams must also bake the process into the CI/CD cycle, and determine how to use that data. For example, the process of A/B testing allows teams to gather data about what is working and what is not, then treat the failure of what is not working as a learning opportunity to improve. In the same way, machine learning data can offer insights into improving the value your product offers.
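As a sketch of the kind of A/B comparison described above, a two-proportion z-test can indicate whether the difference between two variants is likely real. The variant data, traffic numbers and significance threshold below are illustrative, not figures from the webinar:

```python
import math

# Hypothetical A/B test results: conversions out of sessions per variant.
variant_a = {"sessions": 5000, "conversions": 400}   # control
variant_b = {"sessions": 5000, "conversions": 460}   # new feature

def conversion_rate(v):
    return v["conversions"] / v["sessions"]

def z_score(a, b):
    """Two-proportion z-test statistic for the difference in conversion rates."""
    p_a, p_b = conversion_rate(a), conversion_rate(b)
    p_pool = (a["conversions"] + b["conversions"]) / (a["sessions"] + b["sessions"])
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / a["sessions"] + 1 / b["sessions"]))
    return (p_b - p_a) / se

z = z_score(variant_a, variant_b)
print(f"A: {conversion_rate(variant_a):.1%}  B: {conversion_rate(variant_b):.1%}  z = {z:.2f}")
# |z| > 1.96 is conventionally treated as significant at the 95% level
```

If the variant underperforms, that is the "learning opportunity" the speakers describe rather than a failure.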
“As you start to emit more and more metrics about your apps, look at the traces, response times, etc. — there are a lot more things to measure value on such as stability and reliability,” said Frank. “For instance, is one second or two seconds response time too long for a customer, and then they are off to somewhere else for a similar service?”
“With observability data from within the system, you can start to provide a lot of value about where pain points are coming from.”
AI: Critical for Accelerating Observability
Both panelists agreed that AI is indispensable to sustaining value, as the more you can accelerate observability, the more you can use it to shine light on value in time to sustain and improve it.
“Having AI that can help you figure out and categorize this data is going to be key to the reliability of what you are providing,” said Frank.
Borrowing an analogy from her colleagues in the security world, Beal likened the trajectory of software quality over time to that of milk more so than wine, in that it gets worse with age. Software, therefore, must constantly be refreshed.
“While you might build the all-singing, all-dancing app today, someone will look at it, replicate it and come up with new funky stuff that people will like as well,” she said. “You just can’t stand still in this industry.”
However, with so much complexity in modern IT systems, Beal stressed it is hard to receive all the feedback in time to innovate, which is where AI comes in. Frank added that by using AI, you may also discover a new value that you weren’t even looking for.
The bottom line, Frank stressed, is gaining actionable insights to quickly find and fix problems so that they do not happen again.
These insights, he explained, appear when you combine the error messages developers have written into an application with other pieces of data to create high-context insight. Metrics, which on their own don’t say much, can tell tales when looked at over time and compounded with other time-series metrics. This is when an event is formed.
By looking at this data along with logs and traces, AI can pull pieces of information out of each of them to reveal similarities and context about the environment and what is happening, to quickly understand why it is happening so SREs can pinpoint root causes and remediate them.
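A minimal sketch of this kind of cross-telemetry correlation is shown below, assuming hypothetical alert records with `source`, `service` and `time` fields. This illustrates the general technique of grouping alerts by shared context and time proximity; it is not Moogsoft's actual algorithm:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical alerts from different telemetry sources; field names are illustrative.
alerts = [
    {"source": "metrics", "service": "checkout", "time": datetime(2021, 5, 1, 12, 0, 5)},
    {"source": "logs",    "service": "checkout", "time": datetime(2021, 5, 1, 12, 0, 40)},
    {"source": "traces",  "service": "checkout", "time": datetime(2021, 5, 1, 12, 1, 10)},
    {"source": "metrics", "service": "search",   "time": datetime(2021, 5, 1, 15, 30, 0)},
]

def correlate(alerts, window=timedelta(minutes=5)):
    """Group alerts that share a service and fall within one rolling time window."""
    incidents = []
    by_service = defaultdict(list)
    for a in sorted(alerts, key=lambda a: a["time"]):
        by_service[a["service"]].append(a)
    for service, group in by_service.items():
        current = [group[0]]
        for a in group[1:]:
            if a["time"] - current[-1]["time"] <= window:
                current.append(a)       # same incident: close in time, same service
            else:
                incidents.append(current)
                current = [a]
        incidents.append(current)
    return incidents

incidents = correlate(alerts)
# The three checkout alerts collapse into one incident; the search alert stands alone.
```

Collapsing related alerts this way is what lets an SRE reason about one incident instead of triaging each alert separately.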
Code Away the Toil
Done manually, repeatedly fixing the same problem is terrible, boring, onerous work: what we call toil. Frank concluded his comments by describing how self-service, AI-driven observability reduces this toil by automating these steps. It shows where time is being wasted and where an SRE can essentially code themselves out of toil, freeing that time for innovation.
“AI helps us get out from under the flood of Slack messages about what happened and move on to the cause of that,” he said, adding that self-service observability empowers SREs to do this in minutes, and start leveraging AI to drive better value for customers at their own pace.
Beal concluded with a note on the value of self-service observability for the DevOps culture.
“If leadership wants to give the team the ability to build it and own it, then the self-service aspect is essential,” she said. “The only way to improve is if people know how they are doing, and the only way to do that is to have metrics on the current state with observability. If you want to accelerate observability, you have to do it with AI.”
Watch a recording of ‘Measure Customer Value with Self-Service Observability’ to get all the details, best practices, and insights shared by Beal and Frank. The webinar also included a Q&A with the audience. Below are the questions the speakers answered in writing after the webinar.
How is customer value measured?
Beal: I think it’s important to differentiate between business value and customer value, and to be cognizant that lean principles guide us to be customer-focused, since delighting customers is the surest way to organizational success. When we think about how teams deliver business value to their organization, we tend to think in terms of revenue and margins, profit and loss. Although it’s imperative that a product or value stream team knows that it costs less than it makes, knowing what its customers are experiencing is essential to making decisions about what to do next. Customer value, then, is key.
There are many ways to gain intelligence about customer experience and sentiment; they vary from organization to organization and can be industry-dependent, but there are also cross-industry patterns. For example, retailers are likely to care more about basket size, insurers about policy conversion rates, but everyone’s going to care about session lengths and bounce rates. Some value metrics (like session length or referrals) can be considered proxies for customer experience in that they indicate a trend or correlation; others are direct, like customer journey time, NPS and app store reviews. Underneath all of these are the flow metrics that tell us how fast we are delivering new value outcomes to our customers, led by cycle time from idea to value realization.
Can self-service observability adapt alert thresholds for highly variable time-series metrics?
Frank: Yes, alert thresholds adapt based on the time-series metric data being received. The analysis continuously adapts to understand what normal operating behavior is for a given metric. As it adapts, high and low thresholds are created. When there’s a significant deviation, an anomaly is created, which becomes an alert that gets correlated with other alerts to form an incident.
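One common way to implement this kind of adaptive thresholding is a rolling baseline of recent values, flagging points that deviate by more than a few standard deviations. The sketch below is an illustration of the general approach, not Moogsoft's implementation; the window size and deviation factor are assumed values:

```python
import statistics
from collections import deque

def adaptive_anomalies(points, window=20, k=3.0):
    """Flag points deviating more than k standard deviations from
    a rolling baseline learned from the previous `window` points."""
    history = deque(maxlen=window)
    anomalies = []
    for i, x in enumerate(points):
        if len(history) == window:
            mean = statistics.fmean(history)
            std = statistics.pstdev(history) or 1e-9   # avoid a zero-width band
            if abs(x - mean) > k * std:
                anomalies.append((i, x))
        history.append(x)   # the baseline keeps adapting as new data arrives
    return anomalies

# Steady response times around 100 ms, then a spike to 250 ms.
series = [100 + (i % 5) for i in range(40)] + [250]
print(adaptive_anomalies(series))   # → [(40, 250)]
```

Because the baseline is recomputed from recent history, a metric whose normal level drifts over time pulls its thresholds along with it, which is the adaptive behavior the answer describes.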
Can the AI put together the machine data of Splunk, the wire data of Extrahop and APM data of AppDynamics and make sense of all the data? Is that the goal?
Frank: Yes, when alerts are generated in the three systems mentioned, they are sent to Moogsoft. Moogsoft then analyzes the data, including timestamp, to find the similarity between the events, thus correlating them together to make sense of the data.
Usually, what is the data sample size used for a given time period?
Frank: For an adaptive threshold, the time period works in combination with the sample size. For example: if I’m sampling 1,000 data points at five-second intervals, then my time period is 5,000 seconds, or roughly one hour and 23 minutes. For periodicity or seasonality, the sample size can be far smaller, depending on the duration and extent of your periodicity. For example: a metric with periodicity over two weeks could use an aggregate, or rollup, of data to achieve the same results as using raw data. This would let you use a far smaller sample size.
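The arithmetic behind these numbers can be checked directly; the five-minute rollup interval below is an assumed illustration, not a figure from the answer:

```python
# Raw sampling: 1,000 points at 5-second intervals.
points, interval_s = 1000, 5
window_s = points * interval_s
print(window_s)                      # 5000 seconds ≈ 1 hour 23 minutes

# Rolling up to (hypothetical) 5-minute averages covers a two-week
# seasonal period with far fewer points than raw 5-second samples.
two_weeks_s = 14 * 24 * 3600
rollup_s = 300
print(two_weeks_s // rollup_s)       # 4032 rolled-up points
print(two_weeks_s // interval_s)     # vs. 241920 raw samples
```

The rollup trades per-point resolution for a sample size small enough to model a two-week cycle.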
We are in the process of setting up Moogsoft for our environment. Is Moogsoft just an event correlation engine? How can I leverage this to enable DevOps?
Frank: When anyone says DevOps, the first thing we think of is IT automation, and most likely CI/CD. But there’s so much more to the practice of development and operations. Your software will inevitably require continuous management, verification and innovation. By surfacing your telemetry and turning it into actionable insights, you develop more and operate less.
Leveraging Moogsoft means you’re taking advantage of AI & ML to automate your monitoring, event and incident management toil. So, the short answer is no, it’s not just a correlation engine, it’s so much more. AI & ML is applied from data discovery to post-incident learning, in order to automate as much as possible and provide the insights you need about your services.
Who owns the design and implementation of self-service observability under an SRE set-up?
Frank: An SRE needs to build a continuous learning and verification cycle into the development pipeline. Part of that pipeline is the implementation of a self-service observability practice that provides the insights needed to resolve incidents as quickly as possible, understand what could go wrong before it does and, ultimately, improve the customer experience. As a developer, you never want to push code that will cause issues; your goal, in fact, is to improve your services. By understanding what your customers’ experience is really like, you’ll have the knowledge to stay calm and improve things.
About the author
David is Moogsoft's Director, PR and Corporate Communications. He's been helping technology companies tell their stories for 15 years. A former journalist with the Sacramento Bee, David began his career assisting the Bee's technology desk understand the rising tide of dot-com PR pitches clouding journalists' view of how the Internet was to transform business. An enterprise technology PR practitioner since his first day in the business, David started his media relations career introducing Oracle's early application servers and developer network to the enterprise market. His experience includes client work with PayPal, Taleo, Nokia, Juniper Networks, Brocade, Trend Micro and VA Linux/OSDN.