SaaS is exploding, and rightly so: it takes commoditized work and infrastructure away from tech teams so that they can focus on differentiating features. But what happens when it goes wrong? How do SaaS platforms make sure they aren't letting their customers down and, in turn, letting their customers' customers down? Observability, bolstered with AI, gives all the partners the best chance to optimize availability and customer experience. Here's how.
What is Observability?
Observability is a property of a system: a measure of how well its internal states can be inferred from knowledge of its external outputs. Its roots are in mechanical engineering and control theory.
In software, which is largely invisible, observability helps us understand how a system is operating from the data we can collect from it. Its adoption in the software industry is driven largely by the growing popularity of distributed systems, which have made applications and services more complex and therefore harder to monitor. In control theory, the observability and controllability of a system are mathematical duals:
- Controllability = acting: The ability of an external factor to influence a system’s internal state and effect change from one state to another in a specific period of time.
- Observability = looking: The ability to infer a system’s internal state from its external outputs. In software, it is our answer to complexity that is outpacing our ability to foresee what’s going to stop working.
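For readers who want the formal version of that duality, the standard rank tests from linear control theory (due to Kalman) are sketched below for a linear time-invariant system with state x, input u, and output y. These equations come from control theory rather than software practice, and the matrices A, B, and C are generic placeholders, not anything specific to a SaaS system.

```latex
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t), \qquad x(t) \in \mathbb{R}^{n}

\mathcal{C}(A, B) = \begin{bmatrix} B & AB & A^{2}B & \cdots & A^{n-1}B \end{bmatrix},
\qquad \text{controllable} \iff \operatorname{rank}\mathcal{C}(A, B) = n

\mathcal{O}(A, C) = \begin{bmatrix} C \\ CA \\ CA^{2} \\ \vdots \\ CA^{n-1} \end{bmatrix},
\qquad \text{observable} \iff \operatorname{rank}\mathcal{O}(A, C) = n

\mathcal{O}(A, C)^{\mathsf{T}} = \mathcal{C}(A^{\mathsf{T}}, C^{\mathsf{T}})
\quad\Longrightarrow\quad
(A, C)\ \text{observable} \iff (A^{\mathsf{T}}, C^{\mathsf{T}})\ \text{controllable}
```

The last line is the duality: testing whether the outputs reveal the state is the same computation as testing whether inputs can steer the transposed system.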
Observability requires intentional architectural design and development. It’s not a tools category; the tools category is monitoring, which is how we observe our observable systems.
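One way to make a service observable by design, rather than bolting monitoring on afterwards, is to have every unit of work emit a single structured event that carries its context. The Python sketch below assumes that pattern; the `observed` helper, field names such as `tenant_id` and `trace_id`, and the in-memory `EVENTS` sink are hypothetical stand-ins for whatever telemetry pipeline a team actually uses.

```python
import json
import time
import uuid
from contextlib import contextmanager
from typing import Iterator, List

# Hypothetical sink: a real system would ship events to a telemetry backend;
# here they are kept in a list (and printed) so the sketch is self-contained.
EVENTS: List[dict] = []


def emit(event: dict) -> None:
    """Record one structured event and print it as JSON."""
    EVENTS.append(event)
    print(json.dumps(event, sort_keys=True))


@contextmanager
def observed(operation: str, **context: object) -> Iterator[dict]:
    """Wrap a unit of work and emit one wide event describing it.

    The caller can add fields to the yielded dict while the work runs.
    """
    event = {
        "operation": operation,
        # Reuse an incoming trace ID if given, otherwise start a new one.
        "trace_id": context.pop("trace_id", uuid.uuid4().hex),
        **context,
    }
    start = time.perf_counter()
    try:
        yield event
        event["outcome"] = "success"
    except Exception as exc:
        event["outcome"] = "error"
        event["error"] = repr(exc)
        raise
    finally:
        # Emitted whether the work succeeded or failed.
        event["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        emit(event)
```

Usage might look like `with observed("checkout", tenant_id="acme") as ev: ev["items"] = 3`. Because each event already records the tenant, outcome, and duration, questions nobody anticipated (for example, "is only one tenant seeing slow checkouts?") can be answered from the data rather than by adding new monitoring after the fact.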
What makes SaaS different?
Software as a Service (SaaS) is a delivery pattern that is transforming every part of the IT sector, including DevOps. Built and run in a cloud computing environment, it replaces difficult-to-configure on-premises architecture with uniform, consistent services that remove scalability from the end user’s list of concerns.
Businesses employ SaaS in a huge variety of use cases, for example:
- Financial and HR (ERP)
- Customer relationship management and marketing
- Application/service components such as payment gateways
- Components in the DevOps toolchain (backlogs, artifact repositories, group messaging, CI servers, service desks)
- Software development and delivery infrastructure (IDEs, low-code/no-code, orchestration)
The software itself is often run on public cloud infrastructure. Popular global providers of the underlying cloud infrastructure are Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), and Alibaba Cloud.
What are the observability challenges for SaaS?
Observability is challenging for both the SaaS provider and their customers:
- The primary burden is on the SaaS provider to ensure reliable and scalable performance for the service.
- Their customers have limited visibility into, and control over, the service, so they are at the mercy of the provider’s capabilities; it’s effectively a blind spot for them.
- Additionally, the SaaS may be just one component of the overall set of services that make up the product or service the SaaS customer provides. If one component fails, the entire value stream fails and the customer’s own customers are impacted.
The SaaS provider is trying to balance a number of goals simultaneously that are interconnected and sometimes in contention:
- They want to architect and build their application for availability, reliability, and scalability - but within their budget constraints.
- They want to assure service levels that their customers can tolerate (ideally that will delight them) and push the highest possible velocity of changes through the system to add functionality to beat the competition - but every change presents a risk of customer-impacting failure.
- They want to bring as many new customers on board as possible and expand existing customers’ footprints in their service - but they don’t want costly redundancy in their infrastructure (whether it’s their own or their cloud partner’s).
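One common way to make the tension between service levels and change velocity explicit is a service level objective (SLO) with an error budget. This is a site reliability engineering practice rather than something this article prescribes, and the Python sketch below is only an illustration: it assumes a 30-day window and a hypothetical policy of pausing risky releases once less than 25% of the budget remains.

```python
MINUTES_PER_DAY = 24 * 60


def error_budget_minutes(slo: float, window_days: float = 30.0) -> float:
    """Minutes of unavailability the SLO allows over the window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    return (1.0 - slo) * window_days * MINUTES_PER_DAY


def budget_remaining(slo: float, downtime_minutes: float,
                     window_days: float = 30.0) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


def can_release(slo: float, downtime_minutes: float,
                window_days: float = 30.0, reserve: float = 0.25) -> bool:
    """Hypothetical release policy: ship risky changes only while more than
    `reserve` of the error budget is left."""
    return budget_remaining(slo, downtime_minutes, window_days) > reserve
```

With a 99.9% SLO over 30 days the budget is 43.2 minutes, so 10 minutes of downtime leaves roughly 77% of it and `can_release(0.999, 10)` returns `True`, while 40 minutes leaves about 7% and `can_release(0.999, 40)` returns `False`. The budget turns "every change presents a risk" into a number that product and operations teams can negotiate over.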
They have to balance meeting customer expectations, managing operational costs, and investing in competitiveness.
Observability for Cloud Infrastructure
Traditional monitoring tools were not designed for, and don’t work well in, serverless cloud or distributed computing environments. Just as SaaS users have high expectations for the availability of the services they consume, SaaS providers expect near-flawless service from whichever cloud provider(s) host their software.
Traditional server-based infrastructure relied on monitoring tools to log changes, monitor data flows, and trace interactions in the architecture. Developers used these tools to identify software inefficiencies, load on hardware, and server demand, and many different tools were used on different servers for different purposes. That approach worked reasonably well, but cloud technology is neither discrete nor static. Applications, processes, functions, and services exist one moment and are destroyed the next. Virtual servers are continually spun up and down. Huge volumes of data are processed and dispersed across multiple containers hosted on short-lived server instances scattered around the world.
SaaS providers are likely to be architecting their software around cloud-native design patterns that let them quickly build, test, and deploy components as loosely coupled services. This means traditional monitoring solutions won’t work for them either, and any problems will disappoint their customers, damage their brand, and cost them profits and market share.
How can we resolve these challenges?
In an ideal world, an organization’s own development teams build observability into their products as a cultural discipline. But businesses buy SaaS tooling precisely to take the burden of building non-differentiating features away from their development teams and give them platforms for innovating on the things that make their business unique. As a result, the SaaS elements of a business’s assets are unobservable to the teams that did not build them.
But it doesn’t have to be that way. Teams can collect data from their third-party SaaS solutions via their monitoring tools by using APIs and browser extensions.
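As a rough illustration of the API route, the Python sketch below polls a vendor status endpoint and turns the response into a sample that a team’s own monitoring could store and alert on. The URL, the JSON shape (`status.indicator` and `status.description`), and the severity mapping are all assumptions, loosely modeled on the public status pages many vendors publish; check the real vendor’s API documentation before relying on any of it.

```python
import json
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable

# Hypothetical endpoint and response shape, for illustration only.
STATUS_URL = "https://status.example-saas.com/api/v2/status.json"

# Map the (assumed) vendor status indicator onto a numeric severity
# that a monitoring tool can graph and alert on.
SEVERITY = {"none": 0, "minor": 1, "major": 2, "critical": 3}


@dataclass
class Sample:
    timestamp: float
    vendor: str
    severity: int
    description: str


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """Fetch and decode a JSON document over HTTP."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def poll_status(vendor: str, url: str,
                fetch: Callable[[str], dict] = fetch_json) -> Sample:
    """Convert one status-API response into a sample for our own monitoring."""
    try:
        status = fetch(url).get("status", {})
        indicator = status.get("indicator", "unknown")
        # Unknown indicators are treated as the worst case.
        severity = SEVERITY.get(indicator, 3)
        description = status.get("description", "unknown")
    except Exception as exc:
        # An unreachable vendor is itself a signal worth alerting on.
        severity, description = 3, f"poll failed: {exc}"
    return Sample(time.time(), vendor, severity, description)
```

The `fetch` parameter lets the HTTP call be swapped for a stub in tests, or for a client that adds authentication headers if the vendor’s API requires them. Running `poll_status` on a schedule gives the customer a time series for a component that would otherwise be a blind spot.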
SaaS providers can also build observability into their own systems, for their customers’ benefit and their own. They can treat observability as preventative maintenance: it lets them see and react to problems faster than current monitoring tools allow, and glean insights from the data that help protect against problems that have yet to surface.
Cloud infrastructure platforms look to cloud-native tools to ensure availability, reliability, and resilience. In addition to their own tools, cloud providers provide tools to their users (in this case, the SaaS providers) to manage their infrastructure. Some of these tools can also be used to manage the software provided as a service even though the code belongs to the SaaS provider and not the cloud infrastructure provider.
That can add up to a lot of tools, though, and a lot of tools means a lot of noise. When something is breaking or broken, customers, their service providers, and their cloud infrastructure partners all want a fix now, or ten minutes earlier. This is where AIOps steps in and supercharges observability: it wades through the data coming from every system at high speed and presents the support team, under pressure, with the insights they need to identify and remediate the underlying causes.
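AIOps platforms use statistical and machine-learning techniques, but the core idea of cutting noise can be shown with a much simpler, rule-based stand-in: group alerts about the same service that arrive close together into one incident, so responders see a handful of incidents rather than hundreds of alerts. In the Python sketch below, the `Alert` fields, the source names, and the five-minute window are illustrative assumptions, not any particular product’s behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Alert:
    timestamp: float  # seconds since epoch
    source: str       # who raised it, e.g. "cloud", "saas-provider", "customer-app"
    service: str      # the component the alert is about
    message: str


@dataclass
class Incident:
    service: str
    alerts: List[Alert] = field(default_factory=list)

    @property
    def sources(self) -> Set[str]:
        """Every party whose tooling reported this incident."""
        return {a.source for a in self.alerts}


def correlate(alerts: List[Alert], window: float = 300.0) -> List[Incident]:
    """Group alerts for the same service into one incident when each arrives
    within `window` seconds of the previous alert in that group."""
    latest: Dict[str, Incident] = {}
    incidents: List[Incident] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = latest.get(alert.service)
        if current is not None and alert.timestamp - current.alerts[-1].timestamp <= window:
            current.alerts.append(alert)
        else:
            current = Incident(alert.service, [alert])
            latest[alert.service] = current
            incidents.append(current)
    # Largest incidents first: that is where responders should look first.
    return sorted(incidents, key=lambda i: len(i.alerts), reverse=True)
```

Because each incident records every `source` that reported it, the cloud provider, the SaaS provider, and the customer can see at a glance that they are looking at the same problem rather than three separate ones.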
Working as Partners
SaaS providers need to trust their cloud infrastructure partners to deliver the service levels they promised, so that their own services are not compromised. Enterprise customers running key components of their businesses on SaaS need to know their service won’t be interrupted, so that neither their employees nor their own business or consumer customers are affected. And if the worst does happen, all the partners need to be able to collaborate quickly to resolve the problem wherever it is. Tools that can quickly locate the source of an issue, regardless of who owns the affected component, are essential.
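To illustrate locating the source of an issue regardless of ownership, the Python sketch below walks a hypothetical dependency map that spans customer, SaaS provider, and cloud provider components, and reports each unhealthy component whose direct dependencies are all healthy, together with its owner. The topology, the owner names, and the heuristic itself are assumptions for illustration; real tools combine topology with traces, metrics, and change data.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical topology: each component lists the components it calls.
DEPENDS_ON: Dict[str, List[str]] = {
    "storefront": ["payments-saas", "catalog"],
    "catalog": ["cloud-db"],
    "payments-saas": ["cloud-db"],
    "cloud-db": [],
}

# Hypothetical ownership: which partner has to act if the component is at fault.
OWNER: Dict[str, str] = {
    "storefront": "customer",
    "catalog": "customer",
    "payments-saas": "saas-provider",
    "cloud-db": "cloud-provider",
}


def root_causes(unhealthy: Set[str],
                depends_on: Dict[str, List[str]] = DEPENDS_ON,
                owner: Dict[str, str] = OWNER) -> List[Tuple[str, str]]:
    """Return (component, owner) for each unhealthy component whose direct
    dependencies are all healthy, i.e. whose failure is not explained by an
    unhealthy component it calls."""
    return [
        (svc, owner.get(svc, "unknown"))
        for svc in sorted(unhealthy)
        if not any(dep in unhealthy for dep in depends_on.get(svc, []))
    ]
```

If every component is unhealthy, `root_causes` points at `("cloud-db", "cloud-provider")`; if only the storefront and the payments service are failing, it points at `("payments-saas", "saas-provider")`. Either way, the answer names the partner who needs to act, which is the point of shared tooling across ownership boundaries.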
About the author
Helen Beal is a DevOps and Ways of Working coach, Chief Ambassador at DevOps Institute and an Ambassador for the Continuous Delivery Foundation. She provides strategic advisory services to DevOps industry leaders and is an analyst at Accelerated Strategies Group. She hosts the Day-to-Day DevOps webinar series for BrightTalk, speaks regularly on DevOps topics, is a DevOps editor for InfoQ and also writes for a number of other online platforms. Outside of DevOps she is an ecologist and novelist.