I returned from DevOpsDays in Chicago a couple of weeks back, and was rather surprised. First off, it was way different to the first DevOpsDays I attended in Mountain View back in 2011. It all felt quite professional, grown-up and… wait for it… vendor friendly. That wasn’t the real surprise, though.
Over the two days I was there (as booth babe) I had at least five conversations about event correlation, and the challenges and failings of companies that have attempted to develop such a solution in-house. For those not familiar with event correlation, it’s basically the task of analyzing and making sense of the 10,000-odd events that get fired from your 50-odd monitoring tools when your applications or infrastructure break. Without event correlation, IT Operations ends up drowning in alert storms, and as a result, they often miss or ignore information that could help them understand and fix problems quickly. It’s a major problem, big enough for many enterprises to go down the build-it-ourselves path.
When I worked at AppDynamics, I saw this build vs. buy argument many times for Application Performance Monitoring (APM) solutions. The only customer who came close to building one properly was Netflix, and I think this fact alone should highlight the challenge that lies ahead for any company considering the “build” path.
The odds really are stacked against you, and I’m not just saying this because I work for a vendor; it’s tough even for us domain experts and vendors who have built this software in the past. It requires more than just funding a dev team for a few years and hoping all goes to plan. And then there are ongoing support and continual enhancements.
What To Do
If you are a business that is passionate about event correlation, incident management and making IT Operations people unbelievably happy – go build an event correlation solution.
If you are a business that is passionate about your vision, stock price, kicking your competitor’s ass and winning – go buy an event correlation solution.
My point here is that you need to focus your software development and innovation on your core business. As tempting as it might sound to start building tools for your IT Operations team, nothing will ever make you more competitive than delivering new innovative products or services for your target customers.
Application infrastructure is much bigger, more dynamic and more complex than it used to be, and it’s going to get worse with mobile, micro-services, cloud and big data. Scale and Change are hard variables to plan for. When does the upside of building an event correlation solution start to outweigh the associated risks, time and costs?
At a bare minimum, this is what you need to build, support and maintain:
- Event Correlation Engine & Data store
- Event Integration Adapters for all your eco-system tools (50-100 different adapters)
- UI for configuration, data manipulation, diagnostics, reporting and collaboration
- Systems integrity, e.g. security, teaming, backups, maintenance
Ballpark Costs of Building
Without knocking out a spreadsheet of line items, 10 developers for 1 year is a minimum of $1.5m when you factor in all associated costs. Throw in an architect, UX designer, QA team, DBA, doc writer, tooling and infrastructure and you won’t get much change from $2.5m a year, and this is being conservative. Over a two-year build, you’re already at $5m before you hit maintenance mode, and then the costs keep coming. The integration adapters for your 50-100 tools and APIs will all need maintaining and testing with every vendor release. The bad news is that you’ll also have to wait 18-24 months for any ROI, assuming everything goes well. That’s a big risk to take when there are no guarantees.
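To make the back-of-the-envelope math explicit, here is a small Python sketch. The two annual figures are the ballpark numbers quoted above; the flat run rate and the helper name `build_cost` are my own simplifying assumptions, not a real costing model.

```python
# Ballpark annual figures quoted in this post (US dollars).
DEV_TEAM_PER_YEAR = 1_500_000    # 10 developers, all costs factored in
FULL_TEAM_PER_YEAR = 2_500_000   # plus architect, UX, QA, DBA, docs, tooling, infrastructure


def build_cost(months, annual_cost=FULL_TEAM_PER_YEAR):
    """Cumulative spend after `months` of building, assuming a flat annual run rate."""
    return annual_cost * months / 12


# Spend at the end of year one and across the 18-24 month wait for ROI.
for months in (12, 18, 24):
    print(f"{months} months: ${build_cost(months) / 1e6:.2f}m")
```

At a flat $2.5m a year, this puts the 18-24 month wait for ROI at roughly $3.75m to $5m, before any maintenance costs are counted.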
Subscription Pricing Keeps Vendors Honest
There are many potential event correlation solutions in the market to buy. Some are basic, some are sophisticated, some are cheap and some are expensive. The right solution is out there, but bear in mind that no solution, whether you build or buy, will meet 95% of your requirements or needs. Remember, Scale and Change are two variables you can’t predict. It’s also worth pointing out that most event correlation solutions can be deployed in weeks and are subscription based. You no longer need to lock yourself into perpetual license deals, cross your fingers and hope it all works out. Vendors these days are incentivized to deliver value and ROI continuously because they know that your renewal depends on it.
Buying is now cheaper and less risky than building the solution yourself. This wasn’t the case a few years ago when your application infrastructure was relatively small, static and manageable – you could simply knock out and tie together a few scripts to manage a few hundred events and all would work out. Today most enterprise environments generate millions of events hourly from several hundred applications running across thousands of servers.
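For a sense of what one of those “few scripts” might have looked like, here is a toy Python sketch. Everything in it is hypothetical: the `Event` fields, the tool and host names, and the rule itself (events on the same host within 60 seconds of each other belong to one incident). It is an illustration of the idea, not how any real product does correlation.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since the start of the sample
    source: str       # monitoring tool that fired the event (hypothetical names)
    host: str
    message: str


def correlate(events, window=60.0):
    """Group events per host; a gap longer than `window` seconds starts a new incident."""
    incidents = []
    open_incident = {}  # host -> the incident (list of events) currently open for it
    for event in sorted(events, key=lambda e: e.timestamp):
        current = open_incident.get(event.host)
        if current and event.timestamp - current[-1].timestamp <= window:
            current.append(event)
        else:
            current = [event]
            open_incident[event.host] = current
            incidents.append(current)
    return incidents


events = [
    Event(0, "tool_a", "db01", "disk latency high"),
    Event(5, "tool_b", "db01", "slow queries"),
    Event(12, "tool_c", "web07", "health check failed"),
    Event(20, "tool_d", "db01", "replication lag"),
    Event(400, "tool_a", "db01", "disk latency high"),
]
incidents = correlate(events)
print(len(events), "events ->", len(incidents), "incidents")
```

A rule this naive copes with a few hundred events from a handful of tools. It knows nothing about application topology, duplicate alerts or events that span hosts, which is where the real engineering effort starts.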
“I wish we’d bought it, instead of built it,” said a DevOps engineer in Chicago a few weeks ago.