Two weeks ago, I attended O’Reilly Velocity, the conference that gave us “real-world best practices for building, deploying, and running complex, distributed applications and systems.”
My favorite keynote was “The Future Works like People” by Adam Jacob (@adamhjk), CTO of Chef. He distilled the Velocity culture into four themes: Open, Innovative, Humane and Driven.
Velocity Culture is…
- Open – It respects open communication between all members.
- Innovative – It’s always striving to make changes in the industry.
- Humane – It is a safe space to discuss how hard life is for IT Ops and DevOps Engineers.
- Driven – It’s fundamentally restless about IT Ops technology and where the industry is headed.
I interacted with a lot of people and I can confidently say that the Velocity culture was everything it claimed to be.
Survey Results
Since Velocity is a DevOps-focused conference, it was the perfect setting for the Moogsoft Monitoring Survey.
I interviewed a lot of IT Ops folks, and the job titles we saw most often were SRE & DevOps Engineer, System Architect, Performance Engineer, and Internal Systems Engineer.
And after researching the companies that contributed to our survey, I found that the top industries were Software (surprise, surprise), Finance, Healthcare, and Media.
Key Findings:
- The top 3 most used monitoring tools are Splunk, New Relic and Nagios.
- The top 3 monitoring challenges are alert noise / fatigue / volume, alert correlation across all tools, and collaboration across teams.
- The monthly alert / event volume most commonly cited by the companies we questioned was in the hundreds.
- On a scale of 1-10 — 10 being the most proactive company ever — most companies said they are a 7.
- The number of P1 / SEV-1 incidents most of the companies we surveyed cited was “a few” per month.
The Most Interesting Fact:
Much like last month’s Monitorama PDX 2017 Monitoring Survey, almost 50% of respondents said that they have 5-10 monitoring tools, and yet 67.5% of respondents said that their number one monitoring challenge is alert noise / fatigue / volume.
It’s safe to assume that, like our Monitorama ’17 attendees, these companies have monitoring tools that increase visibility into issues but also generate too much noise.
Additionally, when I asked them to rank their companies on a scale of 1-10 (1 being the most reactive company ever, 10 being the most proactive company ever), over 50% said their company is fairly proactive (a 7 or 8) when it comes to alert/event management.
Perhaps they stop short of a 9 or 10 because of the overwhelming alert volumes and P1s they’re dealing with?
Monitoring Survey
Over 45% of participants said that they use SCOM as their event manager. This is a huge change from Monitorama, where over 76% of participants said that they don’t use an event manager / manager-of-managers (MoM) platform.
Thirty-six percent of respondents are still using tools from legacy vendors like IBM and HP. Some respondents also took extra time to show how they really feel about Netcool.
New Relic beat out AppDynamics and Dynatrace, the other two leaders in Gartner’s Magic Quadrant, to be our most used APM tool. New Relic is shining not only in the APM game but also in the stock market, celebrating a 52-week high this week.
While SolarWinds is surprisingly M.I.A. in the NPM category, ThousandEyes, Dynatrace, and Riverbed are tied for first in the NPM bracket. It’s also important to note that, for the second survey in a row, Viavi, a supposed “Leader” in Gartner’s Magic Quadrant for NPM, is missing entirely. Who is using Viavi, and where do I find these companies?
PS – @ThousandEyes, you have an admirer…
Nagios continues to dominate the Infra Monitoring game. It seems to truly be the “industry standard in IT infrastructure monitoring,” as it boasts on its website. It also helps that it’s free.
Splunk wins the log race, with 54% of all participants saying that they use it. But a majority of the attendees I spoke with admitted that they are looking for cheaper alternatives.
One real threat to Splunk is the Elastic Stack. Elastic, the company behind it, is one of the most successful open source software providers in the industry. Having crossed the 100M download mark in March 2017, it clearly has a growing global customer base and is becoming a worthy competitor to Splunk.
PS – Elastic Stack beat Splunk at Monitorama.
Pingdom, acquired by SolarWinds in 2014, has a healthy lead over Catchpoint and Gomez. This is one more tool that SolarWinds has added to its arsenal in its quest to be a monitoring powerhouse. Dynatrace, which owns Gomez and Keynote, should be aware that Pingdom has dominated the past two surveys.
Over 25% of respondents still use email to notify the right teams of an incident. Let me repeat that: 25% of respondents STILL USE EMAIL.
This most definitely supports the stat that over 67% have a noise problem. I feel bad for those L1 and L2 operators drowning in non-actionable alerts. RIP.
Over 73% of participants said that they use Jira. No real surprise here. But dear Jira users — did you know that Atlassian just hiked up prices for you (on July 10th)? Will Jira usership go down due to this price change? Only time will tell.
It’s also interesting to note that fewer than 15% of respondents are using the legacy vendors BMC and HP.
No surprise here: over 60% of all our respondents use Slack to communicate internally. It’s rumored that the business comms company is raising $500M right now.
Alongside Atlassian’s HipChat, Google Hangouts and Skype for Business, Microsoft Teams and Facebook Workplace are starting to wiggle into the space. Will that extra cash help Slack defend its top spot from rivals?
The responses to our language question match this year’s RedMonk Programming Language Rankings for the top two spots: Java and JavaScript (though their order is switched in this survey). Apparently, the two languages have been dominating since 2010. Who’s gonna unseat them?
I also think Adam Jacob would be pleased to see that Velocity culture is holding true, as it is “open” to all languages.
Shocker: AWS is absolutely crushing the cloud game, leading Azure by 57.14 percentage points.
But don’t count Microsoft Azure out of the game. On Monday, Microsoft announced that the long-awaited Azure Stack will be available in September, allowing customers to build a version of Azure in their own data centers. Last week the company also cut 3,000 jobs “to focus on the growing cloud business.” And in the first quarter of this year, Azure grew 93% year-over-year.
I’m excited to see the healthy competition between AWS and Azure grow in the next few years.
More Monitoring ≠ Better Monitoring
The companies represented at Velocity have spent a lot of money on their monitoring tools (Splunk is in fact the most used tool), yet 67.5% of them said that their biggest monitoring challenge is alert noise / fatigue / volume.
I got similar results at Monitorama in May, and I’m confident that the next Monitoring Survey I conduct will yield the same. It’s a fact: most companies I’ve talked to in the last six months, at tradeshows, at events and in person, are still most concerned with alert fatigue.
Bottom line: more monitoring doesn’t necessarily equal better monitoring. Enterprise companies need to focus on getting their monitoring tools to work with each other, because the alerts and events these tools fire off are most likely related.
Maybe if these companies focused on their second most selected challenge, “Alert Correlation across monitoring tools,” they would start to chip away at their first.
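What might alert correlation across tools look like in its simplest form? Here is a minimal sketch, assuming every tool’s alerts have already been normalized into a common shape with a source tool, a host, and a timestamp. The `Alert` type, its fields, the `correlate` function, and the five-minute gap are all hypothetical illustrations, not the format of Splunk, Nagios, New Relic, or any other product above, and real correlation engines use much richer signals than host and time.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Alert:
    # Hypothetical normalized alert; each real tool has its own schema.
    source: str     # tool that fired the alert, e.g. "nagios"
    host: str       # affected host
    timestamp: int  # seconds since epoch
    message: str


def correlate(alerts, window_seconds=300):
    """Group alerts on the same host whose gap from the previous alert
    in the group is at most `window_seconds`. Returns a list of groups."""
    groups = []
    # Sort by host, then time, so each host's alerts are contiguous and ordered.
    ordered = sorted(alerts, key=lambda a: (a.host, a.timestamp))
    for _, host_alerts in groupby(ordered, key=lambda a: a.host):
        current = []
        for alert in host_alerts:
            # A long quiet gap on this host starts a new group.
            if current and alert.timestamp - current[-1].timestamp > window_seconds:
                groups.append(current)
                current = []
            current.append(alert)
        if current:
            groups.append(current)
    return groups


if __name__ == "__main__":
    alerts = [
        Alert("nagios", "db-01", 1000, "CPU load high"),
        Alert("newrelic", "db-01", 1060, "Response time degraded"),
        Alert("splunk", "db-01", 1090, "Error rate spike in logs"),
        Alert("pingdom", "web-02", 5000, "Check failed"),
    ]
    for group in correlate(alerts):
        tools = sorted({a.source for a in group})
        print(f"{group[0].host}: {len(group)} alert(s) from {tools}")
```

In this toy input, four raw alerts from four different tools collapse into two groups: the three db-01 alerts fired within 90 seconds of each other end up together, which is exactly the kind of noise reduction respondents said they want.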