SREcon Americas

In the 1990s, I was working in operations at a major North American broadcast facility. Our monitoring ecosystem was very immature, and approximately 90% of all service impacts were first identified by our end users.

The plan was to buy tools, deploy monitoring, and become more proactive. But as more monitoring tools were deployed, we started to drown in the events they generated. It became too noisy to discern where the problems were. Deploying an Event Manager reduced the noise and made events more focused, but that depended on writing rules, defining filters, and adjusting the models to represent an accurate topology.