Common Monitoring Mistakes

Alert Overload

If too many alerts fire constantly, people will eventually ignore them and assume they are not critical. Then, when a service-impacting alert does arrive, it will be ignored as well and treated as a false alarm.

  • Distinguish between warnings (which admins should be aware of, but which do not require immediate action) and error or critical-level alerts that require immediate attention.
  • Route the right set of alerts to the right group of people.
  • Make sure every alert is real and meaningful.
  • Ensure alerts are acknowledged, resolved, and cleared. Do not let a monitoring dashboard accumulate hundreds of open alerts.
  • Tune the alert settings in the monitoring system, or fix the underlying systems or operational processes, to reduce how often these alerts fire.
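The first two points in the list above (separating warnings from urgent alerts, and routing each alert to the right group) can be sketched in a few lines of Python. This is a minimal illustration, not a reference to any particular monitoring product: the `Severity` levels, the `OWNERS` mapping, the team names, and the `"page"`/`"queue"` channels are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


@dataclass(frozen=True)
class Alert:
    source: str         # hypothetical category, e.g. "db" or "network"
    severity: Severity
    message: str


# Hypothetical mapping from alert source to the team that owns it.
OWNERS = {"db": "dba-team", "network": "netops"}
DEFAULT_OWNER = "ops"


def route(alert: Alert) -> tuple[str, str]:
    """Return (team, channel) for an alert.

    Warnings go to a review queue; error and critical alerts
    page the owning team immediately.
    """
    team = OWNERS.get(alert.source, DEFAULT_OWNER)
    channel = "page" if alert.severity >= Severity.ERROR else "queue"
    return team, channel
```

With this sketch, a critical database alert would page the hypothetical `dba-team`, while a warning from the same source would only land in that team's queue for later review.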

Multiple Monitoring Systems

Consolidate all your alerts into one monitoring system. Having a separate monitoring system for each type of device, such as Windows, Linux, SQL servers, and so on, will impact datacenter performance. Alerts could end up being routed incorrectly, or not being addressed at all.

Even with good monitoring practices in place, issues and outages will inevitably occur. Best practice dictates that an issue should not be considered resolved until monitoring is in place to alert on whatever caused it, so that similar events can be prevented in the future.
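One way to enforce this rule is to make it a gate in the incident workflow. The sketch below assumes a hypothetical `Incident` record that lists the alert rules added to cover its root cause; the field names and the example tags are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    root_cause: str                          # short tag, e.g. "disk-full"
    covering_alerts: list[str] = field(default_factory=list)


def can_resolve(incident: Incident) -> bool:
    """An incident may be closed only once at least one alert
    rule exists that would detect its root cause next time."""
    return len(incident.covering_alerts) > 0
```

An incident with an empty `covering_alerts` list stays open under this check until someone records the alert rule that now watches for its cause.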

Hardware Vendor Independent Architecture

You should not be forced to select a monitoring system based on the equipment manufacturer and vice versa.

Not Monitoring Your Monitoring System

A lot of time is spent setting up the monitoring system, but not on monitoring the monitoring system itself. This means that if there is an issue in the monitoring system (a hard drive or memory failure, a network outage, or a power failure), nobody knows about it. If your monitoring system is down and you are not aware of it, you are exposing the business to an increased chance of an undetected outage. To minimize this risk, set up a check of your monitoring from a location outside of where your monitoring runs. Alternatively, go with a monitoring solution that is hosted in a separate location.
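A common way to implement such an external check is a heartbeat: the monitoring system reports a timestamp at regular intervals, and an independent watchdog running elsewhere raises an alarm when the latest timestamp is too old. The sketch below shows only the staleness test. The function name, its parameters, and the idea of passing timestamps as seconds since the epoch are assumptions made for illustration; how the heartbeat is transported and how the alarm is delivered are left out.

```python
import time
from typing import Optional


def monitoring_is_stale(
    last_heartbeat: float,
    max_age_seconds: float,
    now: Optional[float] = None,
) -> bool:
    """Return True if the last heartbeat from the monitoring system
    is older than max_age_seconds.

    Timestamps are seconds since the epoch; `now` defaults to the
    current time and can be overridden for testing.
    """
    if now is None:
        now = time.time()
    return (now - last_heartbeat) > max_age_seconds
```

The watchdog that calls this function should run outside the location that hosts the monitoring system, so that a single power or network failure cannot silence both.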