Defining network performance with Google’s 4 golden signals


You’re supposed to meet someone for coffee. If they’re three minutes late, no problem, but if they’re thirty minutes late, it’s rude. Was the change from “no problem” to “rude” a straight line, or were there steps of increasing rudeness? Do we care why? A good reason certainly increases our tolerance. Someone who is always late reduces it.

Network performance follows many of the same dynamics. We used to talk about outages, but they have become less frequent. “Slow” is the new “out.” But how slow is slow? Do we try to understand the user experience and adjust our performance monitoring to reflect it? Or is the only practical answer to just wait until someone complains?

A recent Enterprise Management Associates study queried 250 network professionals. One of the questions asked, “What percentage of network performance issues were first reported by end users, rather than discovered by the network operations professionals?” The average answer was 39 percent, and the median answer was 35 percent. So, more than a third of the time (and much more often in some organizations), we don’t know about an issue until a user complains? We must do better!

The problem isn’t that we don’t get enough reports. Network operations teams are flooded with information, but too much information is little better than noise. We need to be able to condense insight from the vapor of data (to paraphrase Neal Stephenson). But how do we do that?
