My Life with Recurring Gray Problems
When Denny and Tim invited me to blog on LMT, my first thought was: how do I get started? I've got plenty of ideas for topics, but how do I set the scene? Then I realized that the common theme for me and my colleagues at Advance7 is the Recurring Gray Problem. You may not recognize the term, but I guarantee that you've seen the problem.
In this context the word 'Problem' means poor performance, errors or incorrect output - no surprises there. 'Recurring', obviously, means it keeps happening. 'Gray' just means that we don't know which technology is causing the problem, so it's impossible to allocate it to the correct support team.
So the problem just bounces from one team to another as each "proves" that their technology is not to blame. Of course, it must be the network, right?
For the past 25 years we've specialized in finding the root cause of Recurring Gray Problems. We look across the whole end-to-end system to narrow the problem down to, say, a single box, and our favorite tool for this is most definitely Wireshark. Once we are down to a box, we'll drill into it with other tools, but you can achieve a lot with Wireshark and Excel.
It's interesting that, although we use Wireshark every day of the week, we hardly ever spend time analyzing the Ethernet or IP layers. We are mostly interested in the TCP and application layer protocols, and we spend a lot of time correlating pcap traces with other types of diagnostic data. It's this end-to-end perspective that I'm hoping to bring to the LMT party.
We've fallen down many holes along the way, but we've learned a lot too, and I'm keen to share our experiences with you.