Recently, I changed my internet service to a new provider. These guys promised some decent bandwidth at a good price.
After I got it installed, nerd-proofed, and monitored, everything looked good.
That is, until I had a remote training class to teach over WebEx. Then the audio issues began. (Of course! How do these issues always KNOW?!?! Right when you need the connection the most: boom!)
Every 10 to 15 minutes or so, the audio would drop for about 5 seconds. The students could still see my screen, but the audio was clearly having issues. Fortunately, I could call in on my cell and finish the class with no further problems.
As much as I wanted to blame WebEx, I knew it was no coincidence that I had just changed my internet service. Alas, this time it looked like it really was the network! Time to break out the tools and troubleshoot.
Packet Capture - Wireshark
As a packet person, my first line of defense is typically Wireshark. This flies in the face of what I teach in my troubleshooting classes, where I usually steer people to SNMP and flow-based tools first. I mean, digging in the weeds is hard when you are first learning all this stuff, right? But if you know what you are looking for, packets can point you toward the next step.
Wireshark showed me that during the periods of "outage," ingress traffic came to a screeching halt. I saw plenty of packets leaving the network, but nothing coming back in: no ARP or ICMP activity, only my machine retransmitting packets it had already sent. My capture point was a limitation, since I was not capturing on the router's external interface. My tap wasn't at my side (DOH!!), so I couldn't capture immediately outside my router to see what was happening between me and the ISP box. As soon as I get my tap back, you can imagine where it will be connected.
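If you'd rather not scroll through a capture by eye to find these silences, the check is easy to script. The sketch below is a hypothetical example, not what I actually ran: it assumes you have already exported each packet's timestamp and a direction flag (inbound or outbound) from the capture, and it flags every stretch longer than a chosen threshold (3 seconds here, an arbitrary pick) where nothing came in while my machine kept sending. The `Packet` type, `find_ingress_gaps` function, and sample data are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    time: float    # seconds since the start of the capture (hypothetical export)
    inbound: bool  # True if the packet came in from outside the LAN


def find_ingress_gaps(packets: List[Packet], min_gap: float = 3.0) -> List[Tuple[float, float, int]]:
    """Return (start, end, outbound_count) for every silence in inbound
    traffic longer than min_gap seconds. outbound_count is how many packets
    we sent during that silence (e.g. unanswered retransmissions).
    A silence still open when the capture ends is not reported."""
    gaps: List[Tuple[float, float, int]] = []
    last_inbound = None  # timestamp of the most recent inbound packet
    sent_since = 0       # outbound packets seen since that inbound packet
    for pkt in sorted(packets, key=lambda p: p.time):
        if pkt.inbound:
            if last_inbound is not None and pkt.time - last_inbound > min_gap:
                gaps.append((last_inbound, pkt.time, sent_since))
            last_inbound = pkt.time
            sent_since = 0
        else:
            sent_since += 1
    return gaps


# Made-up sample: traffic flows normally, then only outbound packets
# leave between t=1.0 and t=6.5 before inbound traffic resumes.
SAMPLE = [
    Packet(0.0, True), Packet(0.5, False), Packet(1.0, True),
    Packet(1.2, False), Packet(2.0, False), Packet(4.0, False),
    Packet(6.5, True), Packet(7.0, False),
]

for start, end, sent in find_ingress_gaps(SAMPLE):
    print(f"no inbound traffic from {start:.1f}s to {end:.1f}s, {sent} packets sent meanwhile")
```

Counting the outbound packets during each gap is what separates "the line went quiet" from "I kept talking and nobody answered," which is the pattern the capture showed.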
Ping Tool - PingPlotter
I needed to see if this was a "me" issue or a "them" issue, so I fired up a handy little tool that I use to measure and graph ping responses. It runs a traceroute alongside a ping test to any configured target, so the ping response time can be graphed at every hop along the way (where ICMP is enabled, of course). This tool showed me some interesting results.
In the screenshot above, the ping tool is graphing ping response times, packet loss, and jitter for each hop of the route. At one point, right when the audio would drop, I saw the fat red line at every hop beyond my office router. Every ping to a hop past my router was lost, which put the problem between my router and the ISP. The connection between my machine and my office router stayed clean and responsive.
However, between my router and its gateway in the clouds, there was a total outage for about 15 seconds.
That’s not cool man.
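The reasoning I applied to that graph (loss that starts at one hop and continues through every hop beyond it points at the link just before that hop) can be written down as a small sketch. This is hypothetical code, not a PingPlotter feature or export format: the hop names, RTT values, and `locate_loss` function are made up, and each probe round is assumed to hold one RTT in milliseconds per hop, or `None` for a lost ping.

```python
from typing import Dict, List, Optional

# Hypothetical hop names and RTTs (ms); None means the ping was lost.
HOPS = ["office-router", "isp-gateway", "isp-core", "target"]
ROUNDS: List[List[Optional[float]]] = [
    [1.2, 8.5, 10.1, 20.3],   # healthy round
    [1.1, None, None, None],  # everything past the office router lost
    [1.3, None, None, None],
    [1.0, 8.7, None, 21.0],   # isp-core drops this ping, but the target answers
]


def locate_loss(hops: List[str], rounds: List[List[Optional[float]]]) -> Dict[str, int]:
    """Count, per link, the rounds in which loss began at that link and
    continued all the way to the final hop."""
    blame: Dict[str, int] = {}
    for results in rounds:
        first_lost = None
        # Walk back from the destination for as long as pings are lost.
        for h in range(len(hops) - 1, -1, -1):
            if results[h] is None:
                first_lost = h
            else:
                break
        if first_lost is None:
            continue  # the final hop answered, so no end-to-end loss this round
        upstream = hops[first_lost - 1] if first_lost > 0 else "local host"
        link = f"{upstream} -> {hops[first_lost]}"
        blame[link] = blame.get(link, 0) + 1
    return blame


for link, count in locate_loss(HOPS, ROUNDS).items():
    print(f"{link}: loss started here in {count} round(s)")
```

Walking back from the destination is the important design choice: a router that rate-limits or deprioritizes ICMP can show loss on its own hop while later hops answer fine, and that hop should not be blamed for the outage.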
Unfortunately, this was not an isolated issue. It happened often enough to disrupt the WebEx session, affecting both the audio and the video feeds; it was just more apparent in the audio.
What to do next?
Since the physical connection between my router and the ISP fiber box never went down, this did not appear to be a layer 1 issue on my end. It could still be one further upstream on their side, but at least not between me and the fiber box. I had no SNMP (or any other) access to the router one hop upstream, so my visibility ended there.
I'm going to give the ISP a ring and let them know what I've found so far. Hopefully I'll reach someone who can do more than have me reboot the router (arg). We'll see. But isn't it nice to have great tools that let you prove where the issue is isolated? Worst case, I'll head back to my old provider. At least I will have data to back up my departure!