Perception is reality.
Or rather – perspective is reality.
When beginning to troubleshoot an application or network performance problem, we often start by capturing packets at or near the client end. We do this for several reasons:
There is less traffic to dig through
Client dependencies can be seen rather than assumed
The analyst can get end user perspective into the problem
There are several more reasons, but we will keep it simple and leave it at those three.
On the other hand, there are several downsides to capturing only on the client end. Now more than ever, network devices modify or remove header values in packets as they travel toward the server. This means that when a packet arrives at the server, we can no longer assume it carries the same header values it was sent with (and we're not just talking about TTL and NAT here).
To get the on-the-wire truth about an application problem, the ideal approach is to capture packets on both sides, client and server. If the server is in a cloud environment, or we cannot get access to the data center where it is hosted, then we should capture at the furthest point of visibility we can reach.
There are several advantages to analyzing with both sides of the picture. This method of capture allows us to:
Detect which way network packet loss is occurring
Determine whether the network is adjusting the TCP MSS, and whether both directions agree
Determine if the network is adjusting IP identification numbers or TCP sequence numbers
Measure true network latency
Observe server dependencies (back-end app or DB servers)
There are several other benefits, but these are the big ones. Recently, I have worked with companies dealing with one-sided MSS configuration changes that led to heavy packet loss in one direction. Without dual-sided capture, these issues would have been much more difficult to see.
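As a rough illustration of how a check like that might look, here is a minimal sketch using Scapy (the library choice and the file names client.pcap and server.pcap are assumptions for the example). It pulls the MSS option from the client's SYN in each trace and flags a mismatch:

from scapy.all import rdpcap, TCP

def syn_mss(pcap_file):
    # Return the MSS option from the first pure SYN (not SYN/ACK) in the trace
    for pkt in rdpcap(pcap_file):
        if TCP in pkt and pkt[TCP].flags == "S":
            for name, value in pkt[TCP].options:
                if name == "MSS":
                    return value
    return None

client_mss = syn_mss("client.pcap")   # MSS as the client sent it
server_mss = syn_mss("server.pcap")   # MSS as the server received it

if client_mss != server_mss:
    print(f"MSS rewritten in transit: client sent {client_mss}, server saw {server_mss}")
else:
    print(f"MSS unchanged end to end: {client_mss}")

Running the same comparison on the SYN/ACK tells you whether both directions agree.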
How to do it
As analysts, we care about the truth on the wire. We never want to capture traffic directly on the client or server that is experiencing the performance problem. The Wireshark capture driver grabs packets before they have been fully handled by the TCP/IP stack and NIC, so what we record can differ from what is really on the wire.
For example, when capturing on the client itself, we may see "packets" that are 60,000 bytes in size. This happens because we are capturing the data before the client's TCP/IP stack and NIC (via segmentation offload) have broken it up into wire-sized segments.
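To see this effect for yourself, a tiny sketch like the one below (assuming Scapy and a hypothetical local capture file named local.pcap) will flag frames larger than anything that could actually have been on a standard Ethernet wire:

from scapy.all import rdpcap

MAX_WIRE_FRAME = 1514  # 1500-byte Ethernet MTU plus the 14-byte Ethernet header

for i, pkt in enumerate(rdpcap("local.pcap"), start=1):
    if len(pkt) > MAX_WIRE_FRAME:
        print(f"Frame {i}: {len(pkt)} bytes - captured before segmentation")

If this prints anything, the trace was taken before the stack and NIC segmented the data, not on the wire.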
For these reasons, capture only with hardware that is external to the client or server under test. Next, with the capture devices and SPAN ports/taps in place, start the captures and reproduce the issue. Then stop both sides and collect the trace files.
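Dedicated capture hardware or dumpcap on a separate machine is the usual way to do this, but as a sketch of the idea, something like the following could run on a capture box whose interface is fed by the SPAN port or tap (the interface name and host address below are made up for the example):

from scapy.all import sniff, wrpcap

packets = sniff(
    iface="eth1",                    # interface fed by the SPAN port or tap
    filter="host 192.0.2.10",        # BPF filter for the client (or server) under test
    timeout=120,                     # capture while the issue is being reproduced
)
wrpcap("client_side.pcap", packets)  # save the trace for later comparison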
Now, for many of us, those last couple of steps may cause a chuckle. After all, not all issues are reproducible on demand. Many application problems are intermittent, and catching one in the act isn't always easy. In these cases, we may need to use a ring buffer configuration in Wireshark to capture packets to disk while waiting for the problem to occur. In the next Wireshark tip, we will talk about how to do that.
In the meantime, practice capturing traffic on both sides of a problem. Use IP identification numbers or TCP sequence numbers (not the relative sequence numbers Wireshark displays by default, but the raw ones) to locate a packet that was sent on the client side and received on the server side – unless a network device has mucked with one of these values! Note any header values the network changed along the way.
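Here is a rough sketch of that exercise, again assuming Scapy and the hypothetical client.pcap and server.pcap files: it indexes the server-side trace by IP ID, looks up each client-side packet, and reports any TTL or raw sequence number changes it finds.

from scapy.all import rdpcap, IP, TCP

client = rdpcap("client.pcap")
server = rdpcap("server.pcap")

# Index server-side packets by (source, destination, IP ID) for quick lookup
server_index = {(p[IP].src, p[IP].dst, p[IP].id): p
                for p in server if IP in p and TCP in p}

for c in client:
    if IP not in c or TCP not in c:
        continue
    s = server_index.get((c[IP].src, c[IP].dst, c[IP].id))
    if s is None:
        continue  # never arrived, or the IP ID itself was rewritten in transit
    if c[IP].ttl != s[IP].ttl:
        print(f"IP ID {c[IP].id}: TTL changed {c[IP].ttl} -> {s[IP].ttl}")
    if c[TCP].seq != s[TCP].seq:
        print(f"IP ID {c[IP].id}: raw sequence number changed {c[TCP].seq} -> {s[TCP].seq}")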
Take time to implement and practice this capture technique before problems strike. It will save a ton of time in analyzing traces once a performance problem does come along.
If you have questions about how to do this in your environment, or need help in analyzing an application performance problem - get in touch!