I really get excited when I am able to reproduce problems in the lab.
With this specific case, the customer was experiencing errors within their web browsers that looked like either a network or server issue. The specific symptom was that certain images would not display. If you waited a while, and ‘refreshed’ the page, more of it loaded or the entire page loaded properly.
I’m sure you can imagine the chaos this type of intermittent problem causes. The sequence of events unfolds in the following manner; the client reports the webpage issue to the help desk and the help desk tests the webpage with mixed results. In either event, the problem goes to the server group who tests and finds nothing wrong, and then the problem goes to the network group which, in most cases, does not see the problem. Then the political fist fights, finger pointing and witch hunt commence…..
In this case, they even managed to capture some packets during the problem and saw a HTTP “Service Unavailable” message and were having issues interpreting exactly what that would mean. I was there doing some other work when they dumped, uh, I mean asked me if I could help.
They explained that when the problem was occurring, the network management system was not reporting that the server or application was down. I asked how they knew that and they said that they pinged the server, tested for tcp port 80 and lastly retrieved the html page. Wow, I was impressed. I don’t see too many people monitoring from the IP layer up to the Application layer.
I then told them that even though this was an excellent way of monitoring, I wasn’t too surprised that no outages were recorded. If it was an application issue, the pings will still work as well the TCP port check. If all you did was retrieve a single html file, it would not use the same number of connections as actually loading a page and rendering images, etc…
That’s when the lab work came in. I went to my lab and configured IIS to only accept 1 connection, created a simple html file which had a few images on it. After the first try I saw the exact same issue the client experienced as well as the same HTTP message in the analyzer.
In the video below you will see how I did it and the results.