The slowly growing interest in cloud computing that started ten or so years ago is turning into a stampede. Most of our customers at Advance7 have strategic plans to migrate many systems to a cloud platform, and many have already started the journey.
In fact, we too have migrated all of our systems into AWS and Azure, containerizing many of them in the process. But here's a concern we shared with our customers:
"Will we have enough visibility to troubleshoot performance and stability problems once we have migrated our systems?"
It's a good question. We don't want to discover that the whole environment is opaque just when we need to troubleshoot a serious problem. We satisfied ourselves that we could get the data we needed to maintain our systems. We found that we could get a lot of information from the Application Load Balancers, and we configured continuous packet captures to record traffic between the tiers of our systems. That was just as well, because a couple of months ago we hit a performance problem with the TribeLab Community website.
I managed to record the actions of our Performance & Stability Engineers (PSEs) as they used AWS CloudWatch and Wireshark to investigate the problem. I pulled together screenshots, video clips and other information to produce a short video case study, and that's what I'm presenting here. It tells the tale of how our PSEs used freely available tools to troubleshoot the problem. You can find the video here:
On this page we also provide links to useful cloud performance information.
Application Performance Monitoring (APM) tools are very powerful, and common in cloud-native application environments. Interestingly, APM wouldn't have helped in this case, even if it had been available. You still need other data and, even in the cloud, packet analysis remains a powerful way to do low-level troubleshooting.