Which is Better for Troubleshooting SaaS Performance Issues: Packet or Number Crunching
Most enterprise IT professionals and network performance management (NPM) tool vendors consider it mandatory to have the original packets for analysis when solving difficult network and application degradation issues. That’s because visibility into network, TCP transport, and application performance is essential, especially when web, SaaS applications, and hybrid cloud infrastructure are involved.
While there are certainly cases where you need the original packets, there are others where sifting through large volumes of packets slows problem isolation and lengthens resolution time.
That is why we contend, based on real customer scenarios, that the insights offered by automated “number crunching” wire-data analytics trump terabytes of stored packets.
Let’s examine further.
Figure 1 - Achieving fast MTTR in complex IT environments: Number vs. packet crunching?
Handling slow cloud applications
Here’s a situation we encountered at one client that had a large number of branch offices. Users at many branch offices reported slow application performance, mostly around three cloud applications. From initial analysis, we discovered that the degradations occurred between 7 and 11 AM and recurred every three to five working days.
The network team indicated this was not a network issue. The branch location suffering the most from these degradations had a fully meshed 10 Gbps backbone with a redundant 1 Gbps Internet connection. Additionally, the Internet connection was running in active-standby mode with an average utilization of 10%.
A traditional packet-crunching approach would have required a troubleshooting kit that stores packets for several days, consuming 8 TB of storage based on the average Internet connection utilization. We knew there would be a number of key challenges:
The troubleshooting team would need to undertake a time-consuming cross-check of the three cloud applications with other applications to evaluate performance.
Beyond the initial report noting performance degradations, there would be no expedited processes to analyze the packets by looking into the network, front-end server, application, or client.
Even if the network remained suspect, there might be no process to eliminate network dependency issues, such as the impact of switching back and forth between the two Internet connections. In fact, analyzing network dependencies this way (see Figure 2) would have required more than 16 TB of storage and possibly doubled the resolution time.
Figure 2 - Analyzing the impact of firewalls, IDS/IPS and load balancers
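The 8 TB estimate can be sanity-checked with simple arithmetic. The sketch below is only an illustration: the function name is hypothetical, it assumes decimal units (1 TB = 10^12 bytes), and it ignores per-packet capture overhead, so real capture files would be somewhat larger.

```python
def capture_storage_tb(link_gbps: float, utilization: float, days: float) -> float:
    """Approximate storage needed to keep every packet, in decimal terabytes.

    Assumes a constant average load and no capture-file overhead.
    """
    bits_per_second = link_gbps * 1e9 * utilization
    bytes_total = bits_per_second / 8 * days * 86_400  # 86,400 seconds per day
    return bytes_total / 1e12


# 1 Gbps Internet connection at 10% average utilization, as in the scenario.
tb_per_day = capture_storage_tb(link_gbps=1.0, utilization=0.10, days=1)
days_in_8_tb = 8 / tb_per_day

print(f"{tb_per_day:.2f} TB per day, so 8 TB holds about {days_in_8_tb:.1f} days")
```

Under these assumptions, full packet capture consumes roughly 1 TB per day, so 8 TB covers about a week of traffic. That fits the "several days" of retention described for the packet-crunching approach.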
From packet to number crunching: The better path to resolution
As highlighted above, a traditional packet-crunching analysis would be time-consuming and ineffective. That’s why the team took a number-cruncher approach as it would:
Generate metadata by tracking, analyzing, and storing all TCP flags and anomalies against sequence numbers
Perform real-time and historical analysis related to specific time slots, user locations, and applications
Expedite cross-checking capabilities leveraging all application and user metadata
Rule out network-related dependencies by leveraging the troubleshooting kit’s network ports
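To make the idea of storing metadata instead of packets concrete, here is a minimal sketch of per-minute flow summarization. It is not how any particular product works: the `Packet` and `FlowMinute` types, the field names, and the retransmission heuristic (a repeated sequence number within the same flow and minute) are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    """Hypothetical decoded TCP packet record."""
    ts: float            # capture time in seconds
    src: str
    dst: str
    sport: int
    dport: int
    flags: frozenset     # e.g. frozenset({"SYN", "ACK"})
    seq: int
    payload_len: int


@dataclass
class FlowMinute:
    """Counters kept per flow per minute; the packets themselves are discarded."""
    packets: int = 0
    payload_bytes: int = 0
    syn: int = 0
    rst: int = 0
    retransmissions: int = 0
    seen_seq: set = field(default_factory=set, repr=False)


def summarize(packets):
    """Aggregate packets into per-minute, per-flow counters."""
    stats = defaultdict(FlowMinute)
    for p in packets:
        key = (int(p.ts // 60), p.src, p.sport, p.dst, p.dport)
        m = stats[key]
        m.packets += 1
        m.payload_bytes += p.payload_len
        m.syn += int("SYN" in p.flags)
        m.rst += int("RST" in p.flags)
        if p.payload_len > 0:
            # Simplified heuristic: a data segment whose sequence number was
            # already seen in this bucket is counted as a retransmission.
            if p.seq in m.seen_seq:
                m.retransmissions += 1
            m.seen_seq.add(p.seq)
    return dict(stats)


client, server = "10.0.0.5", "203.0.113.9"
sample = [
    Packet(0.0, client, server, 50000, 443, frozenset({"SYN"}), 100, 0),
    Packet(0.2, client, server, 50000, 443, frozenset({"ACK"}), 101, 500),
    Packet(0.9, client, server, 50000, 443, frozenset({"ACK"}), 101, 500),
    Packet(61.0, client, server, 50000, 443, frozenset({"RST"}), 601, 0),
]
summary = summarize(sample)
for key, m in sorted(summary.items()):
    print(key, m)
```

Each bucket holds a handful of counters regardless of how many packets it summarizes, which is why queries by time slot, location, or application can run against a far smaller data set than the raw capture.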
Number crunching: The proof is in the numbers
Let’s return to the branch office scenario we described at the beginning of this blog. By running a solution with automated number-cruncher analysis on the problematic Internet connection, the team would have stored, after three weeks, 21 days of per-minute layer 2-7 wire analytics in just 346 GB of disk space (see Figure 3). This would be significantly less than the 8 TB of storage space potentially used by a packet-cruncher solution.
Figure 3 – The database size after three weeks of number-crunching, automated wire-data analytics
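A quick back-of-the-envelope comparison puts the two storage figures side by side. This is a sketch that assumes decimal units (1 TB = 1,000 GB); the variable names are illustrative only.

```python
analytics_gb = 346        # wire analytics stored over three weeks
analytics_days = 21
packet_capture_gb = 8_000  # 8 TB estimated for the packet-crunching approach

analytics_gb_per_day = analytics_gb / analytics_days
size_ratio = packet_capture_gb / analytics_gb

print(f"Analytics: {analytics_gb_per_day:.1f} GB per day")
print(f"Packet store is about {size_ratio:.0f}x larger")
```

The analytics database grows by roughly 16.5 GB per day, and the packet store would be about 23 times larger even though it covers only several days rather than the full three weeks.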
Over this same three-week period, the system reported up to 4 Gbps of traffic (most of it classified as TLS). Through number-crunching, less than one minute was needed to process packets related to 307 million flows (see Figure 4).
Figure 4 – The power of number-crunching
From the scenario above, you can see how the number-cruncher processed the packets behind hundreds of millions of flows in under a minute, something not possible with a traditional packet-cruncher approach. For a first-line help desk, imagine the gains in productivity when troubleshooting reports come in every day or every couple of days. Analysis time could be reduced by significant margins.
The SkyLIGHT PVX experts have found, from years of experience, that you rarely need the real packets. Instead, by relying on number-cruncher analysis tools, help desk professionals can move away from manual or semi-automated packet-crunching to resolve troubleshooting issues. As we have seen above, the numbers tell you all that you need to know.
Will Moonen is an experienced, results-driven consultant with a proven track record in improving the performance of IT processes, applications, and infrastructure while keeping an open mind for human aspects.
Accedian is a leading provider of application performance management (APM) and network performance management (NPM) solutions. Accedian (Performance Vision) delivers exceptional end-to-end network and application performance visibility for control over the best possible user experience. Accedian is an established expert at instrumenting networks of every size, with SkyLIGHT™ platform solutions that scale to monitor multinational enterprise and service provider networks. More than 250 enterprise customers count on SkyLIGHT PVX for their application and network performance management needs. Since 2005, Accedian has partnered with its customers to deliver solutions around the globe, helping them and their users Experience Performance.