Data De-duplication – What and Why

The Oldcommguy
Apr 21, 2022

Duplicate packets in monitoring data come from several sources, including SPAN ports and the points in the network where data is captured. For instance, a normally configured SPAN port (frequently used to connect monitoring tools to the network) can generate multiple copies of the same packet and forward all of them to the security and monitoring tools. These copies are exact duplicates of the original packet. Even when optimally configured, a SPAN port may generate between one and four copies of a packet, and duplicates can represent as much as 50% of the traffic sent to a monitoring tool.

It also matters where you capture monitoring data, as the capture location can create duplicates as well. If you capture the data at the ingress to the network and then again in the core, you have probably captured the same data twice. This double capture is in addition to whatever duplicates were made by the core switches themselves.

Purpose of De-duplication

Why does it matter? Eliminating unnecessary packets from the inspection process increases the amount of useful traffic your monitoring tools can process. Cutting the data to be processed by 50% is a significant reduction. It can also result in sizable cost savings for your network, since you should need fewer tools to handle the same traffic.
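To make the savings concrete, here is a rough back-of-the-envelope sketch. The function name and all of the traffic and capacity figures are made up for illustration, and it assumes tool capacity is simply a fixed throughput in Gbps, which real tools rarely match exactly.

```python
import math


def tools_needed(offered_gbps: float, duplicate_fraction: float,
                 tool_capacity_gbps: float) -> int:
    """Estimate how many tools are needed once duplicates are dropped.

    duplicate_fraction is the share of offered traffic that is duplicate
    (0.5 means half of the packets are copies). Hypothetical helper.
    """
    if not 0.0 <= duplicate_fraction < 1.0:
        raise ValueError("duplicate_fraction must be in [0, 1)")
    if tool_capacity_gbps <= 0:
        raise ValueError("tool_capacity_gbps must be positive")
    unique_gbps = offered_gbps * (1.0 - duplicate_fraction)
    return math.ceil(unique_gbps / tool_capacity_gbps)


# Illustrative numbers only: 40 Gbps of monitored traffic, 10 Gbps per tool.
without_dedup = tools_needed(40.0, 0.0, 10.0)  # every copy reaches the tools
with_dedup = tools_needed(40.0, 0.5, 10.0)     # half the traffic was duplicate
print(without_dedup, with_dedup)
```

With these assumed numbers, removing 50% duplicate traffic halves the number of tools required.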

So how do you accomplish this? Advanced context-aware data processing features, like de-duplication, within a network packet broker (NPB) can remove these duplicate packets. The NPB can remove duplicates at full line rate before forwarding the traffic to the monitoring tools. The extra copies are simply dropped from the data stream, with no negative effect on the tools. A large de-duplication window, together with the ability to configure the window size within the NPB, makes the feature extremely powerful. Based upon Keysight customer research, tool efficiency improvements of 30 to 50% have been seen when an NPB is used to perform de-duplication.

The de-duplication process is literally as simple as deleting the unnecessary copies of the packet data.
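As a rough software illustration of window-based de-duplication, the sketch below fingerprints each packet and drops any packet whose identical bytes were already forwarded within a configurable time window. This is not how any particular NPB implements the feature (that happens in purpose-built hardware). The function name, the 50 ms default window, and the choice to compare whole-packet bytes are all assumptions for the example.

```python
import hashlib
from collections import OrderedDict
from typing import Iterable, Iterator, Tuple

Packet = Tuple[float, bytes]  # (capture timestamp in seconds, raw packet bytes)


def deduplicate(packets: Iterable[Packet],
                window_seconds: float = 0.05) -> Iterator[Packet]:
    """Yield packets in order, dropping exact copies seen within the window.

    Assumes packets arrive in timestamp order. The window is measured from
    the copy that was forwarded, not from the most recent duplicate.
    """
    seen: "OrderedDict[bytes, float]" = OrderedDict()  # digest -> forward time
    for timestamp, data in packets:
        # Expire fingerprints that have fallen out of the window.
        while seen:
            oldest_time = next(iter(seen.values()))
            if timestamp - oldest_time <= window_seconds:
                break
            seen.popitem(last=False)
        digest = hashlib.sha256(data).digest()
        if digest in seen:
            continue  # exact copy inside the window: drop it
        seen[digest] = timestamp
        yield timestamp, data


stream = [(0.000, b"A"), (0.001, b"A"), (0.002, b"B"),
          (0.003, b"A"), (1.000, b"A")]
forwarded = list(deduplicate(stream))
print(forwarded)  # the copies of A at 0.001 and 0.003 are dropped
```

The packet at 1.000 is forwarded because it falls outside the window, so it is treated as new traffic rather than a duplicate. This is why the window size matters: too small and duplicates slip through, too large and legitimate retransmissions may be dropped.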

Typical Use Cases

As mentioned earlier, the most common de-duplication use case is to filter out unnecessary copies of packet data when SPAN ports are used in the network. This reduces the load on security and monitoring tools.

A second use case is for Cisco Application Centric Infrastructure (ACI) architectures. Redundant traffic streams and a distributed leaf and spine architecture mean that you have to tap in multiple places to collect all of the monitoring data needed. This creates a significant amount of duplicate data that must be removed from the monitoring stream. To complicate matters, the leaf portions of the network run at 40 Gbps and the spine portions typically run at 100 Gbps. Removing duplicates from 40 Gbps traffic at line rate can be very expensive if it is done by the monitoring tool instead of by an NPB.

A third use case is to turn off de-duplication periodically to perform a network analysis. With the function turned off, you can observe how and where duplicate traffic is created. This alerts you to probable network problems – either a poor network design that is creating duplicate traffic or equipment that is potentially failing and generating duplicate packets in error.
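One simple way to approach that kind of analysis offline is to count, per capture point, how many packets were copies of something already seen elsewhere. The sketch below assumes you have exported packets tagged with the name of the capture point they came from. The function name and the capture-point labels are hypothetical, and for brevity it ignores timing and compares whole-packet bytes.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, Tuple


def duplicate_report(captures: Iterable[Tuple[str, bytes]]) -> Dict[str, int]:
    """Count duplicate packets per capture point.

    captures yields (capture_point_name, raw packet bytes) in arrival order.
    The first copy of a packet is treated as the original; every later copy
    is charged to the capture point that delivered it.
    """
    first_seen_at: Dict[bytes, str] = {}   # digest -> point of first copy
    duplicates: Counter = Counter()
    for point, data in captures:
        digest = hashlib.sha256(data).digest()
        if digest in first_seen_at:
            duplicates[point] += 1
        else:
            first_seen_at[digest] = point
    return dict(duplicates)


# Hypothetical capture: one tap at the ingress, one SPAN port in the core.
sample = [
    ("ingress-tap", b"p1"), ("core-span", b"p1"), ("core-span", b"p1"),
    ("ingress-tap", b"p2"), ("core-span", b"p2"),
]
report = duplicate_report(sample)
print(report)
```

In this made-up sample, all three duplicates arrive from the core SPAN port, which would point you at that part of the network first.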

Considerations

Here are some things to keep in mind when considering de-duplication solutions.

Why not just buy a monitoring tool that has de-duplication built into it and skip the NPB? – Some monitoring tools can certainly perform de-duplication. The issue is that the tool then spends CPU resources and time on removing duplicates instead of on analysis. This slows down the tool and might even necessitate buying another tool to handle the extra load. Since monitoring tools are often expensive, this can become a costly choice. A packet broker is usually a much more cost-effective alternative since it is purpose-built for these types of functions.

Packet brokers can perform de-duplication cost-effectively at line speeds – Because an NPB is purpose-built for functions like this, a properly built NPB can perform de-duplication at line rates of up to 100 Gbps. Only a few of the monitoring tools on the market can de-duplicate at 40 Gbps or higher, as doing so places a heavy burden on the CPU. So, verify that the NPB you select can truly perform the functions needed at full load. This means you’ll have to test the vendor’s system (and not rely on whatever the vendor says).

If you want more information on network visibility, visit www.getnetworkvisibility.com.
