Wednesday, June 15, 2016

The Accuracy of Sampled Netflow

 on  with No comments 
In , ,  
To alleviate the fear of overburdening the CPU due to the collection of NetFlow statistics, Cisco gives us the option of using Sampled NetFlow. Sampled NetFlow allows you to sample 1 out of 10 packets, 1 out of 100 packets, or however much of a subset of the total number of packets. The theory is that with a good sample, the traffic will still be indicative of what is flowing through the router. If 10% of the total amount of packets following through the router is DNS queries, for example, then approximately 10% of the total amount of packets in the sample will also be DNS queries, and so on.

The reason that this is necessary is because of the way that a router handles traffic when collecting NetFlow statistics. In order to process a packet in order to collect NetFlow statistics, that packet has to be processed by the CPU. When sampling is enabled, the packets that are not part of the sample are switched faster because they will not require the additional processing required.

NetFlow sampling is enabled on supported IOS platforms with just a few commands.

ip route-cache flow sampled
ip flow-sampling-mode packet-interval 100

NetFlow sampling can be monitored with the show ip flow sampling command.

So as you can see, NetFlow sampling is simple to configure and monitor. It only takes a couple commands. But the question now needs to be asked, how indicative of the total network traffic is the sample? In other words, if I’m seeing 10% of all traffic being DNS queries in my sample, is 10% of the total traffic flowing through this router really DNS queries? Or is there some significant level of error in the sampling? In Cisco documentation and Certification Exam Guides, it is admitted that the sample will never be 100% accurate, but that it should usually be pretty close. They’ll also mention that you should obviously check the accuracy periodically.

Recently, I came across an academic article talking about the accuracy of NetFlow sampling. In the article, they collect data over time with a 1 in 250 packet NetFlow sample and compared it to a raw traffic sniff utilizing tcpdump. Shown below is Figure 8 from the article, which summarizes their findings. The red dotted line shows real time data of traffic flowing through the router, while the solid blue line shows real time data of their 1 in 250 NetFlow sample.

The article states that "In Figure 8 the cumulative empirical probability is plotted with its relative error. It indicates that the performance of systematic and static random sampling is not distinguishable in practice. We believe it is true in most of backbone links where the degree of multiplexing of flows is high."  In other words, the sample is really indistinguishable from the full data set.  Equally of importance, they found that the processing overhead of NetFlow sampling to be insignificant.  Further accuracy of their collection methodologies is demonstrated by SNMP byte count data strongly correlating with NetFlow byte count data.  There's a lot of statistics and graphs in the article if you're into that sort of thing.

Conversely, in another academic article, the researchers found their sampling to be significantly less accurate.  They stated that "Our experimental results allow us to come to the conclusion that: (i) our traffic classification method can achieve similar accuracy than previous packet-based techniques, but using only the limited set of features provided by NetFlow, and (ii) the impact of packet sampling on the classification accuracy of supervised learning methods is severe."  They discuss a training process which gets their accuracy to 85% for a 1/100 sampling.  Good enough for most use cases, but still too manual and still still a far cry from the results of the first study.

So where do we stand with Sampled NetFlow accuracy?  One study says it's pretty accurate, and the other says not so much.  So the jury is still out, and we're back to Cisco's recommendation that you should be testing the accuracy to determine if it is good enough for your use case.  Like the team in the first article, you can easily use a network tap or SPAN port to compare what is actually coming out of a router interface with the NetFlow sample estimating what is coming out of that router interface.  Don't just assume.


Post a Comment

Discuss this post!