Sam the network administrator is heading out to top off the morning’s first cup of coffee when there’s a cellphone chirp. Sam pivots back toward the desk. The text message announces there’s a spike in traffic on one of the organization’s WAN links; the anomaly triggered an alert that Sam had configured into the network traffic monitoring and analysis tooling. Sitting down, Sam opens up the analyzer’s dashboard and skims across the pie charts, graphs, and other indicators that summarize current and historical traffic across the organization’s network. With a few clicks Sam drills down into the data, zeroing in on the root cause of the traffic spike. Sam will be able to correct this situation before the first plaintive “Why is the network so slow?” calls start to trickle in. Then, coffee.
A modern organization depends on its sophisticated high-performance network, and so traffic monitoring and analysis have become everyday duties. On any given day, your smoothly flowing network might be roiled by failing hardware, badly configured systems, bandwidth hogs, distributed denial-of-service attacks, phishing campaigns, botnets, and congestion due to social media and non-business apps.
Monitoring the network and analyzing traffic provide early warnings of anomalies, as well as the tools to diagnose and troubleshoot reliability and performance issues, including congestion problems, hardware glitches, and security incidents. Monitoring and analysis also provide the insight needed for long-range network planning.
The detailed packet sniffing provided by old standby tools such as tcpdump and Wireshark is still useful for diagnosing and troubleshooting, but traffic monitoring and analysis also need the “30,000-foot view” that shows the overall state of the network and ongoing trends. That kind of visibility is possible because of technologies for accumulating metadata about traffic flows. In sophisticated networks, devices such as routers and switches assist with monitoring by serving as exporters, transmitting information about the traffic passing through them to central collectors, which receive, store, and preprocess the traffic data for analysis and report generation by analyzer tools.
There are two major approaches to accumulating traffic flow data: NetFlow and sFlow. Both are “under the hood” technologies, not directly visible, and both have two components. One piece resides deep inside a variety of devices (e.g., routers and switches); the other piece, the human-facing part, lives in a wide range of monitoring/analysis applications that process the data exported by the first piece.
Both approaches have their partisans. We’ll look at what they are, how they differ, and when one is better than the other.
NetFlow was originally a feature of Cisco routers aimed at optimizing packet switching and ACL processing, and was found valuable for traffic monitoring. The successor to the NetFlow protocol has been standardized by the IETF as IPFIX (IP Flow Information Export). Several non-Cisco vendors also support NetFlow.
NetFlow notes and reports on all IP (Internet Protocol) conversations passing through an interface. (At least, “all” was true originally; but see below about Sampled NetFlow.) NetFlow is stateful and works in terms of the abstraction called a flow: that is, a sequence of packets that constitutes a conversation between a source and a destination, analogous to a call or connection.
A NetFlow exporter device collects data on the IP traffic entering/exiting the device; it inspects packets and groups them into flows based on particular fields: the source and destination addresses, protocols, ports, etc. Data on observed flows is rolled up from the packets and cached locally (in the flow cache), then periodically exported to the collector based on active and inactive timeouts. NetFlow thus handles only IP, focusing on OSI Layers 3 and 4. Its knowledge of the IP protocols enables it to interpret packets and work in terms of flows.
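To make the flow cache idea concrete, here is a minimal sketch in Python of how an exporter might roll packets up into flows and decide when to export them. It is an illustration only, not Cisco's implementation: the `FlowCache` and `FlowRecord` names, the five-field key, and the two timeout values are all assumptions chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical timeout values for illustration; real exporters make these configurable.
ACTIVE_TIMEOUT = 60.0    # seconds a long-lived flow may stay cached before export
INACTIVE_TIMEOUT = 15.0  # seconds of silence after which a flow is treated as finished


@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


class FlowCache:
    """Toy flow cache keyed on (src, dst, protocol, src_port, dst_port)."""

    def __init__(self):
        self.flows = {}

    def observe(self, key, size, now):
        """Fold one packet of `size` bytes into the flow identified by `key`."""
        rec = self.flows.get(key)
        if rec is None:
            rec = self.flows[key] = FlowRecord(first_seen=now, last_seen=now)
        rec.last_seen = now
        rec.packets += 1
        rec.octets += size

    def expire(self, now):
        """Remove and return the flows that are due for export at time `now`."""
        exported = []
        for key, rec in list(self.flows.items()):
            idle = now - rec.last_seen >= INACTIVE_TIMEOUT
            too_old = now - rec.first_seen >= ACTIVE_TIMEOUT
            if idle or too_old:
                exported.append((key, rec))
                del self.flows[key]
        return exported
```

Calling `observe()` for every packet and `expire()` on a timer shows why the device needs memory and CPU for this work: the cache holds one record per live conversation until a timeout releases it.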
In contrast, sFlow is a stateless packet sampling protocol that’s aimed at monitoring high speed networks. The “s” in the name is significant: sampling. However, the “Flow” part may be misleading: sFlow works in terms of packets only; it has no notion of aggregating packets into higher-level “flows”. The sFlow.org consortium maintains the sFlow standard, and many vendors support sFlow in their devices.
sFlow provides general-purpose packet sampling, spanning Layers 2 through 7, and is designed to be built into any network device. An sFlow exporter simply collects the prefixes of a subset of the packets passing through the device. The exporter samples one out of every n packets, where “n” is the chosen sampling rate; it also selects some random packets to include. It gathers the initial bytes of each sampled packet into sFlow datagrams, along with device counters, and sends the resulting UDP datagrams to the collector. There is thus no flow cache at the device. A key characteristic of sFlow is that the strategy of sampling is scalable to high speed networks; more on that below.
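The stateless sampling described above can be sketched in a few lines of Python. This is a simplified model, not the sFlow wire format: it assumes each packet is picked independently with probability 1/n (one way to get an average rate of one in n), and the `PacketSampler` class, the `HEADER_BYTES` value, and the dictionary layout are hypothetical names chosen for the example.

```python
import random

HEADER_BYTES = 128  # hypothetical prefix length; real agents make this configurable


class PacketSampler:
    """Toy stateless sampler that keeps the prefix of roughly 1 in n packets."""

    def __init__(self, n, seed=None):
        self.n = n
        self.rng = random.Random(seed)
        self.sample_pool = 0  # total packets seen, sampled or not
        self.pending = []     # sampled prefixes waiting to be sent

    def observe(self, packet: bytes):
        """Count the packet and, with probability 1/n, keep its first bytes."""
        self.sample_pool += 1
        if self.rng.random() < 1.0 / self.n:
            self.pending.append(packet[:HEADER_BYTES])

    def flush(self, counters):
        """Bundle pending samples and interface counters into one datagram-like dict."""
        datagram = {
            "sampling_rate": self.n,
            "sample_pool": self.sample_pool,
            "samples": self.pending,
            "counters": dict(counters),
        }
        self.pending = []
        return datagram
```

Note what is absent: there is no per-conversation state, only a packet counter and a short list of prefixes that is emptied on every flush. That is why the work per packet stays constant no matter how many conversations share the link.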
Both NetFlow and sFlow have acquired extensions over time. Flexible NetFlow and IPFIX provide the ability to have vendor-extensible templates for tweaking the set of packet fields of interest. NetFlow v9 and IPFIX also add the ability to monitor Layer 2 fields. Sampled NetFlow adds the option of doing sampling to NetFlow (sampling is mandatory in sFlow). For sFlow, v5 adds the ability to export host and application related data along with the packet prefixes and counters. All extensions depend on having hardware that supports them, the correct system software, and analyzer consoles that will work with them.
Contrasting NetFlow and sFlow
Avi Freedman makes an apt analogy to monitoring vehicular traffic: “… while NetFlow can be described as observing traffic patterns (‘How many buses went from here to there?’), with sFlow you’re just taking snapshots of whatever cars or buses happen to be going by at that particular moment.”
Here are the main differences between the two technologies.
Accuracy and scalability
NetFlow’s partisans have long argued that NetFlow can be more accurate than sFlow. NetFlow aggregates data about all packets into flows locally at the device; thus it can’t by happenstance miss a conversation by failing to sample the relevant packets. This granularity of NetFlow is attractive for examining traffic with an individual host. It’s easy to see per-host details, notice localized anomalies, and investigate particular flows. But as traffic volume mushrooms, it becomes less and less feasible to collect every flow. If you’re not doing sampling, scalability becomes an issue.
sFlow is thus more scalable than traditional NetFlow. However, sampling has the downside that there may be gaps in visibility. The packets sampled may not reflect every flow (for instance, short bursts). For detecting and drilling down to investigate security issues, this can be significant.
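The size of that visibility gap can be estimated with a little arithmetic. As a rough sketch, assume each packet is sampled independently with probability 1/n; then a flow of k packets is missed entirely with probability (1 - 1/n)^k. The function name below is hypothetical, and real samplers may not match the independence assumption exactly.

```python
def miss_probability(flow_packets: int, n: int) -> float:
    """Chance that a flow of `flow_packets` packets yields no samples at rate 1-in-n.

    Assumes every packet is sampled independently with probability 1/n.
    """
    return (1.0 - 1.0 / n) ** flow_packets


if __name__ == "__main__":
    # A short burst versus a bulk transfer, both at a 1-in-1000 sampling rate.
    print(miss_probability(10, 1000))
    print(miss_probability(10_000, 1000))
```

At a rate of 1 in 1000, a 10-packet burst goes unseen about 99% of the time, while a 10,000-packet transfer is almost certain to be sampled at least once. Short-lived conversations are the ones sampling tends to miss.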
Device performance at high volumes
As noted above, sFlow does minimal work on the network device, versus NetFlow, which uses the device’s CPU and RAM to implement the flow cache. This can become a problem with high speed devices where many conversations are concentrated onto a link. The additional CPU load on top of the “real work” the device is doing increases with the number of flows per second, and can consume a significant fraction of the CPU, according to a Cisco whitepaper. In contrast, sFlow generally does its packet sampling in the switching/routing ASIC, letting the network device’s CPU concentrate on its core job.
At volumes of hundreds of gigabits per second, such as in edge routing and large datacenters, traffic engineering becomes the central concern; the focus is on large-scale patterns and abrupt shifts in volume. Fine-grained visibility into individual hosts becomes less significant. Now sampling starts to become the clear winner. Because of this, NetFlow has added the option of Sampled NetFlow, which makes NetFlow scalable — but loses that accurate high granularity of traditional NetFlow.
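For large-scale patterns, sampled data is typically scaled back up to an estimate of total traffic. The sketch below shows the basic idea under a simple assumption: every sample taken at a rate of one in n stands for n packets of similar size. The `estimate_totals` function and the (key, size) sample layout are hypothetical, chosen for the example.

```python
from collections import Counter


def estimate_totals(samples, n):
    """Estimate total bytes per key from (key, frame_size) samples taken at 1-in-n.

    Each sample is assumed to represent n packets of the same size.
    """
    totals = Counter()
    for key, size in samples:
        totals[key] += size * n
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical samples: (destination prefix, frame size in bytes).
    sampled = [("198.51.100.0/24", 1500), ("203.0.113.0/24", 64), ("198.51.100.0/24", 1500)]
    print(estimate_totals(sampled, 1000))
```

The estimate improves as more samples land on a given key, which is why this approach suits heavy aggregate traffic far better than it suits a single quiet host.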
NetFlow is IP only (with some Layer 2 support added recently). Thus legacy protocols (e.g., AppleTalk, IPX) and other non-Internet protocols do not show up. In contrast, sFlow can cover Layers 2 through 7.
sFlow can have lower latency than NetFlow. A device collecting NetFlow metrics in its flow cache exports them periodically based on active and inactive timeouts. Thus reports on recent and ongoing conversations may be delayed, depending on the timeouts. In contrast, sFlow sends collected packet prefixes and counters in real time. If sub-minute latency is a concern — and your monitoring/analysis tooling supports it — sFlow may be the better choice.
There are several factors to consider in choosing either NetFlow or sFlow. Your installed hardware base may make the decision for you; what do your devices support? Consider also your traffic volume versus your need for fine-grained visibility, and the range of protocols on your network. Happily, many enterprises can use both technologies, using each one for what it’s best at, and using analyzer tooling that merges the data into a single view.
“Azaleos NOC” by Azaleos, Wikimedia Commons, licensed under CC BY-SA 3.0