The Internet Protocol now dominates LAN management systems and messaging over the internet. The design of the protocol introduced new thinking into communications systems design. Unlike proprietary systems, the Internet Protocol is designed to work in situations where there is no overall control of the network path. Issues like packet loss take on much more importance because, as the sender, you have to rely on the competence of a chain of intermediaries to get your message through.
Packet loss likelihood
Packet loss is less likely on private, wired networks, but highly probable on long distance internet connections. The IP philosophy of passing data packets across networks gives each router the decision on where a packet should be passed to next. The sending computer has no control over the route that the packet will take.
The reliance on individual routers to make routing decisions means each router on the route must maintain a database of preferable directions for each ultimate destination, known as a routing table. This disconnected strategy works most of the time. However, one router cannot know instantly if another router further down the line is overloaded or defective.
All routers periodically inform their neighboring devices of status conditions. A problem at one point ripples through to recalculations performed in neighboring routers. News of a traffic block in one router spreads outward from neighbor to neighbor, causing routers to recalculate paths that would otherwise have passed through the troubled router. The chain of information takes time to propagate.
Sometimes a router will calculate the best path and send a packet down a blocked route. By the time the packet approaches that block, the routers closer to the problem will already know about it and reroute the packet around the defective neighbor. That rerouting can overload alternative routers. If the defect on a router prevents status notifications from being sent out, then the packet will be sent to that router regardless.
In short, the further a packet has to travel, the more routers it will pass through. More routers mean more potential points of failure and a higher likelihood that a packet will be dropped.
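The effect of path length can be put into rough numbers. The sketch below assumes that every router drops packets independently and with the same probability, which real networks do not guarantee; the function name and the sample figures are illustrative only.

```python
def path_loss_probability(per_hop_loss: float, hops: int) -> float:
    """Chance that a packet is dropped somewhere along a path.

    Assumption: each hop drops packets independently with the same
    probability. This is a simplification for illustration.
    """
    if not 0.0 <= per_hop_loss <= 1.0:
        raise ValueError("per_hop_loss must be between 0 and 1")
    if hops < 0:
        raise ValueError("hops must not be negative")
    # The packet arrives only if it survives every single hop.
    return 1.0 - (1.0 - per_hop_loss) ** hops


# Compare a short LAN path with a long internet path (made-up loss rate).
short_path = path_loss_probability(0.001, 3)
long_path = path_loss_probability(0.001, 30)
```

Under these assumptions the long path always carries the higher risk, which matches the point above: every extra router is one more place where the packet can be lost.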
Reasons for packet loss
The health of routers on the path of a packet is the main bellwether of packet loss. Router issues fall into three categories:
- Defective routers
- Overloaded routers
- Too many hops
Anyone familiar with computers or electronic equipment knows so many different operating factors are involved in computerized hardware that eventually something is bound to go wrong. It is unrealistic to expect that every router in the world will work perfectly all the time forever.
If a packet is sent to a troubled router, it won’t get any further on its journey. That problem could be hardware-related or caused by a bug in the software. The problem may be permanent, a short-term error, or just a blip. All the other routers connected to the defective device will soon notice the problem and stop sending packets to that faulty router. However, a few seconds’ delay will cause hundreds of packets already on their way to be lost.
Network equipment has a throughput capacity. No rule specifies a minimum capacity for any router that operates on the internet. Some can handle a lot of traffic, some can’t. In all cases, however, routers run a buffering system. A sudden surge in demand can still be dealt with even if it exceeds that router’s processing speed.
If the amount of traffic received at a router exceeds the processing speed to the extent that it fills up the buffer, any subsequent packet arriving at the router will not be processed and so will be lost. This situation only goes on for so long because the upstream routers that send packets to the overloaded router also send querying packets every 60 seconds. If one of those packets isn’t answered, the requesting router will stop sending packets to the overloaded router. In these instances, new traffic will be routed elsewhere. Once the busy router has room in its buffer again, it will send out an availability notice to its neighbors and the traffic will start flowing again.
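The buffering behavior described above can be sketched as a small simulation. This is a toy model, not how any particular router is implemented: it assumes a single first-in, first-out buffer that simply discards new arrivals when full, and the function name and parameters are invented for the example.

```python
from collections import deque


def simulate_router(arrivals, buffer_size, service_rate):
    """Toy router with a fixed-size FIFO buffer.

    arrivals:     packets arriving in each time tick, e.g. [10, 0, 0]
    buffer_size:  how many packets can wait in the buffer
    service_rate: how many packets the router forwards per tick
    Returns (forwarded, dropped). Packets still queued at the end
    are counted as neither.
    """
    queue = deque()
    forwarded = 0
    dropped = 0
    for tick, count in enumerate(arrivals):
        for _ in range(count):
            if len(queue) < buffer_size:
                queue.append(tick)   # packet waits in the buffer
            else:
                dropped += 1         # buffer full: packet is lost
        for _ in range(min(service_rate, len(queue))):
            queue.popleft()          # router processes the oldest packet
            forwarded += 1
    return forwarded, dropped
```

With a buffer of 10 and a service rate of 3, a one-off surge of 10 packets is absorbed and forwarded over the following ticks. Shrink the buffer to 5 and half of that surge is dropped on arrival.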
Too many hops
The network software that sends data packets has a small influence on the journey of a packet, but it isn’t a positive one. The main option that a sender has over the path is the maximum number of hops it should take. This is the “Time to Live” mechanism (TTL) in the IP header of a packet.
Despite its name, TTL doesn’t specify a maximum travel time. Instead, it contains a number that represents the maximum number of routers that the packet should pass through. Each router that the packet passes through reduces the TTL number by one. When a router receives a packet with a TTL of zero, it drops the packet.
Under normal circumstances, the TTL should never expire. However, if a packet is rerouted to get around a defective router, it may end up passing through an unusually high number of points and so expire.
Sometimes a TTL issue is fixed almost instantly, only losing one or two packets in a stream. If a problem on one router endures, however, the routers approaching it will calculate a more efficient workaround. A few packets drop because of TTL, but the rest get through on a newly organized and more efficient route.
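A minimal sketch of the TTL countdown follows. It models the common behavior where each router subtracts one from the TTL and discards the packet if the result reaches zero; the router names and the function itself are hypothetical.

```python
def dropped_at(routers, ttl):
    """Walk a packet along a list of routers.

    Each router decrements the TTL by one. If the TTL reaches zero,
    that router discards the packet. Returns the name of the router
    that dropped the packet, or None if it passed every router.
    """
    for router in routers:
        ttl -= 1
        if ttl <= 0:
            return router
    return None


normal_route = ["r1", "r2", "r3"]
# A detour around a faulty router adds extra hops to the journey.
detour_route = ["r1", "r2", "r4", "r5", "r6", "r3"]
```

A packet sent with a TTL of 4 clears the three routers of `normal_route`, but on `detour_route` the same packet expires at `r5`, which is the kind of loss described above.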
Packet loss detection
Two commonly-used network programs can help you identify packet loss: Ping and Traceroute. Both of these use messaging procedures built into a standard TCP/IP protocol called the Internet Control Message Protocol, or ICMP. These command line utilities can be pretty difficult to read, although veteran network administrators can interpret the results.
Fortunately, you don’t have to put up with low-quality presentation anymore thanks to the many GUI-based front-ends that incorporate the standard Ping and Traceroute tools.
For a free Ping and Traceroute utility with a user-friendly interface, take a look at ManageEngine’s Ping/Traceroute/DNS Lookup utility. Two other honorable mentions are the SolarWinds Traceroute NG (free trial) and Visual Traceroute by IPSwitch.
The downside of using Ping and Traceroute is that they only identify ongoing packet loss and route failure. They can’t analyze what happened after a completed transfer, and they won’t fix or prevent the defects that cause packet loss. To find a solution to packet loss, you must first understand a little bit about communication protocols.
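If you would rather not read raw Ping output by eye, a few lines of script can pull out the loss figure. The sketch below assumes the summary wording printed by common Linux and macOS versions of ping; other platforms, such as Windows, word the summary differently and are not covered. The function name is made up for this example.

```python
import re

# Matches summary lines such as:
#   "4 packets transmitted, 3 received, 25% packet loss, time 3004ms"
#   "4 packets transmitted, 4 packets received, 0.0% packet loss"
SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received"
    r".*?([\d.]+)% packet loss"
)


def parse_ping_summary(output: str):
    """Extract sent, received, and loss percentage from ping output.

    Returns None if no recognizable summary line is found.
    """
    match = SUMMARY.search(output)
    if match is None:
        return None
    return {
        "sent": int(match.group(1)),
        "received": int(match.group(2)),
        "loss_percent": float(match.group(3)),
    }
```

You could feed this the text captured from a ping run and log the result at intervals, which gives a simple record of loss over time.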
Connections and connectionless communications
The Internet Protocol is part of a suite of networking guidelines known as TCP/IP. The name of this stack comes from two standards documents: the Internet Protocol and the Transmission Control Protocol. However, many other protocols are also part of the group. One of the systems in this bundle that didn’t make it into the title is the User Datagram Protocol. TCP and UDP are the two options available for transport management.
TCP establishes settings for a connection before data transfer occurs. UDP doesn’t establish a session between the two communicating computers, and so is known as a “connectionless” system. The effects of packet loss on a transmission differ greatly depending on whether it is managed by TCP or UDP.
Transmission Control Protocol (TCP)
Data packets have two headers. The IP header resides in the outermost layer. Inside that, but still outside of the payload, sits the TCP header. In TCP terminology, a unit of data being processed is not referred to as a packet, but a “segment.”
It is the responsibility of TCP to break streams of data into chunks for transmission. Once a header has been added, the segment is processed into a packet by the Internet Protocol implementation. The TCP function in the recipient device receives the packet with the IP header stripped off. It reads the TCP header and behaves accordingly.
The main tasks of TCP are explained in its name: “transmission control.” Responsibilities include segmenting and reassembling streams of data, which involves sequencing each segment so the stream can be correctly reassembled. In order to put the stream back together again, the receiving program must ensure all segments arrive. This accounts for inevitable packet loss. To reassemble the data stream, TCP assembles the segments in order. This requires buffering, which has the added benefit of smoothing out the irregular arrival rate of packets.
A transmission governed by TCP doesn’t lose data to packet loss, although recovering lost packets adds delay. Each packet that arrives is acknowledged. If the sender doesn’t receive an acknowledgement for a packet, it sends the data again. The receiver holds all of a stream’s arriving packets in the buffer. If one segment is missing, the receiver waits for its retransmission to arrive before forwarding the complete stream on to the destination application.
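The acknowledge-and-resend idea can be illustrated with a short simulation. This is a heavily simplified sketch, not real TCP: it sends one segment at a time and waits for each to get through, whereas real TCP uses sliding windows and timers. The lossy link is simulated with a random number generator, and all names are invented for the example.

```python
import random


def send_reliably(data: bytes, segment_size: int, loss_rate: float, seed: int = 0):
    """Deliver data over a simulated lossy link using resends.

    Returns (reassembled_data, transmissions). A segment counts as
    acknowledged once it survives the link; until then it is resent.
    """
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be at least 0 and below 1")
    rng = random.Random(seed)
    # Break the stream into numbered segments.
    segments = [data[i:i + segment_size] for i in range(0, len(data), segment_size)]
    receiver_buffer = {}
    transmissions = 0
    for seq, payload in enumerate(segments):
        while seq not in receiver_buffer:       # no acknowledgement yet
            transmissions += 1
            if rng.random() >= loss_rate:       # segment survived the trip
                receiver_buffer[seq] = payload  # receiver stores it and acks
    # Reassemble the stream in sequence order.
    reassembled = b"".join(receiver_buffer[seq] for seq in sorted(receiver_buffer))
    return reassembled, transmissions
```

The data always arrives complete, but the transmission count rises with the loss rate. That extra traffic and waiting time is the price TCP pays for hiding packet loss from the application.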
User Datagram Protocol
UDP is the main alternative to TCP. This is a lightweight transport protocol. It has no session establishment procedures, and thus no control procedures. For decades after TCP/IP was defined, no one really used UDP. Just about all internet-based programs employed TCP because of its control and data verification procedures. However, over the past few decades, UDP has suddenly found a purpose and serves many high-tech internet applications.
As with TCP, a UDP data unit sits inside the IP packet. In UDP terminology, the packet is referred to as a “datagram” before it is passed to the IP program. UDP has no session-establishment procedures and, apart from a simple checksum in its header, no data integrity checks. However, it is possible to specify a port number. The originating and destination port addresses are specified in the UDP header.
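To show how little the UDP header contains, here is a sketch that builds and reads one by hand. The header is eight bytes: source port, destination port, length, and checksum. The sketch leaves the checksum at zero, which over IPv4 signals that no checksum was computed; the function names and port numbers are illustrative.

```python
import struct

# Four unsigned 16-bit fields in network byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prefix a payload with a UDP header (checksum left at zero)."""
    length = UDP_HEADER.size + len(payload)  # header plus data, in bytes
    checksum = 0                             # not computed in this sketch
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_datagram(datagram: bytes):
    """Return (source port, destination port, payload)."""
    src_port, dst_port, length, _checksum = UDP_HEADER.unpack_from(datagram)
    return src_port, dst_port, datagram[UDP_HEADER.size:length]


datagram = build_udp_datagram(5060, 5060, b"hello")
```

Notice what is missing: there are no sequence numbers and no acknowledgement fields, so nothing in the header lets the receiver detect or recover a lost datagram.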
UDP suddenly became popular with the advent of high-speed broadband because it doesn’t delay transmission like TCP. Interactive applications like VoIP, video conferencing, and video streaming were all developed to use UDP instead of TCP. Elements of TCP have been replaced by other procedures. For example, the Session Initiation Protocol provides session establishment and ending functions for VoIP applications. Buffering is an example of a TCP function that video systems replicate within the application.
The overall ethos of video and voice programs is to get arriving data up to the application as quickly as possible. The need for speed outweighs the need to check whether packets arrive in order, at a consistent speed, undamaged, or at all.
TCP vs UDP
Given the operating procedures of TCP and UDP, the easiest solution to packet loss over the internet is to use TCP instead of UDP. Unfortunately, the transport procedures of almost all applications (except for specialist networking software) are embedded in programs and the user rarely gets to choose which transport protocol to use.
If you have a program that uses TCP, your connections will encounter packet loss, but you don’t have to worry about it because the protocol will handle data recovery for you.
Programs that employ UDP sacrifice complete data integrity in exchange for speed. Quality of service impairments caused by packet loss are frequent occurrences in voice and video applications over the internet. In fact, they are so common that most people have become used to short gaps or robotic quirks in VoIP conversations, or pauses, jumps, and pixelated frames in live video streams.
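Because UDP itself reports nothing about missing data, applications that care usually number their own packets, as the RTP protocol used for voice and video does. The sketch below shows how a receiver could estimate the loss rate from those numbers. It is a simplified illustration: it ignores sequence number wraparound, and the function name is invented.

```python
def loss_rate(received_seq_numbers):
    """Estimate the share of packets lost in a numbered stream.

    Works on the sequence numbers that actually arrived. Order does
    not matter and duplicates are counted once. Packets lost before
    the first or after the last arrival cannot be detected this way.
    """
    seen = set(received_seq_numbers)
    if not seen:
        return 0.0
    expected = max(seen) - min(seen) + 1  # how many should have arrived
    return (expected - len(seen)) / expected
```

For example, receiving packets 1, 2, 4, and 5 means one packet out of five went missing, so the function reports a loss rate of 0.2.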
Packet loss over the internet
In terms of packet loss caused by the failure or congestion of internet routers, there is no simple remedy. Despite the lack of choice over whether a transfer uses TCP or UDP, however, you can use a trick to enforce transmission control on UDP communications. You can’t switch a UDP program to be a TCP system, but you can wrap UDP packets in TCP procedures.
VPNs establish a secure link between two computers, one of which is the VPN server. The secure link is called a “tunnel” and it uses TCP procedures. Once a tunnel has been created, all traffic between those two computers is sent down it. That means both UDP and TCP transfers are protected by TCP procedures. Some VPNs allow you to switch the tunnel to run over UDP. However, the maintenance of the tunnel by the VPN client and server emulates TCP protection in the application even if it is running to a UDP port.
Note the path from the VPN server to the final recipient isn’t enclosed in the tunnel. However, two strategies can reduce or eliminate packet loss during that final leg of the journey.
Reduce UDP exposure to packet loss
The easiest way to get near-total TCP coverage for your UDP transfers is to choose a VPN server as close as possible to the remote computer that you are connected to. In some countries, such as the United States, Germany, or the UK, large VPN companies offer servers in several cities. Select a location close to the source of the call or video stream.
Eliminate UDP exposure to packet loss
VPNs allocate a temporary IP address to each client at the point of connection. This new address represents the customer until the session ends. Most VPNs assign a new address each time you connect, but many VPNs offer a “static IP address service.” When using a static IP, the customer is represented by the same IP address every session.
A VPN with a static IP lets you use the VPN-allocated IP address rather than your real address. Whenever you connect to another device over the internet, it will first connect to the VPN server where your static IP address is registered. The route from the VPN server to the client (you) is always protected by the encrypted tunnel.
If you own several sites and you want TCP procedures to cover all of their communications, you can buy a static IP address from a VPN provider for each site.
When all sites are connected to the VPN service, all outgoing messages are protected as far as the VPN server. If those messages are addressed to the remote VPN-allocated address, the remainder of the journey from the VPN to the destination will also be covered by an encrypted tunnel. So, by this double VPN method, TCP procedures apply to the entire length of the connection and automatic packet loss avoidance services cover all of your UDP communications.
Packet loss on private networks
The risk of packet loss on private networks is significantly lower than on the internet. However, packet loss does occasionally occur. A problem with your network equipment can raise the loss rate to a critical level. You have one big advantage when trying to eliminate packet loss on your LAN: you control all of the links in the network and all of the equipment that processes transfers. The surest way to avoid packet loss within your network is to keep tabs on the health of your network equipment.
Fortunately, some very effective network monitoring systems are available today.
Here is our list of the 5 best network monitoring systems:
- SolarWinds Network Performance Monitor (free trial available)
- Ipswitch WhatsUp Gold (free trial available)
- ManageEngine OpManager
- Paessler PRTG
- Nagios XI
These tools help you both identify the equipment causing packet loss and monitor devices continuously to prevent packet loss whenever possible.
1. SolarWinds Network Performance Monitor
The SolarWinds Network Performance Monitor includes an autodiscovery function that maps your entire network. This discovery feature sets up automatically and then recurs permanently, so any changes in your network will be reflected in the tool. The autodiscovery populates a list of network devices and generates a network map.
The monitor tracks the performance of wireless devices and VM systems.
The tool picks up SNMP messages that report on warning conditions in all network devices. You can set capacity warning levels to spot routers and switches nearing capacity. Taking action in these situations helps you head off overcapacity that results in packet loss.
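The idea behind a capacity warning level is simple enough to sketch in a few lines. The code below is a generic illustration and does not represent the interface of SolarWinds or any other product; the link names, figures, and the 80 percent default threshold are all assumptions.

```python
def links_near_capacity(throughput_bps, capacity_bps, warn_ratio=0.8):
    """List links whose utilization has reached the warning level.

    throughput_bps: current traffic per link, e.g. {"core-1": 900}
    capacity_bps:   rated capacity per link, e.g. {"core-1": 1000}
    warn_ratio:     fraction of capacity that triggers a warning
    """
    flagged = []
    for link, capacity in capacity_bps.items():
        used = throughput_bps.get(link, 0)
        if capacity > 0 and used / capacity >= warn_ratio:
            flagged.append(link)
    return sorted(flagged)


# Hypothetical readings: one link at 90% of capacity, one at 10%.
current = {"core-1": 900, "edge-2": 100}
rated = {"core-1": 1000, "edge-2": 1000}
warnings = links_near_capacity(current, rated)
```

Setting the threshold below 100 percent is the point of the exercise: the warning arrives while the device still has headroom, before its buffer fills and packets start to drop.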
The management console includes a utility called NetPath that shows the links crossed by paths in your network. The data used to create the graphic is continually updated and shows troubled links in red, so you can identify problems immediately. Each router and switch in the route is displayed as a node in the path. When you hover the cursor over a node, it shows the latency and packet loss statistics for that node.
Network Performance Monitor extends its metrics out to nodes on the internet. It can even see inside the networks of service providers, such as Microsoft or Amazon, and report on the nodes within those systems.
NetPath gives great visibility to packet loss problems and lets you immediately identify the cause of the problem. The SNMP controller module lets you adjust the settings on each device remotely, so you can quickly resolve packet loss problems on your network.
If you run your voice system over a data network, you should consider the SolarWinds VoIP and Network Quality Manager. This tool particularly focuses on network conditions important to successful VoIP delivery. As packet loss is a major problem with VoIP, this module homes in on that metric. The system includes a visualization module that shows the paths followed by VoIP, along with the health of each node in color-coded statuses. This tool extends VoIP quality monitoring across sites to cover your entire WAN.
Both of these SolarWinds products run on a common platform and can be integrated together. All SolarWinds infrastructure monitoring systems run on Windows Server. You can get a 30-day free trial for both of these tools.
2. Ipswitch WhatsUp Gold
The Ipswitch product WhatsUp Gold monitors network devices and warns of potential error conditions, including device memory and CPU exhaustion. These alerts are managed via SNMP and will help you head off capacity and failure problems that cause packet loss.
This software includes a network discovery feature that collects all of the data for the monitor. It continually updates the topology of the LAN, detecting inventory additions, relocations, and removals. The discovery process creates a device list and builds a network map. This map is compiled from data gathered at the Data Link and Network layers. The map displays troubled devices in red. The mapping of network links extends out to the Cloud and also includes virtual environments and wireless devices.
Performance metrics like packet loss are shown in the device list and on the network map.
The WhatsUp Gold dashboard provides access to both live and historical data, which lets you analyze traffic demand trends. Live alerts are raised when certain conditions are met according to pre-set rules, and you can set your own custom alert conditions. The alerts can be sent out to team members as emails, SMS messages, or Slack notifications.
WhatsUp Gold installs on Windows Server and you can get a free trial.
3. ManageEngine OpManager
OpManager features a very sophisticated dashboard that manages to crowd in a lot of information without overwhelming the viewer. You can customize the dashboard and make different versions for different team members. The installation process ends with a network discovery phase, which populates the OpManager system database. The monitor builds a graphical representation of your network that can extend to WANs and wireless equipment. If you have virtual environments, OpManager maps both the virtual and physical elements of your system.
The network monitoring system uses SNMP to continue monitoring the health of all devices on the network. The SNMP system gives device agents the power to send out alert messages called “traps.” The controller displays these alerts on the dashboard immediately and can also be set to issue notifications by email or SMS. This monitoring system helps prevent any emergency performance conditions that cause packet loss.
The alert logging system offers you the easiest way to detect and resolve issues that result in packet loss. One of the alert conditions is packet loss. That alert is tied to a specific network device. On clicking on the notification, the OpManager dashboard takes you to a page about that piece of equipment and shows performance metrics in visual formats. This gives you a quick way to check which condition caused increased packet loss rate.
If no aspect of the router’s performance shows you problems, you can also click through to read the configuration change log. If a raised packet loss rate coincides with a configuration change, you can roll back the settings of the device to its state before those changes to see whether that resolves the problem.
OpManager gives you all the information you need to prevent or resolve packet loss with just a few clicks. This system can be installed on Windows or Linux and is available for a 30-day free trial.
4. Paessler PRTG
Paessler is a major player in the network monitoring software sector and it puts all of its expertise into one killer product: PRTG. The company prices its product by a count of sensors. A “sensor” is a network or device condition, or a hardware feature. You need to employ three sensors to prevent or resolve packet loss.
The Ping sensor calculates packet loss rate at each device. The Quality of Service sensor checks on packet loss over each link in the network. The third is the Cisco IP SLA sensor that only collects data from Cisco network equipment.
The ongoing system monitoring routines of PRTG head off conditions that cause packet loss. First of all, you need to make sure that no software bugs or hardware failures will cripple the network. PRTG uses SNMP agents to constantly monitor for error conditions on each piece of hardware on the network. Set alert levels at the processing capacity of each network device and marry that to a live monitor of the network’s throughput rate per link. Buildup of traffic in one area of the network may cause overloading on the related switch or router and in turn cause it to drop packets.
The PRTG system monitors application performance, too. You can prevent network overloads if you spot a sudden spike in the traffic generated by one application just by blocking it temporarily. You can also track the source of traffic back to a specific endpoint on the network and block that source to head off overloading.
The dashboard of PRTG includes some great visualizations, which include color-coded dials, charts, graphs, and histograms. The mapping features of PRTG are impressive and offer physical layout views both on the LAN and across a real-world map for WANs. A Map Editor lets you build your own network representations by selecting which layer to display and whether to include the identification of protocols, applications, and endpoints.
Paessler PRTG’s monitoring extends into the Cloud, enables you to monitor remote sites, and covers wireless devices and virtual environments. You can install PRTG on the Windows operating system or opt to access the system over the internet as a Cloud-based service. Paessler offers a 30-day free trial of PRTG.
5. Nagios XI
Nagios Core is a free and open-source program. The only problem is that no user interface is included. In order to get full GUI controls, you must pay for the Nagios XI system.
Like all of the other recommendations on this list, Nagios XI discovers all of the devices connected to your network and lists them in the dashboard. It will also generate a map of your network. Ongoing status checks head off performance problems that could provoke packet loss.
Statuses are checked by Nagios’s own Core 4 monitoring system rather than SNMP. However, Nagios can be extended by free plug-ins, and an SNMP-driven monitoring system is available in the plug-in library. Statuses shown on the dashboard include traffic throughput rates, CPU activity, and memory utilization. By setting alert levels on these attributes, you can get sufficient warning to prevent overloading of each of your network devices.
A Configuration Management module checks the setup of each device on the network and logs it. The log records changes made to those configurations. If a new setting impacts performance, such as increased packet loss, you can use the Configuration Manager to instantly roll back settings on a device to an earlier configuration.
The dashboard of Nagios XI includes some very attractive visualizations with color-coded graphs, charts, and dials. You can customize the dashboard and create versions for different team members as well as non-technical managers who need to stay informed.
The Nagios XI package includes all the widgets needed to assemble a custom dashboard through a drag-and-drop interface. The system comes with standard reports and you can even build your own custom output.
Nagios records and stores performance data, so you can operate the interface’s analysis tools to replay traffic events under different scenarios. The capacity planning features of this system will help spot potential overloading that would cause packet loss.
Nagios XI will cover virtual systems, cloud services, remote sites, and wireless systems as well as traditional wired LANs. You can only install this monitor on CentOS and RHEL Linux. If you don’t have those but do have VMware or Hyper-V machines, you can install it there. Nagios XI is available for a 60-day free trial.
Packet loss considerations
You will never reach a point where your company’s network infrastructure achieves zero packet loss. You should expect this performance drag when making connections over the internet, in particular.
Once you understand the reasons for packet loss, keeping the network healthy becomes an easier task. Install a network monitor to prevent equipment failure and system overloading that escalates packet loss to critical conditions.
Packet loss costs your business money because it causes extra traffic. If you don’t deal with packet loss, you’ll have to compensate by purchasing extra infrastructure and higher levels of internet bandwidth than you would need with a well-tuned system.
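A rough calculation shows how loss turns into extra traffic. The sketch below assumes that every lost packet is resent until it gets through and that losses are independent of each other, neither of which holds exactly on a real network. Under those assumptions, the average number of sends per packet is 1 / (1 - loss rate).

```python
def expected_transmissions(loss_rate: float) -> float:
    """Average sends needed per delivered packet.

    Assumption: independent losses and unlimited resends.
    """
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be at least 0 and below 1")
    return 1.0 / (1.0 - loss_rate)


def extra_traffic_percent(loss_rate: float) -> float:
    """Additional traffic caused by resends, as a percentage."""
    return (expected_transmissions(loss_rate) - 1.0) * 100.0
```

On these assumptions, a loss rate of 20 percent means about 25 percent more traffic for the same amount of delivered data, and that is bandwidth you pay for without getting anything in return.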
Being able to easily remedy unforeseen buildup in packet loss will greatly assist you in performing your job well. Although the tools on this list are a little pricey, they pay for themselves in the long run through productivity increases and lower bandwidth requirements.
Fortunately, all of those expensive tools we outlined above are available for free trials. Check out a few to see which gives you the best opportunity to prevent packet loss.
Have you experienced overwhelming levels of packet loss that impacted your network performance? Do you find that overloading occurs frequently on your network? Which tools do you use to monitor your network and prevent packet loss? Leave a message about your experience in the comments section below, and help others in the community learn from your experience.