As your organization grows, so does the number of servers, devices, and services that you depend on. Monitoring the activity, capacity, and health of hosts and applications, both on-premises and in the cloud, is the focus of system monitoring. Here we’ll look at six sophisticated system monitoring packages for Windows and Linux.
The term “system” covers all of the computing resources of your organization. Each element in the system relies on underlying services, or provides services to components that are closer to the user. In networking, it is typical to think of a system as a layered stack. User software sits at the top of the stack and system applications and services on the next layer down. Beneath the services and applications, you will encounter operating systems and firmware. The performance of software elements needs to be monitored as an application stack.
Users will notice performance problems with the software that they use, but those problems rarely arise within that software. All layers of the application stack need to be examined to find the root cause of performance issues. You need to head off problems before they occur. Monitoring tools help you spot errors and service failures before they start to impact users.
The system stack continues on below the software. Hardware issues can be prevented through monitoring. You will need to monitor servers, network devices, interface performance and network link capacity. You need to monitor many different types of interacting system elements in order to keep your IT services running smoothly.
Why system monitoring
Knowing whether a computer has issues is fairly straightforward when the computer is right in front of you. (Knowing what’s causing the problem? That’s harder.)
But a computer sitting by itself is not as useful as it could be. Even the smallest small-office/home-office network has multiple nodes: laptops, desktops, tablets, WiFi access points, internet gateway, smartphones, file servers and/or media servers, printers, and so on. Any one of those might start behaving badly and could cause issues for the others.
You most likely rely on off-premises servers and services, too. Even a personal website raises the nagging question, “Is my site still up?” And when your ISP has problems, your local network’s usefulness suffers. Organizations rely more and more on servers and services hosted in the cloud: SaaS applications (email, office apps, business packages, etc.), file storage, cloud hosting for your own databases and apps, and so on.
Bandwidth monitoring tools and NetFlow- and sFlow-based traffic analyzers help you stay aware of the activity, capacity, and health of your network. They allow you to watch traffic as it flows through routers and switches, or arrives at and leaves hosts.
But what of the hosts on your network, their hardware, and the services and applications running there? Monitoring the activity, capacity, and health of hosts and applications is the focus of system monitoring.
System monitoring aims
In order to keep your system fit for purpose, your monitoring activities need to cover the following priorities:
- Acceptable delivery speeds
- Constant availability
- Preventative maintenance
- Software version monitoring and patching
- Intrusion detection
- Data integrity
- Security monitoring
- Attack mitigation
- Virus prevention and detection
Lack of funding may cause you to compromise on monitoring completeness. The expense of monitoring can be justified because it:
- reduces user/customer support costs
- prevents loss of income caused by system outages or attack vulnerability
- prevents data leakage leading to litigation
- prevents hardware damage and loss of business-critical data
Expense on system monitoring reduces costs in other areas of the IT budget.
Basic system monitoring tools
Anyone who’s curious about their workstation or laptop’s performance has likely encountered Windows Task Manager or Linux’s ps and top. (The more experienced know of Sysinternals on Windows and htop, atop, pgrep, and pstree on Linux.)
Task Manager is a good example of the basic information you can learn about a host, starting with what processes are running and which currently consume the most resources.
Climb up a level and it will show you current and recent utilization for key resources like CPU, memory, disk, and network connections. Other tabs will show you more details on running processes, operating system services, and other key data.
Unix and Linux have analogous tools, like top.
Task Manager and top provide a continuously updating display of utilization. These are good for basic ad hoc monitoring of a single machine, to see what’s running and what’s consuming the system’s resources.
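To get a feel for the kind of raw data behind such a display on Linux, here is a minimal Python sketch that parses text in the format of /proc/meminfo and works out memory utilization. The function names are made up for illustration, and treating MemAvailable as the free portion is a simplifying assumption; this is not a description of how top or Task Manager is implemented.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_used_percent(meminfo):
    """Return used memory as a percentage, treating MemAvailable as free."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def read_local_memory_percent(path="/proc/meminfo"):
    """Linux only: read the live file and return the used percentage."""
    with open(path) as f:
        return memory_used_percent(parse_meminfo(f.read()))
```

On a Linux host, calling read_local_memory_percent() once gives a single snapshot. A tool like top effectively repeats this kind of read on a timer.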
System monitoring tool requirements
A more sophisticated system monitoring package provides a much broader range of capabilities, such as:
- Monitoring multiple servers. Handling servers from various vendors running various operating systems. Monitoring servers at multiple sites and in cloud environments.
- Monitoring a range of server metrics: availability, CPU usage, memory usage, disk space, response time, and upload/download rates. Monitoring CPU temperature and power supply voltages.
- Monitoring applications. Using deep knowledge of common applications and services to monitor key server processes, including web servers, database servers, and application stacks.
- Automatically alerting you to problems, such as servers or network devices that are overloaded or down, or worrisome trends. Customized alerts that can use multiple methods to contact you – email, SMS text messages, pager, etc.
- Triggering actions in response to alerts, to handle certain classes of problems automatically.
- Collecting historical data about server and device health and behavior.
- Displaying data. Crunching the data and analyzing trends to display illuminating visualizations of the data.
- Reports. Besides displays, generating useful predefined reports that help with tasks like forecasting capacity, optimizing resource usage, and predicting needs for maintenance and upgrades.
- Customizable reporting. A facility to help you create custom reports.
- Easy configurability, using methods like auto-discovery and knowledge of server and application types.
- Unintrusiveness: imposing a low overhead on your production machines and services. Making smart use of agents to offload monitoring where appropriate.
- Scalability: Able to grow with your business, from a small or medium business (SMB) to a large enterprise.
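Two of the capabilities above, threshold-based alerts and actions triggered by those alerts, can be sketched in a few lines of Python. This is only an illustration of the idea: the Rule class, the metric names, and the evaluate function are hypothetical and are not taken from any of the products reviewed below.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    metric: str        # name of the metric to watch, e.g. "cpu_percent"
    threshold: float   # raise an alert when the value exceeds this
    action: Callable[[str, str, float], None]  # called as action(host, metric, value)


def evaluate(host: str, metrics: Dict[str, float], rules: List[Rule]) -> List[str]:
    """Check one host's metrics against every rule; return the alert messages."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and value > rule.threshold:
            alerts.append(f"{host}: {rule.metric}={value} exceeds {rule.threshold}")
            rule.action(host, rule.metric, value)  # the triggered action
    return alerts
```

In a real package the action would send an email or SMS, or restart a service. Here it can be any callable, which keeps the rule logic separate from the delivery method.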
Here's a List of 6 Best System Monitoring Tools
1. SolarWinds Server Health Monitor
SolarWinds produces a suite of products for comprehensive network monitoring and management. For system monitoring, two are most relevant: a free tool, the Server Health Monitor, and a for-cost tool, the Server and Application Monitor.
The free Server Health Monitor (SHM) will monitor the availability, health, and performance of up to 5 servers – if you have the right type of servers.
- Supported servers are: Dell PowerEdge™, HP ProLiant™, and IBM eServer™ xSeries.
- Supported blade enclosures are: Dell PowerEdge M1000e, and HP BladeSystem c3000 and c7000.
- And supported hypervisors are: VMware vSphere® ESX Hypervisor and ESXi™ Hypervisor.
SHM uses SNMP, WMI, and CIM to poll the standard components in each server, including power supply, fan speed, temperature, CPU, and battery.
Once you’ve installed SHM, configuration is straightforward. For each server you specify the hostname or IP address, and provide credentials for SNMP, WMI, and/or VMware. You can also adjust the polling interval.
The dashboard tab displays the overall health of the monitored servers. You can click on a server to get its particulars. Each sensor on that server is listed, and you can click a sensor to get greater detail.
SHM gives you near-realtime visibility into the health of a small collection of servers. As an entry-level tool, it doesn’t include a mechanism to send you alerts when you aren’t in front of the screen, or to generate reports on things like historical trends.
2. WhatsUp Gold
WhatsUp Gold is a long-established network monitoring tool from Ipswitch. It’s a feature-rich yet straightforward server and application monitoring tool, and the paid version is available to evaluate on a 30-day trial.
WhatsUp monitors servers, virtual servers, cloud services, and applications. It also monitors network traffic. Cloud monitoring includes hybrid cloud environments for Azure and AWS.
The free version comes with a five-point license for monitoring up to five resources (e.g., five servers).
WhatsUp must be installed on Windows. Setup is simple and uses auto-discovery. The user interface provides multiple views including an interactive network map and the ability to drill down to investigate issues. Dashboards are customizable.
It provides many canned reports and report customization too. There are multiple options for notification, including email and SMS. Triggered actions can also be specified for responding automatically to alerts.
WhatsUp’s list view shows the discovered hosts and devices, summarizing their characteristics and status.
The map view is an interactive map for visualizing your network’s components and their statuses. You can drill down to inspect the availability and performance of individual nodes.
The top 10 view shows critical statuses in your network.
The for-cost Application Performance Monitoring add-on adds the ability to monitor common applications and services.
The free edition of WhatsUp Gold is a straightforward and fully featured tool for monitoring and managing a small shop. Graduating to the for-cost version lets you move up to covering large networks.
3. SolarWinds Server and Application Monitor
The SolarWinds Server and Application Monitor (SAM) is part of the for-cost Orion suite of network monitoring and management tools; we looked at components of the Orion suite in our article on the best sFlow traffic analyzers. Where the Server Health Monitor can meet the needs of a small shop, SAM can cover small businesses to large enterprises. SolarWinds offers a 30-day free trial of SAM.
As the name suggests, SAM monitors the health and performance of server hardware and virtual servers from multiple vendors, as well as doing deep monitoring of many hundreds of applications. It can monitor multiple sites and cloud environments like Azure and AWS.
The SolarWinds Orion suite will auto-discover hosts and devices on your network. Then you can start to monitor them.
Once a server is identified and monitoring has been running, look under Node Details to see SAM’s display of the node’s performance and health data.
The server status data is displayed both graphically and in tables.
A second discovery scan is required so SAM can detect the applications running on the nodes previously discovered.
You can configure the application discovery scan to specify which applications SAM should look for. Then you provide the credentials SAM needs to access the information on the various nodes.
Once SAM has detected applications and begun its regular scan, the Application Summary will show top-level status for applications running on your servers.
The summary includes application alerts and events, top 10 nodes by CPU load, by physical memory, virtual memory, I/O operations, etc.
SAM, working with the SolarWinds suite of network monitoring and management tools, provides a full range of features for customizable dashboards, analysis, alerting, reporting, etc.
SAM and the SolarWinds suite are enterprise-grade packages, so they are not cheap and call for considerable resources on the server hosting them. Most components tack on an additional charge. But if your network is large or growing, the SolarWinds suite with SAM is worth exploring.
4. Paessler PRTG Network Monitor
The Paessler PRTG Network Monitor is a “batteries included” solution that monitors your servers and devices, network traffic, and more. PRTG can use NetFlow and sFlow, and we covered it in some detail in our exploration of free NetFlow traffic analyzers.
The PRTG Network Monitor runs on Windows. It monitors mail servers, web servers, database servers, file servers, and virtual servers. PRTG can monitor multiple sites and cloud services. It uses SNMP, WMI, NetFlow, sFlow, ping, ssh, REST APIs, and packet sniffing.
Setting up the tool is a bit complex but a setup wizard and how-to video lead you through the steps. The tool will find many devices and servers via auto-discovery.
In the user interface, a primary view is the device tree showing the devices (including servers) in your network, and the sensors monitoring each.
On the server hardware side, its sensors can monitor CPU load, memory, disk, server room environment, etc. On the applications side, it comes with more than 200 sensor types for common network services, including HTTP, SMTP/POP3 (email), FTP, etc.
You can specify thresholds for alerts, and PRTG can send notifications of detected issues via several methods, including email and SMS. It provides a range of predefined reports and facilities for designing custom reports. Reports can also be scheduled.
The free version is limited to 100 sensors after a 30-day trial. Because a sensor is an individual data stream, each server and device will typically require several sensors.
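To see how far a 100-sensor allowance stretches, a little arithmetic helps. The sketch below assumes every device needs the same number of sensors; the per-device counts in the usage note are illustrative guesses, not Paessler figures.

```python
def devices_covered(sensor_limit, sensors_per_device):
    """Number of devices that fit in a sensor budget at a fixed count per device."""
    if sensors_per_device <= 0:
        raise ValueError("sensors_per_device must be positive")
    return sensor_limit // sensors_per_device
```

With an assumed five sensors per device (say ping, CPU, memory, disk, and one service check), devices_covered(100, 5) gives 20 devices. At ten sensors per device the same budget covers 10.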
The free version of PRTG Network Monitor provides a well-stocked toolbox for monitoring a small network.
5. ManageEngine OpManager
ManageEngine produces a full network management suite and offers free versions of some of their tools. In our roundup of free bandwidth monitoring software, we previously looked at the free edition of ManageEngine OpManager.
Setting up OpManager is a multi-step process but not overly complex. Once you provide the subnet and SNMP parameters, OpManager will scan your subnets and discover your devices.
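The first step of any subnet scan is working out which addresses to probe. The Python sketch below shows just that step using the standard ipaddress module. It is an illustration of the general idea, not OpManager's discovery logic, and the function name is hypothetical. A real scan would go on to probe each address with ping or SNMP.

```python
import ipaddress


def candidate_hosts(cidr):
    """List the usable host addresses in a subnet given in CIDR notation."""
    # strict=False accepts input such as "192.0.2.5/29" with host bits set.
    network = ipaddress.ip_network(cidr, strict=False)
    return [str(addr) for addr in network.hosts()]
```

For the documentation-range subnet 192.0.2.0/29, this returns the six addresses from 192.0.2.1 to 192.0.2.6, leaving out the network and broadcast addresses.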
OpManager monitors availability and performance metrics of physical and virtual servers. The Application Performance Monitoring plug-in adds the ability to monitor applications. OpManager uses SNMP, WMI, and CLI via SSH or telnet.
The “Server Top 10” tab displays top utilization and availability for the discovered physical servers.
The “Virtualization Summary” tab displays metrics for your virtual servers.
OpManager contains sophisticated facilities for alerting and reports. You can set alerts based on thresholds, and it has a variety of useful canned reports, ranging from troubleshooting support to capacity planning and billing, as well as facilities for creating custom reports.
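Capacity planning reports generally rest on trend analysis of historical data. As a rough illustration of the idea, here is a Python sketch that fits a straight line to past disk usage and estimates how long until the disk fills. The straight-line model and the function names are assumptions made for this example; they do not describe how OpManager builds its reports.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) points of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days, used_gb, capacity_gb):
    """Days after the last sample until the fitted line reaches capacity.

    Returns None when usage is flat or shrinking.
    """
    slope, intercept = linear_fit(days, used_gb)
    if slope <= 0:
        return None
    full_day = (capacity_gb - intercept) / slope
    return max(0.0, full_day - days[-1])
```

For example, usage of 100, 110, 120, and 130 GB on days 0 to 3 against a 200 GB disk grows by 10 GB a day, so the estimate is 7 days after the last sample. Real usage is rarely this tidy, which is why commercial tools offer more than one forecasting model.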
The predefined reports include “Network Health Status,” which gives a rollup for all the detected hosts.
The Application Performance Monitoring plug-in is only available free for a 30-day trial. It adds monitoring for application stacks and servers, web servers and services, databases, containers, and public and hybrid cloud environments.
The free version of ManageEngine OpManager provides you with a well-rounded suite of capabilities for monitoring 10 or fewer devices on a small network.
6. Nagios XI and Core
Nagios is an enduring standard in network monitoring. Nagios Core is the open-source free version, and Nagios XI is the commercial for-cost variant with additional features and automated assistance for configuration. Nagios has a reputation for being powerful, reliable, scalable, and extremely customizable – and being complex to configure.
The free version has a learning curve but also an active community. It monitors servers, services and applications, just like the commercial version. It includes reporting by email and SMS, a basic user interface (including the network map), and basic reports.
Nagios Core lacks auto-discovery, and you must learn to set up and maintain complex configuration. On the plus side, it does give you a lot of flexibility to customize and extend the tool. Community-developed addons can perform discovery and help you get started with configuration.
You can use the free 60-day trial to evaluate the for-cost version and, if you elect to go with the free one, save the auto-generated config files from /usr/local/nagios/etc before uninstalling your eval copy. You can then use those files as your starting point for your new install’s configuration.
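Nagios configuration is made up of plain-text object definitions, so it is easy to generate from a script once you have an inventory of hosts. The Python sketch below renders one host definition. The render_object helper, the host name web01, and the address are hypothetical, and the linux-server template is assumed to exist in your configuration, as it does in the sample files that ship with Nagios Core.

```python
def render_object(kind, directives):
    """Render one Nagios object definition block from a dict of directives."""
    width = max(len(name) for name in directives)
    lines = [f"define {kind} {{"]
    for name, value in directives.items():
        # Pad directive names so the values line up in a column.
        lines.append(f"    {name.ljust(width)}  {value}")
    lines.append("}")
    return "\n".join(lines)


host_block = render_object("host", {
    "use": "linux-server",          # template assumed to be defined elsewhere
    "host_name": "web01",           # hypothetical host
    "alias": "Example web server",
    "address": "192.0.2.10",        # documentation-range address
})
```

Printing host_block gives a define host { ... } block that you could paste into an object configuration file and then adapt. Validate any generated file with Nagios's own configuration check before reloading.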
The commercial version Nagios XI has a richer range of features, including automated support for discovering your devices and hosts, automatically configuring the tool, and commercially-supported addons. It has a much more sophisticated user interface and more advanced reporting that covers trends, capacity planning assistance, etc.
Nagios XI is built to run on Red Hat Linux and CentOS. For Windows, use a VM appliance with Hyper-V or VMware. It includes an auto-discovery tool and a configuration wizard for adding a new device, host, or application.
Once Nagios XI is installed and monitoring, the Operations Screen gives you a high-level view of the current state of the network, and the Operations Center lets you drill down to the items mentioned.
The Host Status page shows a summary of metrics for the monitored hosts. You can drill down to an individual host to see details including performance graphs, capacity planning info, alarms, etc.
The Service Status page summarizes the state of the monitored services.
Nagios is a well-regarded solution for network monitoring. As with other tools that offer a fully-free vs commercial version tradeoff, you must decide whether you have (or will develop) the expertise and time to use the free tool, or whether it would be more cost effective to pay for the automation and support of the commercial version.
Besides the tooling to monitor your systems, you need a protocol in place for solving problems and responding to incidents. Best practices for system monitoring call for forethought and attention to design.
Free tools are tempting, particularly if you are on a tight budget. The free versions of paid software are usually limited in their capacity so that they can only support small networks. Some freeware has worked its way into the toolkits of seasoned network administrators mainly out of familiarity. However, these underfunded tools are usually under-supported and glitch-laden.
Planning is a key stage when buying new monitoring software. You need to look for suites of monitoring tools that cover the whole system stack. Remember that spending on monitoring saves you money in other areas of the IT department and prevents loss of income to the business due to system failure.