Experienced systems administrators will tell you that server performance issues are closely related to applications administration. While that’s true, some fundamental server performance issues apply across the board, no matter which applications the server runs.
In this report, you will learn how to simplify your server monitoring duties and how to break down the important tasks to make them manageable. We’re going to look at the following monitoring categories:
- Server availability
- Server utilization
- Physical properties
Once you establish a routine, monitoring tasks become very straightforward. You just need to know where to start.
Server availability
No matter whether your server has lots of spare capacity or you are pushing its limits, the only thing the user community cares about is that it is available. The worst performance issue the server can experience is being offline, and that includes the times when you take it offline yourself.
To keep the server running at top performance, you will need to take it offline from time to time. You will need to perform system cleaning tasks, such as defragmenting the disk and removing temporary files, and to reallocate resources such as storage space or VM configurations.
Set up your monitoring strategy so that you get advance warning of capacity limits being reached. This gives you the option of taking remedial action ahead of time, such as creating more disk space, rather than at the last minute when users need access to the server.
The operating system will need to be patched from time to time, and many software packages will need to be updated, so plan all maintenance tasks for out-of-hours periods. That doesn’t mean you need to sit up all night, because most standard maintenance tasks can be scheduled to run in the small hours of the morning. Be careful to check that no essential business batch jobs are scheduled for the times when you plan to bounce (restart) the server.
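Checking for clashes comes down to an interval-overlap test. The Python sketch below assumes you can list each batch job’s start time and duration; the job names, dates, and the `BATCH_JOBS` table are hypothetical stand-ins for whatever your job scheduler actually reports.

```python
from datetime import datetime, timedelta

# Hypothetical batch-job schedule: job name -> (start time, duration).
# In practice you would export this from your job scheduler.
BATCH_JOBS = {
    "nightly-payroll-export": (datetime(2024, 5, 18, 1, 0), timedelta(hours=1)),
    "warehouse-sync": (datetime(2024, 5, 18, 4, 30), timedelta(minutes=45)),
}


def overlaps(start_a, end_a, start_b, end_b):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return start_a < end_b and start_b < end_a


def conflicting_jobs(window_start, window_end, jobs):
    """Return the sorted names of batch jobs that would run during the window."""
    return sorted(
        name
        for name, (job_start, duration) in jobs.items()
        if overlaps(window_start, window_end, job_start, job_start + duration)
    )


if __name__ == "__main__":
    # Proposed reboot window: 01:30 to 03:30.
    start = datetime(2024, 5, 18, 1, 30)
    clashes = conflicting_jobs(start, start + timedelta(hours=2), BATCH_JOBS)
    if clashes:
        print("Move the maintenance window; it clashes with:", ", ".join(clashes))
    else:
        print("No batch jobs are scheduled during the maintenance window.")
```

Treating windows as half-open intervals means that a job finishing at exactly 02:00 does not clash with a reboot that starts at 02:00.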
One metric that a monitoring utility can give you about server availability is called Uptime. It shows how long the server has been running since it last booted, and it should tally with your own calculation of the time elapsed since you last rebooted it. If it doesn’t, the server went down at some point. The problem with this metric is that it only tells you the server went down after the fact. In truth, you would probably already know about an unexpected outage during business hours, because your phone would have started ringing off the hook. However, investigating why the server went down unexpectedly will enable you to take preventative measures to stop it from happening again.
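If you want to automate that tally, here is a minimal sketch assuming a Linux server, where the first field of `/proc/uptime` is the number of seconds since boot. The `last_planned_reboot` value is a hypothetical entry you would take from your own maintenance log, and the ten-minute tolerance is an arbitrary allowance for boot time.

```python
import os
from datetime import datetime, timedelta


def read_uptime_seconds(path="/proc/uptime"):
    """Linux only: the first field of /proc/uptime is the number of seconds since boot."""
    with open(path) as f:
        return float(f.read().split()[0])


def unexpected_restart(uptime_seconds, last_planned_reboot, now=None,
                       tolerance=timedelta(minutes=10)):
    """Return True if the server has been up for less time than your records imply.

    The expected uptime is the time since your last planned reboot. A reported
    uptime shorter than that (minus a tolerance for boot time) means the server
    restarted without you.
    """
    now = now or datetime.now()
    expected = now - last_planned_reboot
    return timedelta(seconds=uptime_seconds) < expected - tolerance


if __name__ == "__main__":
    last_planned = datetime(2024, 5, 18, 2, 0)  # hypothetical maintenance-log entry
    if os.path.exists("/proc/uptime"):
        if unexpected_restart(read_uptime_seconds(), last_planned):
            print("Uptime is shorter than expected: investigate the unplanned restart.")
        else:
            print("Uptime matches the maintenance log.")
```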
Server utilization
Your main daytime task for server performance monitoring is watching a shortlist of performance indicators. These are:
- Processing capacity and utilization
- Memory capacity and utilization
- Disk capacity and occupied space
- Page faults
- Page swapping
- Network interface (I/O) activity
Initially, there is little you can do about these issues other than sit and watch. If you weren’t responsible for buying the server and weren’t involved in defining its requirements, the best way to work out whether the equipment is fit for purpose is to record its activity and note whether it actually reaches its limits.
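A simple way to build that record is to take periodic snapshots and append them to a log file. This is a minimal Python sketch assuming a Linux host (it reads `/proc/meminfo`, and `MemAvailable` requires kernel 3.14 or later); the CSV layout and field names are illustrative choices, not a standard format.

```python
import csv
import os
import shutil
import time


def parse_meminfo(text):
    """Parse /proc/meminfo text ("Name:   value kB") into {name: value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def memory_used_percent(meminfo):
    """Percentage of RAM not available to new work (MemAvailable: Linux 3.14+)."""
    return 100.0 * (1 - meminfo["MemAvailable"] / meminfo["MemTotal"])


def take_sample(disk_path="/"):
    """One snapshot of CPU load average, memory use and disk use on a Linux host."""
    load_1min, _, _ = os.getloadavg()
    with open("/proc/meminfo") as f:
        mem_pct = memory_used_percent(parse_meminfo(f.read()))
    disk = shutil.disk_usage(disk_path)
    return {
        "timestamp": int(time.time()),
        "load_1min": round(load_1min, 2),
        "mem_used_pct": round(mem_pct, 1),
        "disk_used_pct": round(100.0 * disk.used / disk.total, 1),
    }


def append_sample(path, sample):
    """Append one sample to a CSV log, writing the header row on first use."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(sample))
        if new_file:
            writer.writeheader()
        writer.writerow(sample)
```

Calling `append_sample("server-baseline.csv", take_sample())` every few minutes (for example from cron) builds a history you can compare against the server’s limits; the file name is hypothetical.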
In this respect, your monitoring activities will always feed into systems management issues. If you spot performance problems, you will be expected to do something about them.
CPU, memory, and disk performance
The processor has a finite capacity, and if that isn’t enough for all of the services and software it needs to run simultaneously, performance will take a hit. The same is true for RAM and disk space.
Rather than waiting for a resource to run out, set warning thresholds that trigger when CPU, memory, or disk space usage comes within reach of exhaustion. That buys you time to act before performance suffers. A full discussion of those actions is out of scope for this guide, but briefly, you may need to kill a process that seems to be hanging (waiting for resources or blocking other processes from running). You can also consider moving some services to other servers if you have them.
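The threshold logic itself is simple. This sketch assumes you already collect utilization as percentages; the metric names and the warning and critical levels in `THRESHOLDS` are hypothetical values that you would tune to your own hardware and response times.

```python
# Hypothetical (warning, critical) levels as percentages of capacity. Set the
# warning level low enough to leave time to act before the resource runs out.
THRESHOLDS = {
    "cpu_pct": (80.0, 95.0),
    "mem_pct": (85.0, 95.0),
    "disk_pct": (80.0, 90.0),
}


def classify(value, warning, critical):
    """Map one reading to OK, WARNING or CRITICAL."""
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


def check_thresholds(metrics, thresholds=THRESHOLDS):
    """Return {metric: status} for every metric that has thresholds defined."""
    return {
        name: classify(metrics[name], warning, critical)
        for name, (warning, critical) in thresholds.items()
        if name in metrics
    }


if __name__ == "__main__":
    snapshot = {"cpu_pct": 62.0, "mem_pct": 88.5, "disk_pct": 91.2}  # example readings
    for name, status in check_thresholds(snapshot).items():
        if status != "OK":
            print(f"{status}: {name} is at {snapshot[name]}%")
```

Having two levels lets the warning buy you time while the critical level marks the point where users are about to notice.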
Page faults matter on any server, including cloud-based virtual servers on AWS, Google Cloud Platform, or Azure. The operating system manages memory in fixed-size blocks called “pages” (typically 4 KB on x86 systems) and maps the pages that each process uses onto physical RAM.
A page fault occurs when a process tries to use a page that is not currently mapped into physical memory, so the operating system has to step in and supply it. A minor fault can be resolved from data already in RAM, but a major fault means reading the page back from disk, which is far slower. Some page faults are normal, but you need to know about them because a high rate of major faults slows down response times.
Memory paging is automatic, so you don’t manage it directly, but you should track the page fault rate. If the number of major page faults starts to rise, the server is short of memory and spends more and more of its time fetching pages from disk. If this happens, the performance of your server will be noticeably impaired and users will start to complain.
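On Linux you can measure the fault rate yourself from the cumulative counters in `/proc/vmstat`, where `pgfault` counts all page faults and `pgmajfault` counts the major ones. This sketch assumes Linux and compares two snapshots taken a known number of seconds apart; what counts as an alarming rate depends on your workload, so no threshold is hard-coded.

```python
import time


def parse_vmstat(text):
    """Parse /proc/vmstat text ("name value" per line) into {name: int}."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def read_vmstat(path="/proc/vmstat"):
    with open(path) as f:
        return parse_vmstat(f.read())


def fault_rates(before, after, seconds):
    """Page faults per second between two snapshots of the cumulative counters."""
    return {
        # pgfault counts every page fault, minor and major.
        "faults_per_sec": (after["pgfault"] - before["pgfault"]) / seconds,
        # pgmajfault counts only faults that had to be satisfied from disk.
        "major_faults_per_sec": (after["pgmajfault"] - before["pgmajfault"]) / seconds,
    }


def sample_fault_rates(interval_seconds=10):
    """Take two snapshots interval_seconds apart and return the fault rates."""
    first = read_vmstat()
    time.sleep(interval_seconds)
    return fault_rates(first, read_vmstat(), interval_seconds)
```

Logging the output of `sample_fault_rates()` at regular intervals shows whether the major-fault rate is trending upward.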
When memory is running out, the server borrows disk space as an overflow area. This process is called “page swapping”: the memory manager stores some data temporarily on disk and reads it back in when it is needed.
If you notice that free disk space has shrunk while memory is fully occupied, page swapping might be the cause, particularly if the server uses a page file that grows on demand. Check the swap usage metric to see if that’s what is going on.
Page swapping in itself is not bad. However, reading data back from swap space on the disk takes far longer than reading it directly from memory, so page swapping slows down server response times.
It is a good idea to keep page swapping enabled as an emergency measure. However, if swapping becomes a frequent event, you should add more RAM to the server.
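Both the swap usage check and the swapping frequency can be sketched in Python, assuming a Linux server: swap occupancy comes from `SwapTotal` and `SwapFree` in `/proc/meminfo`, and swapping activity from the cumulative `pswpin` and `pswpout` counters in `/proc/vmstat`. The 25% cut-off in `ADD_RAM_THRESHOLD` is a hypothetical definition of “frequent”, not an established rule.

```python
# Hypothetical cut-off: treat swapping as "frequent" if pages move in or out
# during more than a quarter of the sampling intervals.
ADD_RAM_THRESHOLD = 0.25


def meminfo_kb(text, key):
    """Return the kB value for one key in /proc/meminfo text, or None if absent."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == key and rest.split():
            return int(rest.split()[0])
    return None


def swap_used_percent(meminfo_text):
    """Percentage of swap space in use; 0.0 when no swap is configured."""
    total = meminfo_kb(meminfo_text, "SwapTotal") or 0
    if total == 0:
        return 0.0
    free = meminfo_kb(meminfo_text, "SwapFree") or 0
    return 100.0 * (total - free) / total


def swapping_frequency(samples):
    """Fraction of sampling intervals in which any pages were swapped in or out.

    samples is a chronological list of (pswpin, pswpout) values read from
    /proc/vmstat; both counters are cumulative page counts since boot.
    """
    intervals = list(zip(samples, samples[1:]))
    if not intervals:
        return 0.0
    busy = sum(1 for (in0, out0), (in1, out1) in intervals if in1 > in0 or out1 > out0)
    return busy / len(intervals)


def needs_more_ram(samples):
    """Suggest adding RAM when swapping happens in too many intervals."""
    return swapping_frequency(samples) > ADD_RAM_THRESHOLD
```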
Many of the issues related to I/O monitoring overlap with network monitoring, which is a separate topic. However, activity on your network card becomes a server problem if the network interface gets overloaded, which means that not all requests are getting through. Overloading could also indicate a malicious attack, or it could mean that the card is faulty or not fit for purpose and needs replacing. If network interface activity drops to zero, your card is probably broken.
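To tell an overloaded card from an idle or dead one, compare byte and drop counters over an interval. This sketch assumes Linux, where `/proc/net/dev` lists per-interface counters after two header lines, with receive bytes in the first column and transmit bytes in the ninth. The link speed is passed in by hand here, though Linux also reports it in Mb/s in `/sys/class/net/<interface>/speed` for most physical NICs.

```python
def parse_net_dev(text):
    """Return {interface: counters} parsed from Linux /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, sep, data = line.partition(":")
        if not sep:
            continue
        fields = [int(x) for x in data.split()]
        stats[name.strip()] = {
            "rx_bytes": fields[0], "rx_drop": fields[3],   # receive columns 1-8
            "tx_bytes": fields[8], "tx_drop": fields[11],  # transmit columns 9-16
        }
    return stats


def interface_report(before, after, seconds, link_bits_per_sec):
    """Summarize one interface between two snapshots taken `seconds` apart."""
    rx_bps = (after["rx_bytes"] - before["rx_bytes"]) * 8 / seconds
    tx_bps = (after["tx_bytes"] - before["tx_bytes"]) * 8 / seconds
    dropped = (after["rx_drop"] - before["rx_drop"]) + (after["tx_drop"] - before["tx_drop"])
    return {
        "rx_util_pct": 100.0 * rx_bps / link_bits_per_sec,
        "tx_util_pct": 100.0 * tx_bps / link_bits_per_sec,
        # Dropped packets suggest that not all requests are getting through.
        "dropped": dropped,
        # No traffic at all on a busy server may mean a dead card or link.
        "idle": rx_bps == 0 and tx_bps == 0,
    }
```

Utilization close to 100% points to an overloaded interface, while `idle` staying true during business hours is a sign to check the card and its cabling.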
Physical properties
Other factors that you need to watch on your server are physical attributes:
- Temperature and fan speed
- Power supply
- Physical access
As the systems administrator, you are responsible for the server and that includes monitoring its physical health.
The issue of physical access might not seem to be a performance monitoring issue. However, if a malicious intruder gets into your server room, then the availability of the server could be threatened. As explained above, a server going offline is the biggest performance issue that you need to prevent. So, monitor and control access to the server room.
Temperature and fan speed
Temperature and fan speed are interrelated. You probably won’t be able to turn a dial and speed up the fan when you see the server’s temperature rising. However, watching the server’s temperature gives you time to check for physical problems with the fan. You may also need to check the server room temperature: if the fan is drawing in warm air, it won’t cool the server down.
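On Linux, the kernel often exposes temperatures in millidegrees Celsius under `/sys/class/thermal/thermal_zone*/temp` and fan speeds in RPM under `/sys/class/hwmon/hwmon*/fan*_input`. Which sensors appear depends on your hardware and drivers, and many rack servers report them only through their management controller (for example via IPMI). This sketch assumes the sysfs files are present; the 80 °C and 500 RPM limits are hypothetical.

```python
import glob
import os


def read_int(path):
    with open(path) as f:
        return int(f.read().strip())


def thermal_zones(base="/sys/class/thermal"):
    """Return {"type (zone)": degrees C} for each readable thermal zone."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                label = f"{f.read().strip()} ({os.path.basename(zone)})"
            temps[label] = read_int(os.path.join(zone, "temp")) / 1000.0  # millidegrees
        except (OSError, ValueError):
            continue  # some zones are unreadable on some hardware
    return temps


def fan_speeds(base="/sys/class/hwmon"):
    """Return {sensor file: RPM} for each fan*_input file exposed through hwmon."""
    fans = {}
    for path in sorted(glob.glob(os.path.join(base, "hwmon*", "fan*_input"))):
        try:
            fans[path] = read_int(path)
        except (OSError, ValueError):
            continue
    return fans


def physical_warnings(temps, fans, max_temp_c=80.0, min_fan_rpm=500):
    """List warnings for hot zones and slow fans; both limits are hypothetical."""
    warnings = [f"{zone} at {t:.1f} °C" for zone, t in temps.items() if t >= max_temp_c]
    warnings += [f"{fan} at only {rpm} RPM" for fan, rpm in fans.items() if rpm < min_fan_rpm]
    return warnings


if __name__ == "__main__":
    for warning in physical_warnings(thermal_zones(), fan_speeds()):
        print("WARNING:", warning)
```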
Certain applications, such as databases and web servers, put a lot of load on the processor and so generate more heat. Consider distributing these applications across different servers to lower the load and the temperature. Also, look at how your racks are filled: using every slot can block the circulation of cool air.
Power supply monitoring is a given: you don’t want the voltage to surge or drop. Your UPS should take care of that problem, but you need to monitor the current and voltage coming out of it and into your server to be sure that the UPS is working properly.
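If your UPS is managed by Network UPS Tools (NUT), its `upsc` client prints the UPS variables as `name: value` lines, including `output.voltage` and `ups.status` (where `OB` means on battery and `LB` means low battery). The sketch below assumes NUT is installed; the UPS name `myups@localhost`, the 230 V nominal voltage, and the ±10% tolerance are hypothetical, so substitute the figures from your own UPS and power-supply specifications.

```python
import subprocess


def parse_upsc(text):
    """Parse `upsc` output ("variable: value" per line) into {variable: value}."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values


def read_ups(name="myups@localhost"):  # hypothetical UPS name
    """Query a NUT-managed UPS with the upsc client."""
    result = subprocess.run(["upsc", name], capture_output=True, text=True, check=True)
    return parse_upsc(result.stdout)


def power_problems(ups, nominal_volts=230.0, tolerance_pct=10.0):
    """List power problems; nominal_volts and tolerance_pct are hypothetical defaults."""
    problems = []
    raw = ups.get("output.voltage")
    if raw is None:
        problems.append("UPS does not report output.voltage")
    else:
        volts = float(raw)
        low = nominal_volts * (1 - tolerance_pct / 100)
        high = nominal_volts * (1 + tolerance_pct / 100)
        if not low <= volts <= high:
            problems.append(f"output voltage {volts} V outside {low:.0f}-{high:.0f} V")
    status = ups.get("ups.status", "").split()
    if "OB" in status:
        problems.append("UPS is running on battery")
    if "LB" in status:
        problems.append("UPS battery is low")
    return problems
```

Not every UPS model reports `output.voltage`, which is why a missing reading is flagged rather than silently ignored.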
Monitoring tools
You can check all of the important metrics outlined above with command-line utilities and the operating system’s graphical tools. However, repeatedly running commands and checking process monitoring utilities is very time-consuming.
It is better to buy software that will monitor the server for you. Typically, server monitoring software keeps a constant check on those vital indicators and alerts the system administrator if one of the pre-set thresholds gets breached. This enables you to get on with other tasks. You can assume that everything is okay unless you are notified otherwise.
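At its core, that kind of software is a polling loop that compares readings with thresholds and raises an alert. This Python sketch assumes you supply two functions of your own: `read_metrics`, which returns current readings as a dict, and `alert`, which might send an email or SMS. It alerts only when a metric crosses its threshold or recovers, rather than on every poll, which is one common way to avoid alert floods; it is not a description of how any particular product works.

```python
import time
from typing import Callable, Dict


def watch(read_metrics: Callable[[], Dict[str, float]],
          thresholds: Dict[str, float],
          alert: Callable[[str], None],
          interval_seconds: float = 60.0,
          cycles: int = 0) -> None:
    """Poll metrics and call alert() when one crosses its threshold or recovers.

    Alerting on changes of state, rather than on every poll, stops a metric
    that stays high from flooding your inbox. cycles=0 means run forever.
    """
    breached = set()
    completed = 0
    while cycles == 0 or completed < cycles:
        metrics = read_metrics()
        for name, limit in thresholds.items():
            value = metrics.get(name)
            if value is None:
                continue  # metric not reported this time
            if value >= limit and name not in breached:
                breached.add(name)
                alert(f"{name} = {value} has reached threshold {limit}")
            elif value < limit and name in breached:
                breached.discard(name)
                alert(f"{name} = {value} is back below {limit}")
        completed += 1
        if cycles == 0 or completed < cycles:
            time.sleep(interval_seconds)
```

In practice, `alert` could send email through Python’s standard `smtplib` module, and `read_metrics` could wrap whatever collection code you already use.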
There are many good monitoring tools on the market today, and server monitoring is very commonly combined with other functions. The most common combination you will encounter is a server and application monitor, because server performance is closely tied to the performance and requirements of applications.
Probably the best server monitor to look at is the SolarWinds Server & Application Monitor. The tool itself runs only on Windows Server, but it can also monitor Linux servers and cloud-based AWS and Azure servers.
If you have several servers on your site, this monitor will discover them all over the network and enroll them in its monitoring program. All of those servers can be checked through a single dashboard. The monitor tracks activity on the processor, the disk, the memory, and the network interfaces, and it watches page swapping and page faults as well.
The monitor has an uptime recorder and a constant live graph of server load. It measures response times and it also forecasts where utilization levels will go. The server monitor includes alert thresholds. Those alerts appear in the dashboard, but you can also have them sent to you by email or SMS, so you don’t have to sit and watch the dashboard constantly.
The Server & Application Monitor oversees racks and UPS systems as well as the server itself. It will monitor the server temperature and fan performance.
As its name suggests, the SolarWinds Server & Application Monitor also keeps track of application performance. The tool includes a utility called PerfStack, which displays every layer of the stack supporting each application so that you can see where performance impairment is coming from.
The Server & Application Monitor is very comprehensive, and you can see what a server monitor can do for you with a 30-day free trial of the tool.
Implementing server monitoring
The easiest way to monitor your servers successfully is to get an automated tool to do the job for you. This strategy works out cheaper than hiring extra staff to perform the task manually.
Automated server monitoring is based on warning thresholds, which you can adjust to suit your own working practices and solution lead times. These tools can also be used to predict future requirements, which enables you to buy expansion hardware in time and keep server performance high enough to keep the user community happy.
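The forecasting side can be illustrated with a straight-line trend. This sketch fits a least-squares line to hypothetical weekly disk-usage samples and estimates how many days remain before the volume fills; comparing that figure with your hardware lead time tells you when to order. Real usage is rarely perfectly linear, so treat the result as a rough guide.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need samples from at least two different days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days, used_gb, capacity_gb):
    """Days from the last sample until the fitted trend reaches capacity.

    Returns None when usage is flat or shrinking.
    """
    slope, intercept = linear_fit(days, used_gb)
    if slope <= 0:
        return None
    return (capacity_gb - intercept) / slope - days[-1]


if __name__ == "__main__":
    # Hypothetical weekly samples for a 500 GB volume: (day number, GB used).
    days = [0, 7, 14, 21, 28]
    used = [300, 310, 320, 330, 340]
    lead_time_days = 45  # hypothetical time to buy and fit more storage
    remaining = days_until_full(days, used, capacity_gb=500)
    if remaining is None:
        print("Disk usage is not growing.")
    else:
        print(f"Disk full in roughly {remaining:.0f} days")
        if remaining <= lead_time_days:
            print("Order expansion hardware now.")
```

With these sample figures, usage grows by 10 GB a week, so the volume fills about 112 days after the last sample, comfortably beyond the 45-day lead time.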