Getting Started with collectd

SIEMs, system performance monitors, and security tools all take log messages and activity statistics from the operating systems and software functioning in your company. However, these services all depend on external units to gather those messages. Some tools provide agent programs that you install on your endpoints, while others expect you to set all of that up yourself. The free collectd program can act as a data collector for any system.

The collectd service was written in C for *nix operating systems. That means Unix-like systems, such as BSD variants, macOS, and Linux. The collectd system is distributed as source code, so unless your operating system offers a ready-made package, you need to know how to compile a C program to use it. You also need to know how to launch the executable in the right way to get the data that you expect sent to the location where you need it.

The collectd system is a daemon, which means that it runs in the background constantly on a computer. It is not a complete monitoring system. It provides a stream of statistics to an interpretation package, and that third-party tool needs to be set up to generate graphs and analysis from the data.

There is no version of collectd for Windows. So, if you are interested in using collectd and you only have Windows, you need to revise your plans.

Plug-ins for collectd

The daemon is flexible, and the exact statistics that the program collects are dictated by plug-ins. These plug-ins also define the format of the data records that the program generates. There are currently 171 plug-ins available. Some plug-ins provide a single statistic, while others gather a group of related metrics. Not all plug-ins define data collection subjects. Some of them extend the capabilities of the program by providing interfaces to specific technologies, such as the Python programming language.

A plug-in isn’t a command line option. It has to be included when the collectd C source code is compiled. Fortunately, you don’t need to name the plug-ins individually because you can compile the whole bundle together.

Libraries and dependencies for collectd

If you are familiar with C programming, you will know that the language itself is extensible. Programmers can create a set of functions and store them in a library for reuse. These libraries need to be available to the compiler along with the main program and referred to in the header of the program with an include statement.

The collectd system relies on a set of libraries, and the majority of these are needed by the plug-ins. A plug-in whose libraries are missing is left out of the build, so you don’t need all of the libraries present for the compiler, only those required by the plug-ins that you intend to use. To work out which libraries you need, look up which plug-ins require which libraries. The libraries that a plug-in requires are referred to as its dependencies.

Accessing the collectd source code

Collectd is managed by an open-source community of the same name. It has a website at collectd.org, but its code is held on GitHub. The current version of the program is 5.12.0 and version 5.11.0 is also currently available.

All the program files are available in a folder tree on the collectd GitHub page. However, it is easier to get the code from collectd.org, where it is available in a tar file.

Where to get collectd

The tar file provides the universal source code, but you might not need to download it because your operating system could already offer collectd as a package. You can read details of these options on the Download page of the collectd.org site, which mainly tells you when you don’t need to download the source. Another option is to get the code directly from the GitHub repository with:

# git clone https://github.com/collectd/collectd.git

There are packages available for:

  • Debian
  • Ubuntu
  • FreeBSD
  • openSUSE
  • SLES
  • Fedora
  • RHEL
  • CentOS

In these cases, you don’t need to deal with the source code.

Install collectd from source code

If you prefer to work with the source code instead of a package, you will need to install the C compiler and linker. On Debian, you can do this with the package called build-essential with the command:

# apt-get install build-essential

You need to install all of the libraries needed for the plug-ins. You can get this list by looking at the README file. You can dictate which plug-ins are included by ensuring all of their necessary libraries are available. Set up those libraries:

# apt-get install librrd2-dev libsensors-dev libsnmp-dev ...
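Before configuring the build, it can help to confirm that the development headers for the plug-ins you care about are actually in place. The following is a minimal sketch rather than an official check: the helper name missing_headers is made up for illustration, and the three header paths are assumptions about where the Debian packages above place their files.

```shell
#!/bin/sh
# Hypothetical helper: print each header file that is absent from the
# include directory given as the first argument.
missing_headers() {
  dir=$1
  shift
  for header in "$@"; do
    [ -f "$dir/$header" ] || printf '%s\n' "$header"
  done
}

# Assumed header locations for the rrdtool, sensors, and snmp plug-ins.
missing_headers /usr/include rrd.h sensors/sensors.h net-snmp/net-snmp-config.h
```

Any header that is printed points to a library that still needs to be installed. No output means that all of the listed headers were found.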

Switch to the directory that you saved the tar file in. If you haven’t downloaded it from the link in the home page, you can do that now from the command line with:

# wget http://collectd.org/files/collectd-5.12.0.tar.bz2

Unpack the compressed file and switch to the directory that is extracted.

# tar jxf collectd-5.12.0.tar.bz2
# cd collectd-5.12.0

Configure the sources:

# ./configure

The system will set up collectd with all the plug-ins for which the necessary libraries were present. After the process is completed, you will see a report that lists all the discovered libraries and which plug-ins were included in the build.
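If you would rather choose the plug-ins explicitly than rely on which libraries happen to be installed, the configure script also accepts per-plug-in switches. The sketch below assumes the --disable-all-plugins and --enable-<plugin> switches that ./configure --help lists. The helper name configure_args and the choice of the cpu and memory plug-ins are only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: build a ./configure argument list that switches off
# every plug-in and then re-enables only the ones named as arguments.
configure_args() {
  args="--disable-all-plugins"
  for plugin in "$@"; do
    args="$args --enable-$plugin"
  done
  printf '%s\n' "$args"
}

# Print the command instead of running it, so it can be reviewed first.
echo "./configure $(configure_args cpu memory)"
```

Printing the command keeps the sketch harmless outside the source directory. Run the printed line from inside the extracted collectd directory once you are happy with the selection.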

Compile the utility:

# make all install

By default, the build is installed under /opt/collectd/, with the daemon executable in /opt/collectd/sbin/.

Install collectd with a package

Using a package for collectd is a lot easier than building the executable from source code. The process is slightly different depending on your distro.

Install collectd on macOS

To install collectd on a Mac, you need to work at the command line rather than through the desktop. Open a Terminal window (click on Applications, select Utilities, and then click on Terminal). The command below uses the port tool, so MacPorts needs to be installed first.

Issue the following command:

sudo port install collectd

You can check on the files that the installation of collectd created with the command:

port contents collectd

Install collectd on Debian and Ubuntu

Simply use the command:

# apt-get install collectd

Install collectd on FreeBSD

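You can install collectd on FreeBSD without downloading the source tar file yourself. There are two alternative routes. To install the prebuilt binary package, type:

# pkg_add -r collectd

On FreeBSD 10 and later, pkg_add has been replaced by pkg, so use pkg install instead. The package and port may be named collectd5 rather than collectd on current releases, so check with pkg search collectd if the name is not found.

Alternatively, build collectd from the ports tree:

# cd /usr/ports/net-mgmt/collectd
# make clean install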

Install collectd on openSUSE and SLES

For openSUSE and SLES, you could use the source code compilation method or install a package that sorts all of that out for you.

To install the package, run the command:

# zypper install collectd

Install collectd on RHEL, Fedora, and CentOS

You can get the collectd package for use on RHEL, Fedora, and CentOS in the EPEL repository. Run the following:

# yum install epel-release
# yum install collectd

Install collectd on Solaris

OpenCSW provides a package that will install collectd on Solaris. Run these commands:

pkgadd -d http://get.opencsw.org/now
/opt/csw/bin/pkgutil -i collectd

Working with the collectd configuration file

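The configuration file for collectd is called collectd.conf. Packages for Debian and Ubuntu place it in /etc/collectd/, while a build from source with the default settings places it in /opt/collectd/etc/.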

Open the configuration file with your favorite editor to see how collectd has been set up by the installation process.

A line is disabled if it has a hash sign (“#”) in front of it – this is the comment symbol for the configuration file. To activate a line, you need to delete that symbol. Do this on the Hostname line. The value for this setting is written by default as “localhost” but you can change that to the actual hostname of the computer that the collectd instance runs on or just leave it as it is.

The default configuration comments out all but the most essential plug-ins. You need to scan through the collectd.conf file for lines that start with LoadPlugin. Remove the comment symbol (“#”) from the front of the lines that relate to the plug-ins that you want to use.

A LoadPlugin line that has two comment symbols in front (“##”) relates to a plug-in that was not included in the build and is not available. Do not uncomment these lines because the daemon cannot load a plug-in that was not built. If you see that a plug-in you need is not available, go back to the manual build from source code as described above. You will need to research the dependencies of that missing plug-in and ensure that its libraries are present before running the configure script and the make command again.

After rebuilding the utility, return to the configuration file to see whether the line now has a single comment symbol (“#”) instead of a double one (“##”). If so, you can now uncomment that LoadPlugin line to get the plug-in operational.
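To see at a glance which plug-ins fall into each group, the LoadPlugin lines can be sorted by their comment prefix. This is a minimal sketch that assumes the single-hash and double-hash convention described in this section. The function name list_plugins is made up for illustration, and the path is the Debian-style location, so adjust it for your system.

```shell
#!/bin/sh
# Hypothetical helper: classify each LoadPlugin line in a collectd.conf file.
#   active       no comment symbol, the plug-in is loaded
#   disabled     one "#", included in the build but switched off
#   unavailable  "##", not included in the build
list_plugins() {
  awk '
    # Strip optional double quotes from a plug-in name.
    function name(s) { gsub(/"/, "", s); return s }
    /^[ \t]*##[ \t]*LoadPlugin[ \t]/ { print "unavailable", name($NF); next }
    /^[ \t]*#[ \t]*LoadPlugin[ \t]/  { print "disabled", name($NF); next }
    /^[ \t]*LoadPlugin[ \t]/         { print "active", name($NF) }
  ' "$1"
}

# Assumed location; only report if the file is readable on this machine.
conf=/etc/collectd/collectd.conf
if [ -r "$conf" ]; then
  list_plugins "$conf"
fi
```

Each output line pairs a status with a plug-in name, which makes it easy to filter, for example by piping the result through grep unavailable to list the plug-ins that would need a rebuild.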

Start collectd

There are several ways to start up the collectd daemon. The exact command you require depends on your operating system and whether you built the utility from source code or installed it from a package.

macOS has no service command. If you installed collectd through MacPorts, load the startup item that the port provides:

sudo port load collectd

On Ubuntu, Debian, RHEL, or CentOS run:

# service collectd start

If your operating system is Fedora, Arch Linux, openSUSE, RHEL 7, or CentOS 7, or any other system that uses systemd to manage services, run the following to start the daemon now and to enable it at boot:

# systemctl start collectd.service
# systemctl enable collectd.service

If you compiled the utility from its source code, you can start collectd with:

# /opt/collectd/sbin/collectd

If you used a binary package, the executable is stored in a different folder, so you need to run:

# /usr/sbin/collectd
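If you are not sure which of the two locations applies on a given machine, a short script can check both. This is a sketch only: the helper name find_collectd is hypothetical, and the two paths are the ones given above.

```shell
#!/bin/sh
# Hypothetical helper: print the first argument that is an executable file.
# Returns a non-zero status when none of the candidates exist.
find_collectd() {
  for candidate in "$@"; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Check the source-build location first, then the package location.
if bin=$(find_collectd /opt/collectd/sbin/collectd /usr/sbin/collectd); then
  echo "collectd found at $bin"
else
  echo "collectd not found in the usual locations"
fi
```

Once the binary is located, running it with the -t option asks collectd to test the configuration file and exit, which is a quick way to catch a mistake in collectd.conf before starting the daemon.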

Integrate collectd with monitoring tools

There is a rudimentary Web interface for collectd that you could install and that will show the metrics from collectd live in graphs. However, it is very clunky and most administrators use the collectd system to feed data into a third-party system monitoring tool. We are going to look at:

  • Splunk
  • Graphite
  • Logstash (Elastic Stack)

Many other monitoring systems can accept input from collectd.

The exact metrics that collectd can send to these tools depend on which plug-ins are active. Check your configuration file to see which data collection plug-ins are available, and uncomment those that you want to send data on. Examples are CPU, memory, swap, vmem, and processes.

Integrate collectd with Splunk

Splunk can use collectd to gather operating statistics from Linux. To receive the data, Splunk needs an add-on activated within its settings. This is the Splunk Add-on for Linux.

Next, you need to ensure that your build includes one of the following plug-ins:

  • write_http
  • write_graphite

These plug-ins provide two different routes for sending data to Splunk. The write_http method uses JSON to send data via HTTP, and the write_graphite method sends metrics to Splunk via TCP.

Edit the collectd.conf file to uncomment the LoadPlugin statement for one of these plug-ins.

The format of the write_http configuration is:

LoadPlugin write_http
<Plugin write_http>
  <Node "node-http-1">
    URL "note A"
    Header "note B"
    Format "JSON"
    Metrics true
    StoreRates true
  </Node>
</Plugin>


Note A: The format for the URL value is “https://Splunk Server IP Address:Port Number/services/collector/raw?channel=Token Value”.

Note B: The format for the Header value is “Authorization: Splunk Token Value” where the Token Value is the same token given in the URL field.
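As an illustration of Notes A and B, the two values can be assembled from the server address, the port, and the token. This is a hedged sketch: the helper name is hypothetical, and the address 192.0.2.10, the port 8088, and the all-zero token are placeholders to be replaced with the values from your own Splunk setup.

```shell
#!/bin/sh
# Hypothetical helper: print the URL and Header lines for write_http.
# Arguments: Splunk server address, port number, token value.
splunk_write_http_values() {
  host=$1
  port=$2
  token=$3
  printf 'URL "https://%s:%s/services/collector/raw?channel=%s"\n' "$host" "$port" "$token"
  printf 'Header "Authorization: Splunk %s"\n' "$token"
}

# Placeholder values only; none of these come from a real system.
splunk_write_http_values 192.0.2.10 8088 00000000-0000-0000-0000-000000000000
```

The two printed lines can be pasted into the Node block in place of the “note A” and “note B” entries. The same token appears in both lines, as Note B requires.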

The configuration for the write_graphite plug-in is:

LoadPlugin write_graphite
<Plugin write_graphite>
  <Node "node-graphite-1">
    Host "note C"
    Port "note C"
    Protocol "tcp"
    EscapeCharacter "_"
    AlwaysAppendDS true
    SeparateInstances false
  </Node>
</Plugin>

Note C: Set up a TCP data input on your Splunk host and enter its IP address and port number in the settings for the Splunk Add-on for Linux. Enter the same values here in the plug-in configuration.

With these configurations in place, when you start up the collectd daemon, it will send its collected statistics to your Splunk server.

Integrate collectd with Graphite

You can host Graphite on your own server or get a hosted version of the system. In either case, you need to point your collectd instance to send data to the Carbon module of Graphite. Set up the write_graphite plug-in in your collectd.conf file.

LoadPlugin write_graphite
<Plugin write_graphite>
  <Node "node-graphite-1">
    Host "note E"
    Port "note E"
    Prefix "note F"
    Protocol "tcp"
    EscapeCharacter "_"
    AlwaysAppendDS true
    SeparateInstances note G
  </Node>
</Plugin>

Note E: Check the settings of your Graphite system to get the host address and the port number.

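Note F: You will only need the Prefix line if you are using the HostedGraphite service. You need to find your API key, which is your HostedGraphite account ID. Enter that ID in the form “YourKey.collectd.”, ending with a dot, because collectd places the prefix directly in front of the host name without adding a separator.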

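Note G: The SeparateInstances field is Boolean (true or false). If it is true, the plug-in instance and the type instance each get their own component in the metric path, for example host.cpu.0.cpu.idle. If it is false, which is the default, they are joined to the plug-in and type names with a hyphen, for example host.cpu-0.cpu-idle.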
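The effect of the Prefix, EscapeCharacter, and SeparateInstances settings on the metric names that reach Carbon can be approximated in a few lines. This is a simplified sketch of the naming scheme described in the collectd.conf documentation, not the plug-in’s actual code: the helper name graphite_path and the host name web1.example.com are hypothetical, the escape character is fixed to an underscore, and the data source name that AlwaysAppendDS adds at the end is left out.

```shell
#!/bin/sh
# Hypothetical helper: approximate the metric path that write_graphite builds.
# Arguments: prefix host plugin plugin_instance type type_instance separate
graphite_path() {
  prefix=$1
  # Dots and spaces in the host name are replaced by the escape character.
  host=$(printf '%s' "$2" | tr '. ' '__')
  # SeparateInstances true uses a dot between name and instance, else a hyphen.
  if [ "$7" = true ]; then sep=.; else sep=-; fi
  printf '%s%s.%s%s%s.%s%s%s\n' "$prefix" "$host" "$3" "$sep" "$4" "$5" "$sep" "$6"
}

graphite_path "" web1.example.com cpu 0 cpu idle false   # web1_example_com.cpu-0.cpu-idle
graphite_path "" web1.example.com cpu 0 cpu idle true    # web1_example_com.cpu.0.cpu.idle
```

Because the prefix is placed directly in front of the host name, a prefix passed as the first argument needs its own trailing dot to form a separate path component.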

Integrate collectd with Logstash

Logstash is a log server for the Elastic Stack. With this tool, you can prepare data for analysis with Elasticsearch. To get a pipeline operating between collectd and Logstash, you need to make configuration changes in both systems.

The Logstash system has a codec plug-in for receiving data from collectd. To activate this, write these lines in the Logstash configuration file:

input {
  udp {
    port => 25826
    buffer_size => 1452
    codec => collectd { }
  }
}

These values are the defaults.

In collectd.conf, make sure that the network and interface plug-ins are available. Uncomment them and set their values as:

LoadPlugin interface
<Plugin interface>
  Interface "note H"
  IgnoreSelected false
</Plugin>
LoadPlugin network
<Plugin network>
  Server "note I" "25826"
</Plugin>

Note H: The interface value is the name of the network interface that the collectd system communicates on, for example, “eth0”.

Note I: The first value on the Server line is the Logstash host’s IP address. The second value should be 25826, which is the default port used by collectd for its binary protocol. Statistics go to Logstash via UDP.

The strengths of collectd

The collectd daemon is a background process that can be set up to provide operating statistics continuously. This information can be useful for security monitoring as well as performance monitoring.

The utility has some notable benefits:

  • Free and open-source: The utility is free from commercial influence.
  • Long-established: Being in use since 2005 gives the utility real-world testing.
  • Compatibility: Can be built on any Unix-like computer with a C compiler. SSC Serve provides collectd functions for Windows. It can be run on OpenWrt routers.
  • Extensible: 171 plug-ins are available.
  • Adaptable: The source code is available for alteration and users can write their own plug-ins.
  • User community: There are plenty of channels available to contact experts who are skilled in the use of collectd.
  • Operating metrics: Live operating system activity metrics and host resource availability statistics.
  • Communication options: Multiple data transfer methods, plus the option to consolidate the streams of multiple collectd instances through a server.

The weaknesses of collectd

The biggest weakness of collectd is that it was written by and for the Linux user community without any involvement from Windows development experts. This fact is the cause of most of the system’s demerits:

  • Weak security: Data transfer systems do not automatically apply encryption, and the user must make extra efforts to secure data movements outside the network.
  • Poor graphics: The native Web interface is unappealing and outdated, which is why we didn’t cover it in this guide.
  • Requires technical expertise: Setting up the tool to interface with commercial monitoring software requires programmer-like skills.
  • Increasingly marginalized: Many of the newer monitoring tools, such as Datadog and Site24x7, don’t have compatibility with collectd.