What are format string attacks? (+ how to prevent them)

Many programming languages use what are called format strings to insert values into a string of text. But if a program lets user-supplied input act as the format string, that string can be exploited to execute arbitrary code, perform buffer overflow attacks, and extract sensitive information from the web server or application.

In this article, we’re going to look at what format string attacks are, how they work, and what you can do to protect against them.

Format string history

In September 1999, security researcher Tymm Twillman discovered that format string vulnerabilities could be exploited as an attack vector. He was performing a security audit of the open-source FTP server ProFTPD, written in the C programming language. What he found was a call to a printf-style function that passed user-supplied data directly as the format string, instead of using a fixed format string and passing the data as an argument.

After extensive testing of printf-type functions, Twillman demonstrated that a format string attack could be used for privilege escalation. A privilege escalation attack is one in which the attacker, logged in as a low-level user, manages to escalate their privileges to those of a higher-level user or even gain root access, which would give the attacker complete control over the system.

However, format string vulnerabilities are becoming less prevalent, partly because modern compilers such as GCC and Clang can warn when a format function is called with a non-constant format string.

What is a format string attack?

Most format string attacks target programs written in C. But other programming languages, such as Python, can also be vulnerable to format string attacks if untrusted input is passed to their formatting functions, such as str.format(), as the format string. We’ll be focusing on C in this article.

One of the most basic functions in the C programming language is printf(), which stands for “print formatted”. The printf() function sends data to standard output (stdout). Its first argument, the format string, can be plain ASCII text, but it can also contain format specifiers: placeholders that printf() replaces with the values of the additional arguments passed to the function.

Here’s an example:

#include <stdio.h>

int main(void) {
    const char *dir_name = "Sensitive_Info";
    int no_of_files = 99;
    printf("Directory %s contains %d files\n", dir_name, no_of_files);
    return 0;
}

In the above example, we have two format specifiers: %s and %d. %s takes the next argument, a pointer to a null-terminated string, and prints that string. %d takes the next argument and prints it as a signed decimal integer. So %s will be bound to the dir_name variable, and %d will be bound to the no_of_files variable. And our output will be:

Directory Sensitive_Info contains 99 files

Of course, %s and %d are not the only format specifiers available in C. Different format specifiers are used for different data types, such as %f for floating-point values or %u for unsigned decimal integers.

We also have the %n format specifier, which writes to memory instead of reading a value. Rather than printing anything, %n stores the number of characters printed so far into the integer that its corresponding argument points to.

Format functions are powerful tools, and programmers use them extensively to convert values of different types into text automatically, saving a lot of time in the process. But printf() can be exploited if an attacker is able to control the format string.

And bear in mind that printf() is just one format function out of many. There’s also fprintf(), which prints to a file stream; sprintf(), which prints to a string; snprintf(), which prints to a string with length checking; and many more. All of them are vulnerable to format string attacks if the format string comes from untrusted input.

Let’s look at how the attack could work.

The format of format string attacks

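Format functions in C accept any string as their format argument, including one with no format specifiers at all, or one that came from the user. And that’s where the vulnerability lies. Say we have some C code that contains the following:

char *user_input = "FooBar";

printf(user_input);   /* the format string is whatever user_input holds */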

Here the programmer hard-coded user_input, so this call is harmless: “FooBar” contains no format specifiers. But if user_input comes from outside the program, for example from a command-line argument, a network request, or a form field, then whoever supplies it controls the format string. Suppose the program passes its first command-line argument straight to printf(). An attacker doesn’t need to change the code at all; they can simply run it with format specifiers in place of normal text:

./vulnerableCcode "%x %x %x %x %x"

Every time printf() encounters a format specifier, it expects to find a matching argument in its argument list. But printf() is a variadic function and has no way of checking how many arguments it was actually given; it trusts the format string. On common platforms, arguments are passed in registers and on the stack, so when printf() sees the first %x without a matching argument, it simply reads whatever value sits where that argument would have been.

This behavior will be repeated for all five %x specifiers in our example. And the result will be that printf() prints the hex representation of five values it was never given, taken from registers and the stack. These values could be local variables, function return addresses, function parameters, user input data, or pointer memory addresses, among other data points.

What kind of damage can a format string attack cause?

If your web server or application is vulnerable to format string attacks, a malicious actor with carefully crafted format strings could use them to:

Crash the program (denial of service)
View data on the stack
View memory at arbitrary locations
Execute arbitrary code
Write data into arbitrary locations

Format string denial of service attacks typically use multiple instances of the %s format specifier. Each %s makes the function treat the next value on the stack as a pointer and read a string from that address, until one of those values is not a valid address and the program crashes.

Format string reading attacks typically use the %x (hexadecimal) or %p (pointer) format specifiers to print values stored in memory or on the stack that are not meant to be public.

Format string writing attacks combine %n with specifiers such as %d (signed integer), %u (unsigned integer), or %x (hexadecimal). A field width on those specifiers, such as %64x, pads the output to a chosen length, so the following %n writes an attacker-chosen value to an address the attacker has arranged to sit where %n reads its argument. By overwriting a return address or function pointer this way, an attacker can redirect execution to their own shellcode.

Format string vulnerabilities

In 2021, a security researcher in Denmark found that trying to connect to a Wi-Fi network named “%p%s%s%s%s%n” caused his iPhone to lose Wi-Fi capability. Another researcher, Alex Skalozub, showed that the string “%s%s%s” was enough to cause the same result.

As reported in The Register, it actually appeared to be the third “%s” that was terminating the Wi-Fi connection, because it instructed the software to read a string from an address that likely did not point to a valid string. Needless to say, Apple’s software shouldn’t obey user-provided format strings in this way.

In 2023, the Taiwanese CERT identified three format string vulnerabilities in some high-end Asus routers. These could be exploited remotely and had the potential to allow attackers to hijack devices. All three stemmed from a lack of proper validation of input format strings. Patches have since been released for the three flaws.

Format string vulnerabilities were also identified in F5’s BIG-IP interface. Researchers from Rapid7 found that by inserting format string specifiers into certain GET parameters in the SOAP interface, attackers could “cause the service to read and write memory addresses that are referenced from the stack.”

Examples of format string attacks

Say we have the following C code:

#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;
    printf(argv[1]);   /* VULNERABLE: user input used as the format string */
    return 0;
}

The above code is vulnerable to format string attacks because of the line:

printf(argv[1]);

The safe version of the code would be:

printf("%s\n", argv[1]);

which uses a fixed format string and passes argv[1] only as data to be printed.

If we were to compile and run the program with the safe printf() line, like so:

./vulnerableCcode "FooBar %s%s%s%s%s%s"

The output would simply be:

FooBar %s%s%s%s%s%s

The program would not interpret the trailing %s%s%s%s%s%s as format specifiers, so it wouldn’t open the door to a format string attack.

If, on the other hand, we were to compile the program with the unsafe printf() line and run it with our malicious string, the program would treat each %s as a pointer, fetch a value from where the next argument should be (typically the stack), and try to read a string from that address. Sooner or later it would hit an invalid address and crash. This shows how a format string vulnerability can be used to pull off a denial of service attack.

Another example would be running the compiled program with the unsafe printf() line with the following malicious string:

./vulnerableCcode "FooBar %p %p %p %p %p %p %p %p"

This would print values from the program’s stack, something like:

FooBar 0xffffdddd 0x64 0xf7ec1289 0xffffdbdf 0xffffdbde (nil) 0xffffdcc4 0xffffdc64

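An attacker can use leaked values like these to map out the stack and defeat defenses such as address space layout randomization (ASLR) and stack canaries, then follow up with a %n write. And if the vulnerable call is sprintf() writing into a fixed-size buffer, a specifier with a large field width, such as %500d, makes the output far longer than the buffer and overflows it. This is known as a stack-smashing attack and, depending on the application under attack and its execution environment, could lead to a root compromise of the OS itself.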

Preventing format string attacks

Preventing format string attacks means preventing format string vulnerabilities, which implies keeping certain things in mind while coding your C application.

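If possible, make the format string a constant.
If the above isn’t possible, then always specify the format string as part of the program rather than taking it from input, and pass untrusted data only as an argument. You can fix most format string vulnerabilities by simply specifying %s as the format string: printf("%s", input) instead of printf(input).
Compile with format warnings enabled. GCC and Clang can flag non-constant format strings with -Wformat -Wformat-security, and -Werror=format-security turns those warnings into errors.
Use FormatGuard. FormatGuard is a small patch to glibc that provides general protection against format bugs. glibc is the standard C library package for Linux. Note that FormatGuard dates from around 2001 and is not part of mainstream glibc; on current systems, building with -D_FORTIFY_SOURCE=2 (with optimization enabled) provides related run-time checks, such as aborting when a format string held in writable memory contains %n.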

Conclusion

So that was an overview of format string vulnerabilities and attacks. The attacks can be nasty, but thankfully, they’re not that hard to prevent. It just takes a bit of due diligence, and you should be fine.

It also shows us that no bug is too small to be exploited. The internet is a hostile place, and one should never assume that such low-level bugs will never be exploited. They will be. It’s just a matter of time.

Code safely.
