SQL injection prevention tips and examples for web programmers

It’s rare to see a truly static website or app these days. Instead of the hard-coded data and table layouts that used to be so common, the modern web development landscape is dominated by The Feed: a seemingly endless scroll of dynamic content.

Feeds like these are created by pulling data out of a database. This allows you to filter things like tweets or status updates without having to switch pages, and means that much more of your service can be automated than in previous decades. The most common types of databases are based on the very popular Structured Query Language (SQL) which is simply the programming language that the database uses for storing and retrieving data. Each table in the database can be thought of as similar to a spreadsheet, each cell of which contains the data points.

SQL Injection is a special type of attack that targets these types of online apps. While the database itself might be secured from hacking, the weak point in these attacks is the application and its level of access to the database. The attack tricks the application into passing extra SQL commands to the database, usually with the intention of either getting elevated access rights or having the database spit out more information than it is supposed to.

We’ll be covering the three main ways a programmer can harden their code against this kind of attack. We’ll also provide samples of Java, PHP, Python, and Perl code to help you understand how these solutions are deployed. There are also ways that a database or server admin can harden their server in conjunction with the programmer. These are:

Using placeholders in prepared queries
Sanitizing the user’s input
The use of stored procedures

And for the server/database admins:

Least privilege access rights

Placeholders

We’ll first cover the use of placeholders in prepared SQL commands embedded in the program’s source code. The placeholder is simply ‘?’ and it goes in the SQL query in place of a yet-to-be-supplied value. The program is then expected to replace that ‘?’ with a value that is either supplied by the user or by another section of the program.

To illustrate the concept, take the following SQL query example:

SELECT * FROM clients WHERE clientID = 1078

The output of that query will be every column of data in the client’s table relating to the client with the clientID of 1078. Now, suppose that the program asks the user to provide their own clientID. The unsafe version of that same query might look something like this:

SELECT * FROM clients WHERE clientID = $clientID

In this example, the user that is accessing the program is expected to simply enter their ID into the correct field to get their account details. However, if the user breaks the syntax of the statement (for instance by entering “0 OR 1=1” instead of a valid clientID) then the database query will now be sent as:

SELECT * FROM clients WHERE clientID = 0 OR 1=1

Since 1 will always equal 1, the above query will spit out the contents of every row in the client’s table. This includes every clientID, along with whatever information is stored in that table relating to a company’s client base; not something that you want just anyone to have access to. The first way to avoid that type of leak is the use of prepared statements with placeholders for the dynamically generated values.

Prepared Statements

SQL already includes an option for using placeholders in a dynamically generated SQL prepared statement. The placeholder can then be replaced with the proper value just before executing the query. A prepared statement is simply a SQL query that is set up early on for the purpose of being used many times over without having to resubmit the entire query to the database. Instead, the prepared statement is sent once and then the differing values are sent in sequence.

In your program, you prepare your SQL query or statement with a placeholder in the place of the data field that you want the user to provide. For instance, with our sample query above, the program is asking the user for their client ID in order to let them view their account details. Instead of placing a variable directly inside the query, you want to place a ‘?’ there instead. It will look something like this:

SELECT * FROM clients WHERE clientID = ?

The program is then tasked with providing a value to replace the placeholder. This can be previously-defined variable, user input, or even a list of values from another program entirely. Once the value has been substituted, the query can be executed, and the results processed.

Placeholders are brilliant in their simplicity. The database itself will filter out a lot of unexpected or potentially dangerous data from the user based on what the placeholder points to. For instance, if your username field only accepts alphanumeric input, it will simply drop anything extra from the query without processing it at all.

Of course, whenever the value is to be supplied by a user, the program needs to make sure that the user’s input doesn’t contain an injection of extra SQL code. The program will need to cleanse the input first, before executing the query.

Sanitizing user input

Sanitizing the user’s input simply means making sure that the user enters only what you are expecting them to enter. Anything that the user enters that doesn’t match what the program is expecting is automatically thrown out and an error generated, before anything ever gets to the database. This method of only validation is preferred to letting the user input whatever they wish and then trying to build a regular expression to throw out or error out when extra SQL code is detected.

In our example above, let’s assume for the moment that the clientID is a string of characters. The program can limit the user input to a whitelist of viable characters, with a maximum length not to exceed the length of the longest clientID and not allowing any spaces. If the clientID were a number, then limit the user input to just integers of a certain length: no more, no less.

For fields that require a mixture of different character types, like an email address, the program may still need a regular expression to cleanse the user’s input. First, to make sure that the program is given an actual email address, and second, to make sure that an email address is all that is provided.

A regular expression is a defined search pattern based on a set chain of characters. One example of a regular expression specifically for validating email addresses looks like this:

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b

It is far better to whitelist user input by only accepting information that is expected and dismissing everything else than it is to blacklist user input by searching through what was provided looking for anything that looks suspicious. With a blacklist, there is always the possibility that the regular expression will miss something that was truly unexpected.

As a brief example, several years ago a hacker found out that he could enter a negative value in a funds transfer from his bank account to a friend’s bank account, using his bank’s online banking app. The result was that he effectively stole money from his friend’s account. Being an ethical person, he not only showed this to his friend for a laugh, but he also let the bank know.

The programmers for that bank’s online app did not predict that a user would inject a negative value into the input field for a transfer of funds from one account to another, so it wasn’t in their blacklist. This is the simplest form of a SQL injection attack that I’ve been able to find on the internet and did not require any knowledge of SQL or programming. It just took a little curiosity and a valid user account with sufficient rights to make a transaction.

Stored procedures

Stored procedures can be very secure when the underlying SQL transactions are static. For instance, when the data being accessed is not dependent on input from the user, from another program, or from a variable set earlier in the program that is pulled from a source other than the local server. If the data being presented is based on an environment variable, like the current date, the geographic location of the user, or a username, then the data is considered static.

However, if the SQL transaction is based on any dynamically generated data, then it should be treated as suspicious until it can be validated. Anything supplied by the user during the runtime of the program is automatically considered dangerous. Likewise, any piece of information generated by another program should be considered suspicious until sanitized or discarded.

When handling dynamically generated content, stored procedures are just as susceptible to injection attacks as any other SQL interaction. They can also benefit from the same tactics used by prepared statements, especially from the use of placeholder and sanitizing the input before it is handed to the database.

As the name implies, the stored procedure is actually created in the database itself whereas a prepared statement is setup in the program just before querying the database. When you have a stored procedure in the database already, the program merely needs to call that procedure supplying the value(s) that it is expecting to receive.

To use the previous sample SQL query, you can create a stored procedure in the database itself using the CREATE PROCEDURE command like this:

CREATE PROCEDURE sp_getClientData (OUT param1 INT)
    BEGIN
     SELECT * FROM clients WHERE clientID = ?;
    END//

Now your database has a stored procedure called sp_getClientData. To make use of it, the program only needs to call it and supply a value to replace the placeholder. Take a look at the code samples at the end of the article for calling a stored procedure in each of the programming languages covered.

Server hardening tip: Least privilege access rights

The main tip for making the database server itself a bit more secure when dealing with any kind of program is to minimize the access rights of all the different “user” accounts in the database. There should only be one DBA or Admin account, and those credentials should never be used in any program or application. In fact, any program that needs access to a database should have different accounts based on the privileges that it needs when it connects.

While creating these accounts, the age old adage “less is more” comes into play. Start with no rights, and then add in only the rights needed to perform the function being called. At the same time, any views or stored procedures need to be created at the same time. Again, less is more when it comes to data security.

If the program only needs to look up information relating to a client, it will need a user account with read only access to the specific tables relating to client information. On the other hand, if it is supposed to be able to change any of the information relating to employees of the company, the user account during that program’s database connection segment should have read and write access to tables relating to the company’s employees.

Code samples

So let’s put all that info into some actual code examples. To begin with, we will work with a MySQL database on the localhost called ‘demo’. For the sake of simplicity, this database only has two tables, clients and profiles. Also note that the operating system for these samples is Debian Linux, but the samples can be used on servers running other operating systems as well.

The code samples all perform the same tasks, but in different programming languages. The languages are, in no particular order, Java, PHP, Python, and Perl.

One final note. In the following samples, the process for our programs flow like this:

Connect to the database
Prepare a statement with one or more placeholders or call a stored procedure
Get the required input from the user
Sanitize the user’s input
Inject the user’s input into the SQL transaction taking the place of the placeholder(s)
Execute the SQL query
Display the results of the SQL query
Disconnect from the database when finished

Java

Getting input from a user is fairly simple in Java:

String clientID = request.getParameter("clientID");

To connect to a database, the program needs to know the address of the server, the name of the database on that server, and the login credentials of an account that has the access rights needed for the SQL command(s) that will be executed during its database connection session:

Connection myConn = DriverManager.getConnection("jdbc:mysql://localhost:3306/demo", "username" ,..."password");

To prepare a parameterized statement, you need the placeholder in the SQL statement, the statement handed to the database, and the user-supplied value provided in place of the placeholder:

String query = "SELECT * FROM clients WHERE clientID = ? ";
Statement myStmt = myConn.createStatement( query );
myStmt.setString( 1, clientID);

After you have the connection made, the statement prepared, and the placeholder replaced with the proper value, the program needs to execute the query:

ResultSet myRs = myStmt.executeQuery( );

Of course, once you run the query, you will want the program to process the results into the desired output:

while (myRs.next()) {
 System.out.println(myRs.getString("clientID") + ", " + myRs.getString("client_name"));
}

That is how Java sets up and uses a prepared statement using a placeholder. To have the program call a stored procedure when needed, the program will need to supply the missing value to replace the placeholder in the SQL command. This example makes use of a stored procedure called “sp_getClientData” and submits the clientID value to replace the placeholder in the stored procedure, which is the same SQL code as in the above examples, just stored on the server itself:

try {
     CallableStatement cs = connection.prepareCall("{call sp_getClientData(?)}");
     cs.setString(1, clientID);
     ResultSet results = cs.executeQuery();        
     // … result set handling
} catch (SQLException se) {            
     // … logging and error handling
}

PHP

PHP code examples are all done using the PDO abstraction layer. While most of the samples provided elsewhere use MySQLi or object-oriented MySQLi, PDO was selected here because it has added benefits in the areas of usability, re-usability and security.

To get user input into PHP, simply use a form in your HTML and either POST or GET that input into your PHP. For sensitive data, use POST and not GET, since users can create their own URLs to imitate GET submissions.

Here is one way PDO for PHP connects to the database:

$host = '127.0.0.1';
$db = 'demo';
$user = 'username';
$pass = 'password';
$charset = 'utf8';

$dsn = "mysql:host=$host;dbname=$db;charset=$charset";
$opt = [
    PDO::ATTR_ERRMODE            => PDO::ERRMODE_EXCEPTION,
    PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
    PDO::ATTR_EMULATE_PREPARES => false,
];
$pdo = new PDO($dsn, $user, $pass, $opt);

PDO has two ways to prepare a query with placeholders. The first is with question marks as the placeholders:

$stmt = $pdo->prepare('SELECT * FROM clients WHERE clientID = ?');

The second is with something similar to a variable name for each placeholder:

$stmt = $pdo->prepare('SELECT * FROM clients WHERE clientID = :clientid');

Either way will work the same for our purposes, but the second is better suited to an environment where multiple programmers will be working on the same piece of code.

Executing a query will depend on which of the methods you chose above for the prepared statement. For the first version using question marks, the execute command is this:

$stmt->execute([$clientID]);

For the second version, use this:

$stmt->execute(['clientid' => $clientID]);

Finally, the program will need to process the results of the query:

$client_data = $stmt->fetch();

Python

When Python is used as CGI scripts, it will usually take its input from a HTML form. To process that data, the web page will need a form along these lines to pass the user’s input to the Python script:

<html>
<h1>Client Data Access</h1>
<form name="client_input" action="/cgi-bin/client_data_access.py" method="get">
Client ID: <input type="text" name="clientID"><br>
<input type="submit" value="Submit">
</form>
</html>

The contents of the Python script are then broken down into the following segments:

Connect to the database:

db = MySQLdb.connect("localhost","username","password","demo" )

Prepare a cursor object to step through all of the data being returned by the database:

cursor = db.cursor()

Create the prepared statement with a placeholder in the right spot:

sql = "SELECT * FROM clients WHERE clientID = ?"

This is the part that executes your query and stores the results in individual variables:

try:
 # Execute the SQL command
 cursor.execute(sql)
 # Fetch all the rows in a list of lists.
 results = cursor.fetchall()
 for row in results:
     clientID = row[0]
     client_name = row[1]
     client_address = row[2]
     client_phone = row[3]
     contract_status = row[4]
except:
 print "Error: unable to retrieve data"

Perl

Perl needs to have a DBI installed for each type of database that it will be trying to access. Here we are using dbi:mysqlto access our database. If we were using an Oracle database we would need to use dbi:oracle instead.

The first thing this program will do is get the user’s clientID from the user:

return start_form,
h1("Please enter your ID:"),
p("Client ID", textfield('clientID')),
    p(submit),
    end_form;

Once the program has the clientID, it will then connect to the database:

my $dbh=DBI->connect('dbi:mysql:demo','username','password') ||
die "Error opening database: $DBI::errstr\n";

Once connected to the database, the program will need to prepare a query to look up the client’s info:

$sth=$dbh->prepare("SELECT * FROM clients WHERE clientID = ?;") ||
 die "Prepare failed: $DBI::errstr\n";

To execute the statement:

$sth->execute(clientID) ||
 die "SQL statement could not execute: $DBI::errstr\n";

Last, but not least, the program needs to do something with the output from the database:

my $client=$sth->rows();
unless ($client) {
 print "Sorry, there is no client with that ID\n";
} else {
 clientID = row[0];
 client_name = row[1];
 client_address = row[2];
 client_phone = row[3];
 contract_status = row[4];
}
$sth->finish();
$dbh->disconnect || die "Failed to disconnect\n";

Overall message

The main take-away here is that no single one of these methods will make your app safe from SQL injection attacks. There is no magic bullet that will help you or your database admin prevent SQL injection entirely. There are, however, several things that you can both do to lower the chances of an attack being successful and to mitigate the amount of damage an attack can do.

If you have any other suggestions or code samples that you think our readers could benefit from, leave us a comment and help us to make the internet a safer place for us all.

SQL injection prevention tips for web programmers