SQL is the Structured Query Language. It was created specifically to manage and access data held in relational databases. The first commercial venture to use SQL was Oracle, which still has its SQL*Plus query language as the basis of all its relational database management system (RDMS).
Today, MySQL, SQL Server, SAP, and Sybase all use SQL for database access. In this guide, we will give you the standard version of SQL commands, paying particular attention to the Oracle implementation in SQL*Plus.
EDITOR’S CHOICE: We recommend SolarWinds infrastructure management tools to monitor all of your Microsoft tools, including SQL Server. You need a suite of monitoring utilities that can interact in order to cover all of the facilities that support your relational database.
SQL manages relational databases. A relational database contains tables that are separated to reduce the repetition of data. The process of deciding on the proper groupings of data categories is called “normalization.” In some instances, performance demands may dictate that tables work better if not completely normalized. This process is called “denormalization” and it is a rare occurrence.
Each table needs a column that uniquely identifies each row. This is called the “primary key.” The other tables that can join to that entity must contain a reference to the primary key in each record. That reference is called a “foreign key.” You join tables together by matching the foreign key in one table with the primary key in another. This action is performed with SQL. The Entity-Relationship Diagram is a useful reference when you are deciding on how to structure your SQL statement.
In order to speed up access, it is normal to place an index on each primary key in the database. It is also possible to store frequently-used joins to create virtual tables, which are called “views.”
One more database object is a trigger. This is a piece of SQL that launches whenever a value is added to a column or is changed. A trigger will update values in a column in a different table or other columns in the same table. This happens if a value is added or changed in the column on which the trigger is placed. It can also restrict the values that can be entered into a column.
DQL, DDL, DCL, and DML
SQL includes four command subsets. These are Data Query Language (DQL), Data Description Language (DDL), Data Control Language (DCL), and Data Manipulation Language (DML).
On most projects, the Database Administrator (DBA) is the only team member that has the privileges to use DDL. This command group creates, changes, and drops tables, indexes, triggers, and views.
DML is used to add, change, or remove records from tables in the database. Access to these SQL functions is usually restricted to support staff or senior, responsible members of the user community.
DQL is use to generate reports from the database. This command subset is used by programmers and can also be made available to end users so they can perform their own queries.
The management of user privileges is stored in the same database as the data. However, access to these tables is usually restricted to the DBA. The management of user access is all implemented through DCL.
There is one useful SQL command that doesn’t fall into the classifications of either DQL, DDL, DCL, or DML. This is “describe” which is usually implemented as DESC. This shows the structure of an object in the database. In the case of a table, it will show a list of columns and the datatype and size of each. It will show the SQL that created a view or a trigger and it will show the columns that are involved in a given index.
The theory behind relational databases was defined by E. F. Codd in 1970. Within Codd’s definition lies a requirement of a query tool that must implement eight essential relational operations. These are:
You need to master each of these operations in order to effectively access data from a relational database. These are explained below.
SELECT was originally called RESTRICT. It is the ability to extract one record or a group of matching records from a table in the database. As well as being a theoretical operation, SELECT is the name of the main command in any SQL implementation. In fact, all data retrieval is executed with a SELECT. The command is so powerful that it can be modified in many, many ways. Most of the SQL you will need to learn are the “clauses” on a SELECT statement.
PROJECT is the ability to isolate a column (or “attribute”) in a table. This is always implemented with a SELECT statement.
JOIN links together tables on a common attribute. You use the foreign key in one table to match to primary key values in another table. The act effectively unifies two tables temporarily. This is implemented as a clause in a SELECT statement. A join between two tables will automatically leave out any records that do not appear in both supporting tables. An “outer join” will display records that don’t have a match in the other table, leaving the fields that should have been supplied by the other table left blank.
PRODUCT is also known as “times.” This is implemented as a SELECT statement that draws data from two tables, but doesn’t specify the join between them. In real life, there are very few practical reasons why anyone would ever perform a times on two tables. It is usually a warning sign that you have written a JOIN clause badly. The result of a PRODUCT is that every qualifying record in one table matches with every qualifying record in the other table. So, if you create a SELECT statement from two tables, each of which have ten records, the results will be a total of 100 records.
UNION is a common operator in set theory. If you have two sets, the union combines all of the members of both sets. Duplication is removed from the results. In SQL, this is implemented by the OR Boolean operator.
INTERSECT is the other main operator of set theory. If you have two sets, the intersect describes those subjects that are members of both sets. Any subject that is a member of only one of the two sets is excluded from the results. In SQL, this is implemented by the AND Boolean operator.
DIFFERENCE is also known as “minus” and in notation is represented by the minus sign (“-“). There are several ways to implement a DIFFERENCE. It can be created by including a Boolean NOT in the selection criteria and it can also be implemented by a MINUS operator followed by a subquery.
DIVIDE is an unusual operation that, like PRODUCT, has very few practical applications. Codd regarded DIVIDE as the acid test of any relational system and its query tool, so every SQL implementation includes this capability just to qualify for the title “relational.” In reality, you will never use it. Implementing a DIVIDE require a series of subqueries feeding into a MINUS in Oracle SQL*Plus and an EXCEPT operator in some other SQL implementations, such as SQL Server and PostgreSQL.
George Boole was a British mathematician who died in 1864. He invented Boolean algebra, which had no practical application until the fields of Informatics and relational databases emerged in the late 20th century. Boolean algebra is central to the operations of SQL — you could only write the most basic SQL query without using Boolean logic. Boolean logic also underpins all search engines. The key Boolean operators in SQL are:
Another quirky Victorian Brit that you need to know about is Augustus de Morgan. He came up with an observation on transformation in Boolean set theory, of which every SQL programmer needs to be aware. This is that a NOT can alter how an AND and an OR work. The short version is that you need to be careful how you use brackets in your statements to avoid unintended consequences. So “and” and “or” should also be considered to be important Boolean operators in SQL.
AND implements the INTERSECT in set theory. If you want to pull records out of a database that only match with two requirements and not just one of that composite criteria, then you need to use an AND. For example, you may run a seed business and your database contains a table that stores all of your seeds with their attributes. A customer calls in and wants only yellow begonias. So, you would search your database for species = “begonia” AND color = “yellow” to get the right seed catalog.
OR gives you all of the members of two sets – it is the UNION in set theory. So, in your seed database, if you ran a query: species = “begonia” OR color = “yellow”, you would get results that included yellow roses, yellow begonia, pink begonias, and yellow daffodils. In this example, yellow begonias qualify twice. In relational theory, only one record should be returned. Fortunately, SQL includes the qualifier DISTINCT, so by entering SELECT DISTINCT * FROM flowers WHERE species = “begonia” OR color = “yellow”, that yellow begonia record would appear only once.
NOT is easy to understand. It excludes the results that the following statement would produce: AND NOT, OR NOT. In practice, you will almost never use “OR NOT” in a query. If your seed customer wanted any color of begonia other than yellow, you would use species = “begonia” AND NOT color = “yellow” to select products to offer.
According to De Morgan’s Law:
“the negation of a disjunction is the conjunction of the negations; and the negation of a conjunction is the disjunction of the negations”.
In plain English, use brackets carefully in your SQL statement when also including a NOT to prevent your ANDs from transforming into ORs and vice versa. Bracketing conditions together can also shorten your selection criteria. For example, NOT (species = “begonia” AND color = “yellow”) is the same as NOT species = “begonia” OR NOT color = “yellow”.
Different SQL implementations use different syntax, but in this guide, we will follow Oracle’s SQL*Plus. The basic structure of an SQL data selection is:
SELECT <column> FROM <table> WHERE <selection criteria>
This will give you a list of values from one column in one table. This is the PROJECT relational operation defined by Codd.
You can add columns to the SELECT or use the wildcard operator (“*”) to get all columns in the table:
SELECT * FROM <table> WHERE <selection criteria>
This is Codd’s SELECT relational operation. Without the WHERE clause, you would get every record in the table.
Multiple columns are separated by commas (“,”):
SELECT column1, column2, column7 FROM <table> WHERE <selection criteria>
The WHERE clause selects specific records. This is the most important purpose of Boolean operators:
SELECT seed_num, seed_desc FROM seeds WHERE species = “begonia” AND color = “yellow”
You will notice in these examples that you do not put quotes around column names, but you do put quotes around column values. If you forget to put quotes around values, the DBMS will expect the words to be column names. So in the above query, without the quotes, you would get an error telling you that there are no begonia and yellow columns in the table.
The key work in SQL queries lies in joining tables together. There are many tricks that you can perform with subqueries, but if you get the join wrong, your query will always return incorrect results.
Your best guide for a join is the Entity-Relationship Diagram. All you need in order to create a join is include more than one table in the FROM clause of the SELECT and be sure to specify the linking columns in the WHERE clause. Be aware that some tables might use the same column names. To avoid confusion, it is better to give each table in a join a pseudonym so that you can identify the source of the columns that you refer to in the SELECT statement:
SELECT a.seed_num, a.seed_desc, b.supplier_name FROM seeds a, suppliers b WHERE a.species = “begonia” AND a.color = “yellow” AND b.supplier_ID = a.supplier_ID
In the above example, supplier_ID is the primary key in the suppliers table and a field with the same name is the foreign key in the seeds table that links to the appropriate record in the suppliers table.
Although it is technically possible to use an OR for a join, it is more usual to use an AND. Use ORs only for selection criteria and not for creating a join.
You can use a SELECT statement as a source of data for the main SELECT and you can also join several SELECT statements together with operators, such as MINUS. These instances of extra queries serving a main SELECT statement are called “subqueries.” When creating a subquery, the keyword IN can be very useful.
For example, if you only want to get yellow begonias from suppliers in the EU, you could create a list of values with IN such as:
SELECT a.seed_num, a.seed_desc, b.supplier_name FROM seeds a, suppliers b WHERE a.species = “begonia” AND a.color = “yellow” AND b.supplier_ID = a.supplier_ID AND b.supplier_country IN (“UK”,”France”,”Germany” …)
That list of all countries in the EU would stretch to 28 values and you would have to type in all of them – the “…” in the example is only for brevity. In this example, you would have to revisit all of your queries and alter that list of values whenever a country joined or left the EU. It is more efficient to create a table of EU countries and refer to that. Then you only have to add or delete records in one table as the composition of the EU changes.
In this scenario you would create a table called EU_countries with a country_ID field as the primary key and a country_name column. Then you could create a subquery to create that EU country list:
SELECT a.seed_num, a.seed_desc, b.supplier_name FROM seeds a, suppliers b WHERE a.species = “begonia” AND a.color = “yellow” AND b.supplier_ID = a.supplier_ID AND b.supplier_country IN (SELECT country_name from EU_countries)
It is possible to create a series of nested queries and some SQL functions require at least two SELECT statements. For example, if you wanted to get non-EU seed suppliers you could use a MINUS:
SELECT a.seed_num, a.seed_desc, b.supplier_name FROM seeds a, suppliers b WHERE a.species = “begonia” AND a.color = “yellow” AND b.supplier_ID = a.supplier_ID MINUS SELECT c.seed_num, c.seed_desc, d.supplier_name FROM seeds c, suppliers d, EU_countries e WHERE d.supplier_country_ID = e.country_ID AND b.supplier_ID = a.supplier_ID
In the above example, the second query will bring back far more records than the first. However, none of the results from that query will appear in the output. Any results in the main query that match the results of the second query will be removed from the output list.
You can get SQL to aggregate data for you in the course of processing a query. The GROUP BY clause enables you to calculate statistics on groups of records. For example:
SELECT species flower, count(1) instances FROM seeds GROUP BY species
In this example you can see a function, COUNT. This requires a column as a parameter, but you can just give it a token instead, in this case, 1. This example also shows that you can give column names aliases. There isn’t a coma between species and flower in the first line of the SELECT. The DBMS will interpret this second word to be the column name to display in the output. In some SQL implementations, that alias needs to be in quotes and in others it should be indicated by the keyword AS:
SELECT species AS “flower”, count(1) AS “instances” FROM seeds GROUP BY species
Putting a meaningful name on a column is particularly important if you are using functions in your query.
The above query will give you a distinct list of species names from the seeds table and give a count of all of the records in the table that have each value.
You can specify the output sequence of results with an ORDER BY clause. This will show values in ascending order by default, but you can also specify to show records in descending order.
SELECT DISTINCT species FROM seeds ORDER BY species DESC
If you have several columns in your output, you can nominate any of those to be the ordering field. In most SQL implementations, you don’t need to type in the column name in the ORDER BY clause, but just the column position number instead, such as ORDER BY 2 when you want the second column in the output to dictate the display order.
In this example you see the DISTINCT option on a SELECT, which has the same effect as a GROUP BY. However, if you add aggregating functions to a SELECT, you will need to use the GROUP BY instead of DISTINCT.
You have already seen the COUNT function. There are many other functions available in all implementations of SQL. These are not, strictly speaking SQL, but extra utilities added by the creator of the SQL version. Statistical functions, such as average, sum, max, and min are common to all SQL flavors.
Flow control mechanisms, which as IFs and WHILE loops do not exist in SQL. However, conditional selection can be implemented through a variety of functions.
The Comparitech SQL Cheat Sheet
There is a long list of options that you can put on a SELECT statement. This short guide has only covered the basic structure of the SELECT statement, which is the main tool of SQL’s Data Query Language. You can see a comprehensive list in the Comparitech MySQL Cheat Sheet (PDF). That sheet also includes the standard commands used for SQL’s Data Description Language, Data Control Language, and Data Manipulation Language.