An Introduction to SQL - Part Three

Joining Data

What if we want to find out how much disk space the users in the development department have? Here we have data from two different tables to look and it is here that the power of the relational model becomes apparent. At this point we need to create a relationship between the User and the PC table, and this relationship is called a JOIN in SQL/relational database terminology. There are a number of different types of JOIN available, depending on the what it is you need to achieve.

The most common form of join is called an INNER JOIN, and is used to match records between tables where there are matching values in a field common to both tables. For example we have the PCType field in the User and the PCTypeName field in the PC table, and so we would want to perform an INNER JOIN on these fields. The syntax to carry this out is fairly simple:

FROM table1 INNER JOIN table2 ON table1.field1 operator table2.field2;

The operator is usually a simple '=', but other operators can be used, such as '<>', '>', '<' and so on. For the purposes of our example we want to join the tables where the User.PCType field is equal to the PC.PCTypeName field in order to extract the PC.Disk value. Our query would look as follows:

SELECT User.LastName, User.FirstName, User.PCType, PC.Disk
WHERE User.Dept="Development";

What if we want to link users to software to produce a list of software packages for each user? The UserSoftware table lists the EmpNo and the ProductID code, the first of these links to the User and the second to the Software table, thus requiring two joins:

SELECT DISTINCT User.FirstName, User.LastName, Software.ProductName
FROM (UserSoftware INNER JOIN User ON UserSoftware.EmpNo = User.EmpNo) 
INNER JOIN Software ON UserSoftware.ProductID = Software.ProductID;

This generates the following result:

FirstName LastName ProductName
Jim Jones MySQL
Mary Jones MySQL
Mary Jones OpenOffice
Jim Jones VB6
Bharat Patel VB6
Terry Jones VB6
Terry Jones PaintShop Pro
Mary Jones MS Word

To show the PCType in the result we would also need to link in the PC table, which we can join on the field of User.PCType:

SELECT DISTINCT User.FirstName, User.LastName, Software.ProductName, PC.PCTypeName
FROM ((UserSoftware INNER JOIN User ON UserSoftware.EmpNo = User.EmpNo) 
INNER JOIN Software ON UserSoftware.ProductID = Software.ProductID) 

As you can see, the useful information available from the linked tables is far higher than it is from a simple listing of the tables themselves. By linking the different tables together we can gather together the information which is inherent in the tables but is not otherwise easily accessible.

Although the INNER JOIN is the type of join most often used, it isn't the only one. There are in fact 'outer' joins as well as an inner join. A good illustration of a useful outer join follows on from our previous examples. The table of users and software only lists those users for whom software products are recorded, but what if we want to list all of the users in our database? Clearly we would need to join the User to UserSoftware, but we would want all of the records from User listed, not just those for which there is a matching employee number. We do this using a type of outer join called a LEFT JOIN.

We can thus amend our SQL as follows:

SELECT DISTINCT User.FirstName, User.LastName, Software.ProductName
FROM (User LEFT JOIN UserSoftware ON User.EmpNo = UserSoftware.EmpNo) 
LEFT JOIN Software ON UserSoftware.ProductID = Software.ProductID;

This generates the following table of results:

FirstName LastName ProductName
Jim Jones VB6
Jim Jones MySQL
John Smith NULL
Mary Jones MS Word
Mary Jones OpenOffice
Mary Jones MySQL
Chloe Feltham NULL
Bharat Patel VB6
Terry Jones VB6
Terry Jones PaintShop Pro

Note that some systems report a missing value as NULL, others (including MS Access) would simply return a blank in the above result.

And what about those software products for whom there are no users? Again we need an outer join, only this time we want to include those records from the second table in the query, which is called a RIGHT JOIN.

SELECT DISTINCT User.FirstName, User.LastName, Software.ProductID
FROM (UserSoftware LEFT JOIN User ON UserSoftware.EmpNo = User.EmpNo) 
RIGHT JOIN Software ON UserSoftware.ProductID = Software.ProductID;


The SQL we have looked at so far has operated on single records, selecting them based on various criteria and linking them between tables. The next step is to look at how we can aggregate records so that we can calculate sums, averages, counts and so on.

One of the simplest operations to perform is a simple count of records in a table. To do this we need only add to the simplest of SELECT queries. First we create an alias to store the result, and then we use the COUNT command to count a given field - in our case EmpNo:

SELECT DISTINCT Count(User.EmpNo) AS Total
FROM User;

What if we want something a bit more complex, say a count of employees in each department. Here we want to group like records together based on a given field, in our case Dept. SQL provides the GROUP BY clause precisely for this task. The query then looks as follows:

SELECT Count(User.EmpNo) AS Total, User.Dept
GROUP BY User.Dept;

Which generates the following table:

Total Dept
2 Development
2 Finance
3 Marketing

Taking an average is not much more difficult. If we want to find the average Mb of RAM for the different processor types listed in the PC table we could issue the following SQL statement:

SELECT Avg(PC.RAM) AS Memory, PC.Processor
GROUP BY PC.Processor;

Yielding the result:

Memory Processor
74.66 Pentium 2
64 Pentium Pro

Other aggregate functions include Min, Max, Sum etc, depending on your particular implementation of SQL.

Any WHERE cause will have been applied before the GROUP BY clause takes effect, so it is easily possible to exclude certain records from the aggregate functions. This means, however, that it is not possible to use the WHERE clause to exclude some of the aggregated results. For example if you were interested in looking at the average disk size of machines for those machines not running Windows 95, we would use the WHERE clause to exclude OpSys='Win95', and then the GROUP BY clause to group by OpSys:

SELECT Avg(PC.Disk), PC.OpSys AS Disk
WHERE PC.OpSys<>'Win95'

If we want to apply a selection process on the outcome of the GROUP BY process we have to resort to another type of selection clause. SQL uses the HAVING clause to make selections of data after the GROUP BY process has completed. For example, if we wanted a listing of the average memory per processor type but were only interested if the average were less than or equal to 64Mb, we could code a query as follows:

GROUP BY PC.Processor
HAVING Avg(PC.RAM) <=64;

The HAVING clause can have multiple terms, just as the WHERE clause can, and these can be linked with the normal logical operators AND, OR etc.


Having looked at the database creation process using SQL and then followed up with some practical examples using the core SQL commands, particularly the intricacies of the SELECT statement, we are now reasonably well-equipped to understand and utilise the language as required. There are certain areas which have not been looked at, specifically the use of sub-queries, connecting to other databases and certain performance and design issues to do with keys and indices, but they are more specialised and beyond the scope of this tutorial.

<<Previous: Inserting Data

Return to home page

Contents copyright of Pan Pantziarka. If you like it link it, don't lift it. No copying for commercial use allowed. Site © 1999 - 2005.