An Introduction to Structured Query Language (SQL)

By Pan Pantziarka

Before we start: an appeal. If this SQL tutorial is helpful then let us know. If it's wrong in places or needs some clarifying then let us know. If you think it can be improved then let us know. We need the feedback if this is going to be useful to other people. Address comments to: [email protected]

Introduction

Structured Query Language, commonly abbreviated to SQL and pronounced as sequel, is not a conventional computer programming language in the normal sense of the phrase. With SQL you don't write applications, utilities, batch processes, GUI interfaces or any of the other types of program for which you'd use languages such as Visual Basic, C++, Java etc. Instead SQL is a language used exclusively to create, manipulate and interrogate databases. SQL is about data and results, each SQL statement returns a result, whether that result be a query, an update to a record or the creation of a database table.

The purpose of this tutorial is to give an introduction to the key concepts of SQL, and to provide sample code illustrating the use of the most common and useful of the SQL commands. The tutorial is structured around the building of a simple database of software and network users.

Relational Databases

Before looking at SQL itself it's worth looking more closely at what is meant by a database. Although some people talk about SQL databases, there is in fact no such thing. SQL itself makes no references to the underlying databases which it can access, which means that it is possible to have a SQL engine which can address relational databases, non-relational (flat) databases, even spreadsheets. However, SQL is most often used to address a relational database, which is what some people refer to as a SQL database.

The relational model of a database was first defined by Dr E Codd in 1970, working at the IBM Research Labs at San Jose, and it describes a way of structuring data which is mathematically consistent and abstracted away from the physical implementation of the database.

The key concept of the relational model is that data sits in tables, and that data elements within different tables can be related in some way to provide meaningful information rather than just lists of data. Each table in the database has a number of fields which define the content and a number of instances of that field - think of the fields as columns in a table and the instances as rows. For example if we had a table of LAN users, then it might have fields of first name, surname and userID. The rows in such a table would contain the values which define each instance of user, for example the entry to define user John Smith might be: John, Smith, JSmith.

A single table of LAN users does not provide very much, even though it is more useful than no table at all. However the user table becomes immediately more useful when there are other tables to which it can be linked dynamically. Imagine that there is also a table of software products, with fields of Product, Version, User Licence and Product Code. Again this is useful information, especially for inventory purposes, but its value is magnified when there is a relationship between the user table and the software table. For example a link between these tables could be used to record which users are licensed for which software products. From this it would be possible to find out how many users are licensed to use product X, or which products user Y is using. Add fields of department to the user table, for example, and it's possible to derive the relationship between departmental software requirements and the software that is licensed and so on.

Obviously there is much more to the relational model then this, but as far as SQL is concerned it is the concept of tables, fields and relationships which is important.

Designing A Database

Finally, we're almost ready to start looking at some SQL code. The first place to start is in the creation of our sample database. In every kind of database project, the first step is to design the database on paper rather than jumping straight in with the code. One of the aims of this example database is to record the software that is installed on a network and to link this to users and to their departments. Additionally, building on the inventory side of things, it will also keep track of the PC hardware we have installed.

Based on these requirements four tables are required:

User - to record users
Software - table for the installed software products
UserSoftware - to record the Software used by each user (more on why we need this table later)
PC - table to record the PC details of each user

Designing each table is a fairly straightforward process of deciding what information is required for each element in the table. For the User we obviously need First Name, Last Name and Department, but we have to beware of having two users with the same name, so we need to be able to differentiate between them. One way to do this would be to use some kind of unique UserID, an email address or network login, for example. An alternative would be to either create a unique code for each user, or else to use some pre-existing unique identifier, such as an ID from a payroll application. In this example a field called EmpNo will be used to contain this unique identifier, which we shall also assume is an integer rather than a mix of alphanumerics. Finally, we want to record the type of PC for each user, so we'll add a field called PCType. Our User table therefore looks like this:

FirstName - First Name - a text string
LastName - Last Name - a text string
UserID - Network User ID - a text string
Dept - Department - a text string
EmpNo - Employee Number from Payroll system - an integer
PCType - Unique identifier for type of PC - a text string

The Software table has the following fields:

ProductName - Name of Product - a text string
Vers - Product Version - a text string
ProductID - Unique identifier for product - a text string
Licenses - Number of users licensed - an integer
DateOfPurchase - Purchase data from invoicing system - a date
Cost - PPersecutors of product -a currency code

Our table of software used by each user we'll call UserSoftware. In the case of the User table a code for this was included for the PCType, why not do the same with the Software? Because each user is likely to have many pieces of software, and each piece of software is likely to have many different users. It therefore makes sense to create a separate table to record this many-to-many relationship. For this simple table we need fields only for EmpNo - to uniquely identify the user - and ProductID to uniquely identify the software.

Our final table stores the data for the different PCs, and provides the value to enter into the PCType field of the User table.

PCTypeName - a text string
Processor - CPU Processor - a text string
OpSys - Operating system name - a text string
RAM - Mb of RAM - an integer
Disk - Gb of hard disk - an floating point number
ScreenSize - Screen size in inches - a text string
DateOfPurchase - Purchase date from invoicing system - a date code
Cost - Cost of purchase -a currency code

A Word About Datatypes

Many kinds of data can be stored in a database, from simple text to various types of number to Boolean flags to binary objects and graphics. This is one area where there are differences between different products. Not just in the formats of data that can be stored, but also in how these formats are named. Boolean fields, for example, are called Yes/No fields in Microsoft Access, and BOOLEAN in the later releases of MySQL. This is one of the areas where you have to refer to the documentation of whatever database platform you are using. In this tutorial datatypes (or column types as they are also known), will conform as closely as possible to those which are most commonly supported on the majority of platforms.

Creating a Database

Many database systems have graphical interfaces which allow developers (and users) to create, modify and otherwise interact with the underlying database management system (DBMS). However, for the purposes of this tutorial all interactions with the DBMS will be via SQL commands rather than via menus, wizards or any of the other tools which sit between the developer and the database. This is, after all, a SQL tutorial…It is assumed, therefore, that you have access to a DBMS system that allows the use of SQL commands directly.

SQL commands follow a number of basic rules. SQL keywords are not normally case sensitive, though this in this tutorial all commands (SELECT, UPDATE etc) are upper-cased. Variable and parameter names are displayed here as lower-case. New-line characters are ignored in SQL, so a command may be all on one line or broken up across a number of lines for the sake of clarity. Many DBMS systems expect to have SQL commands terminated with a semi-colon character, a practice that is followed in this tutorial. Finally, although there is a SQL standard, actual implementaions vary by vendor/system, so if in doubt always refer to the documentation that comes with your DBMS.

Creating a database is remarkably straightforward. The SQL command is just:

CREATE DATABASE dbname;

In this example we'll call the database AssetReg, so the command is:

CREATE DATABASE AssetReg;

Once the database is created it, is possible to start implementing the design sketched out previously.

Creating Tables With SQL

Having created the database it's time to use some SQL to create the tables required by the design. Note that all SQL keywords are shown in upper case, variable names in a mixture of upper and lower case.

The SQL statement to create a table has the basic form:

CREATE TABLE name( col1 datatype, col2 datatype, …);

So, to create our User table we enter the following command:

CREATE TABLE User (FirstName TEXT, LastName TEXT, UserID TEXT, Dept TEXT, 
EmpNo INTEGER, PCType TEXT );

The TEXT datatype, supported by many of the most common DBMS, specifies a string of characters of any length. In practice there is often a default string length which varies by product. In some DBMS TEXT is not supported, and instead a specific string length has to be declared. Fixed length strings are often called CHAR(x), VCHAR(x) or VARCHAR(x), where x is the string length. In the case of INTEGER there are often multiple flavours of integer available. Remembering that larger integers require more bytes for data storage, the choice of int size is usually a design decision that ought to be made up front.

The commands to create the other tables are:

CREATE TABLE Software (ProductName VARCHAR(15), Vers VARCHAR(10), ProductID VARCHAR(10), 
Licenses INTEGER, DateOfPurchase DATE, Cost CURRENCY);

CURRENCY is another datatype that varies by vendor/product. Consult your documentation. Where an explicit CURRENCY type is not available, there is often a DECIMAL type which can be used as a replacement.

CREATE TABLE UserSoftware (EmpNo INTEGER, ProductID VARCHAR(10));

CREATE TABLE PC (PCTypeName VARCHAR(20), Processor VARCHAR(20), OpSys VARCHAR(10), 
RAM INTEGER, Disk FLOAT, ScreenSize VARCHAR(10), DateOfPurchase DATE, Cost CURRENCY);

FLOAT contains floating point numbers, a DOUBLE type often also exists for higher precision floating point numbers. DATE also comes in various flavours and formats.

There is more to the CREATE TABLE command, just as there is more to database design then we have allowed for in this example. Principally we have ignored any considerations of indexing our tables or of any kinds of constraints or data validation rules. For example you would normally want to ensure that the EmpNo field in the User is unique, that is, no two users should have the same EmpNo. Additionally, you would normally decide in advance the size of various fields, especially for text strings. SQL can code for all of these design considerations, though the syntax is likely to vary between implementations to a greater extent than the basic CREATE TABLE command.

As an example let's refine our User design so that the first name and surname fields can be no longer than 20 characters each; the userID entry can be no longer than 12 characters and that the UserID and EmpNo fields should be unique (which is what the UNIQUE keyword does for us). If you have already executed the original CREATE TABLE command your database will already contain a table called User, so let's get rid of that using the DROP command:

DROP TABLE User;

And now we'll recreate the User table we'll use throughout the rest of this tutorial:

CREATE TABLE User (FirstName VARCHAR (20), LastName VARCHAR (20), 
UserID VARCHAR(12) UNIQUE, Dept VARCHAR(20), EmpNo INTEGER UNIQUE, PCType VARCHAR(20);

Once a table is created it's structure is not necessarily fixed in stone. In time requirements change and the structure of the database is likely to evolve to match these. SQL can be used to change the structure of a table, so, for example, if we need to add a new field to our User table to tell us if the user has Internet access, then we can execute an SQL ALTER TABLE command as shown below:

ALTER TABLE User ADD COLUMN Internet BOOLEAN;

To delete a column the ADD keyword is replaced with DROP, so to delete the field we have just added the SQL is:

ALTER TABLE User DROP COLUMN Internet;

Additionally you can use the ADD and DROP commands to add or delete tables, constraints and table indexes.

>>Next: Inserting Data

Return to home page