These notes are not intended as a comprehensive guide to the topic. Their purpose is to guide you through the main areas you should learn about, with resources provided for further exploration. The goal is for you to learn enough to complete the associated challenge.
When building web applications, we often use a database to store information
We want to store our application data separately from
There are a number of advantages for separating information from presentation
Further learning
A database is just a collection of one or more tables of data
Database field names do not usually contain spaces. Conventions usually suggest making them all lower-case and using underscores to separate words
| id | column_1 | column_2 |
|---|---|---|
| 1 | row 1 | cell |
| 2 | row 2 | lorem |
| 3 | cell | ipsum |
Further learning
The following notes will be based around designing a database to meet an example scenario
Design a database for a web application that records planets, their moons, and space missions that visit those planets. Moons can be assigned to one or more categories such as Ice, Rocky or Volcanic
This scenario lets us demonstrate one-to-many and many-to-many relationships clearly
We can design our database by following a few simple steps
These notes will then explain how to
Further learning
When designing a database, it is important to first decide what data we want to store
If we were to imagine our example data in tabular form, we might have something like this
| id | planet | moon | mission | categories | date_added |
|---|---|---|---|---|---|
| 1 | Jupiter | Europa | Voyager 1 | Ice, Rocky | 1979-03-05 12:00:00 |
| 2 | Jupiter | Io | Voyager 1 | Volcanic | 1979-03-05 12:05:00 |
| 3 | Saturn | Titan | Cassini | Ice | 2004-07-01 09:30:00 |
| 4 | Mars | Phobos | Mars Express | Rocky | 2004-01-10 08:00:00 |
This flat table is useful for thinking but contains duplicated information and multiple values inside single cells. We will remove those problems with normalisation
This step identifies the possible values for the data
Data types control the way the data is stored in the database to make it as efficient and safe as possible
A few example MySQL data types
CHAR(n) fixed length up to 255 charactersVARCHAR(n) variable length up to 65,535 bytes depending on row sizeTEXT for longer textINT for whole numbersBIGINT for large whole numbersDECIMAL(p,s) for exact decimal valuesDATE for calendar datesDATETIME for timestamp without timezoneTIMESTAMP for server-time tracked timestampExact data types can vary depending on the database used
In our example scenario we might use the following data types
Modern defaults you should adopt for MySQL 8
utf8mb4_0900_ai_ci so that all Unicode characters are supportedFurther learning
To normalise data is to remove the duplication of information so each piece of information is stored only once
Once we have removed repetition of data we have removed the possibility of inconsistencies
This ensures that the integrity of the data is maintained
What this means
To normalise data also means to ungroup items of data so each cell contains only a single item of data
What this means
Further reading
In a fully normalised database there should be no duplication of information
However this can sometimes fragment or over-complicate the data, and unnecessarily separate items of information which then need to be brought together in order to answer queries
The database designer may sometimes prefer to have a database which is not strictly normalised so as to simplify the system or improve query performance
Further reading
In every table, we must identify or create a unique ID known as a primary key
A primary key has three requirements
A primary key must be a unique value which enables us to identify a single row of data and so access all other data related to it
If none of the items of data are unique they cannot act as a primary key
The simplest primary key is an integer that gets incremented for each new row
Sometimes we will want to reference the primary key from one table in another table in order to identify a relationship between the two. In the second table it is known as a foreign key
Further learning
For any scenario where you want to use a database, you need to identify the entities and attributes
Should be representations of real-world objects such as events, persons, places and things
For the space scenario these are good candidates
Each entity should have its own table in the database
It is not an easy science to correctly identify entities. It is dependant on the scenario and often subjective
Once we have identified the entities, we can create a database table for each
Pieces of information which characterise and describe these entities
These become the fields or columns for each entity table in the database
You need to identify the name of the attribute such as name, launch_date, category
You need to identify the attribute type such as text, integer, date or similar
You need to decide whether the attribute is optional. Can it be empty
Ask whether the attribute can uniquely identify the entity
Here are three simple example tables before we add relationships
Planets table
| id | name | discovered_at |
|---|---|---|
| 1 | Jupiter | 1610-01-07 |
| 2 | Saturn | 1610-03-25 |
Missions table
| id | name | launch_date |
|---|---|---|
| 1 | Voyager 1 | 1977-09-05 |
| 2 | Cassini | 1997-10-15 |
Categories table
| id | category |
|---|---|
| 1 | Ice |
| 2 | Rocky |
| 3 | Volcanic |
It is important to identify the relationships between tables
There are three types of relationship
To identify a one-to-many relationship, we use primary key to foreign key matching
We store the primary key of the one side as a foreign key in each of the many items, establishing the relationship between them
Matching keys in this way allows related information to be brought together from different tables when the database is queried
In our example, planets to moons is a one-to-many relationship. For each planet there are multiple moons, but each moon belongs to only one planet
Instead of adding the planet name to each moon row repeatedly, we will store planet details in the planets table
We will add a new field to the moons table called planet_id. Here we will put the unique primary key from the planet table as a foreign key in the moon table
Now the planet details need only be stored once. If the planet name were to change, we would only need to update a single record in the planet table and it would apply for all related moons
Here is our example moons table including the foreign key to tie the one-to-many relationship
Moons table
| id | planet_id | name | discovered_at |
|---|---|---|---|
| 1 | 1 | Europa | 1610-01-08 |
| 2 | 1 | Io | 1610-01-08 |
| 3 | 2 | Titan | 1655-03-25 |
For a many-to-many relationship, on both sides of the relationship there can be multiple entities in each table that are inter-related
For example, one mission may visit several planets and one planet may be visited by several missions
To record this relationship we create a separate database table containing two foreign keys which relate to the primary keys for each of the two relevant entities
Here is our missions to planets join table
Missions to Planets join table
| id | mission_id | planet_id |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 2 |
We will also use a many-to-many relationship for moons to categories
Moons to Categories join table
| id | moon_id | category_id |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 3 |
| 4 | 3 | 1 |
Further learning
We have designed our database
We need to know how to create it, populate it and access this data
This is where we need SQL
Structured Query Language SQL is the language used by most databases to perform queries and manipulate data
A few SQL rules to bear in mind
SELECT and UPDATE are written in uppercaseFurther learning
If you need to create a new database yourself, you would use the following command
CREATE DATABASE `space_data`
CHARACTER SET utf8mb4
COLLATE utf8mb4_0900_ai_ci;
To see a list of all available databases you would enter
SHOW DATABASES;
To select a database you would enter
USE `space_data`;
To delete a database you would enter
DROP DATABASE `space_data`;
THERE IS NO UNDO
Further learning
We have already planned the structure for our database tables. Now we will turn these into SQL
General pattern
CREATE TABLE `table_name` (
`column_name` column_type column_attributes
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Example schema for the space scenario using modern MySQL features such as InnoDB, utf8mb4, unique constraints and foreign keys
CREATE TABLE `planets` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
`name` VARCHAR(100) NOT NULL,
`discovered_at` DATE NULL,
UNIQUE KEY `uniq_planet_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `moons` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
`planet_id` INT NOT NULL,
`name` VARCHAR(100) NOT NULL,
`discovered_at` DATE NULL,
UNIQUE KEY `uniq_moon_per_planet` (`planet_id`, `name`),
CONSTRAINT `fk_moons_planet`
FOREIGN KEY (`planet_id`) REFERENCES `planets` (`id`)
ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `categories` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
`category` VARCHAR(100) NOT NULL,
UNIQUE KEY `uniq_category` (`category`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `moons_categories` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
`moon_id` INT NOT NULL,
`category_id` INT NOT NULL,
UNIQUE KEY `uniq_moon_category` (`moon_id`, `category_id`),
CONSTRAINT `fk_mc_moon`
FOREIGN KEY (`moon_id`) REFERENCES `moons` (`id`)
ON DELETE CASCADE,
CONSTRAINT `fk_mc_category`
FOREIGN KEY (`category_id`) REFERENCES `categories` (`id`)
ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `missions` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
`name` VARCHAR(255) NOT NULL,
`launch_date` DATE NULL,
UNIQUE KEY `uniq_mission_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `missions_planets` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
`mission_id` INT NOT NULL,
`planet_id` INT NOT NULL,
UNIQUE KEY `uniq_mission_planet` (`mission_id`, `planet_id`),
CONSTRAINT `fk_mp_mission`
FOREIGN KEY (`mission_id`) REFERENCES `missions` (`id`)
ON DELETE CASCADE,
CONSTRAINT `fk_mp_planet`
FOREIGN KEY (`planet_id`) REFERENCES `planets` (`id`)
ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Taking the example of the column called id
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY
Within a database, you can request to see what tables exist
SHOW TABLES;
You can also request to see the structure of a specific table
DESCRIBE `planets`;
It is possible to update the structure of an existing database table using the ALTER TABLE command
To add a column
ALTER TABLE `moons` ADD `diameter_km` INT NULL;
To update the name or attributes of a column
ALTER TABLE `missions` CHANGE `launch_date` `launch_date` DATE NULL;
To remove a column
ALTER TABLE `moons` DROP `diameter_km`;
To empty a table use the TRUNCATE command
TRUNCATE TABLE `missions_planets`;
To delete a table use the DROP command
DROP TABLE `missions_planets`;
THERE IS NO UNDO
Further learning
Our database and tables are now ready to go
We need to know how to populate them with data and then retrieve this data
Data-related tasks broadly fall into one of four categories
SQL allows us to do just that
INSERTSELECTUPDATEDELETETo add data to the tables we have created, we use an INSERT command
INSERT INTO `planets` (`name`, `discovered_at`)
VALUES ('Jupiter', '1610-01-07'),
('Saturn', '1610-03-25'),
('Mars', NULL);
You can also insert multiple rows in a single SQL statement as shown above
To update data in a table we use an UPDATE command
UPDATE `missions`
SET `launch_date` = '1977-09-05'
WHERE `name` = 'Voyager 1';
Note the use of the WHERE clause. Without this all items in the table would be updated
To delete data from a table we use a DELETE command
DELETE FROM `moons`
WHERE `name` = 'Io' AND `planet_id` = 1;
Note the use of the WHERE clause. Without this all items in the table would be deleted
THERE IS NO UNDO
Further learning
INSERT INTO `planets` (`id`, `name`, `discovered_at`) VALUES
(1, 'Jupiter', '1610-01-07'),
(2, 'Saturn', '1610-03-25'),
(3, 'Mars', NULL);
INSERT INTO `moons` (`id`, `planet_id`, `name`, `discovered_at`) VALUES
(1, 1, 'Europa', '1610-01-08'),
(2, 1, 'Io', '1610-01-08'),
(3, 2, 'Titan', '1655-03-25'),
(4, 3, 'Phobos', NULL),
(5, 3, 'Deimos', NULL);
INSERT INTO `categories` (`id`, `category`) VALUES
(1, 'Ice'),
(2, 'Rocky'),
(3, 'Volcanic');
INSERT INTO `moons_categories` (`id`, `moon_id`, `category_id`) VALUES
(1, 1, 1),
(2, 1, 2),
(3, 2, 3),
(4, 3, 1);
INSERT INTO `missions` (`id`, `name`, `launch_date`) VALUES
(1, 'Voyager 1', '1977-09-05'),
(2, 'Cassini', '1997-10-15'),
(3, 'Mars Express', '2003-06-02');
INSERT INTO `missions_planets` (`id`, `mission_id`, `planet_id`) VALUES
(1, 1, 1),
(2, 1, 2),
(3, 2, 2),
(4, 3, 3);
To read data from a table or multiple tables we use a SELECT command
The simplest SELECT command will return everything in a database table
SELECT * FROM `planets`;
Rather than returning everything, it is possible to specify just the field names columns you want
SELECT `name`, `discovered_at` FROM `planets`;
Multiple field names are separated by commas
SELECT `name`, `discovered_at` FROM `planets` ORDER BY `name`;
Rather than returning all results, we can add optional parameters to filter our request using WHERE
SELECT * FROM `planets` WHERE `name` = 'Mars';
SELECT `name` FROM `moons` WHERE `planet_id` = 1;
We can order our results using ORDER BY
SELECT * FROM `moons` ORDER BY `name` ASC;
We can change the order using ASC or DESC
SELECT * FROM `planets` WHERE `id` > 1 ORDER BY `discovered_at` DESC;
We can limit the number of results returned using LIMIT
SELECT * FROM `moons` ORDER BY `name` LIMIT 2;
This will return the first two records in the specified sort order
We can specify an offset to start the LIMIT from
SELECT * FROM `moons` ORDER BY `name` LIMIT 2 OFFSET 2;
This will return two records, starting from record three. This is useful for pagination of results
We can combine conditions using AND and OR
SELECT
`name`
FROM
`moons`
WHERE
`planet_id` = 1
AND
`name` LIKE 'E%';
Further learning
All of the SELECT examples so far have been on only a single table
When we normalised the data we specifically moved content into separate tables
Now we need to know how to recombine it
We need to add clauses to match the primary keys and foreign keys
Moons table
| id | planet_id | name | discovered_at |
|---|---|---|---|
| 1 | 1 | Europa | 1610-01-08 |
| 2 | 1 | Io | 1610-01-08 |
| 3 | 2 | Titan | 1655-03-25 |
Planets table
| id | name | discovered_at |
|---|---|---|
| 1 | Jupiter | 1610-01-07 |
| 2 | Saturn | 1610-03-25 |
SELECT with an explicit join preferred
SELECT
m.id,
m.name AS moon,
p.name AS planet,
m.discovered_at
FROM
`moons` AS m
JOIN
`planets` AS p
ON
m.`planet_id` = p.`id`;
| id | moon | planet | discovered_at |
|---|---|---|---|
| 1 | Europa | Jupiter | 1610-01-08 |
| 2 | Io | Jupiter | 1610-01-08 |
| 3 | Titan | Saturn | 1655-03-25 |
We can then add clauses, as before
SELECT
m.name AS moon,
p.name AS planet
FROM
`moons` AS m
JOIN
`planets` AS p
ON
m.`planet_id` = p.`id`
WHERE
p.`name` = 'Jupiter';
SELECT without explicit joins legacy syntax
You may also see legacy syntax that lists tables separated by commas and uses WHERE to join. This is functionally the same as an inner join but modern code should use explicit JOIN as above for clarity
SELECT
m.name AS moon,
p.name AS planet
FROM
`moons` AS m, `planets` AS p
WHERE
m.`planet_id` = p.`id`;
Further learning
When we query across multiple tables the attributes from each table are flattened into a single result set
There may be times when there are fields that we want to retrieve that have identical column names in more than one table for example id
If we try to do this, it will only return one of the named columns in some clients or it will be ambiguous. To resolve this we use AS in our SQL to alias the field names to something else to make them unique
SELECT
p.`id` AS `planet_id`,
p.`name` AS `planet`,
m.`id` AS `moon_id`,
m.`name` AS `moon`
FROM
`planets` AS p
JOIN
`moons` AS m
ON
p.`id` = m.`planet_id`;
List missions that visited each planet
SELECT
ms.`name` AS mission,
p.`name` AS planet
FROM
`missions` AS ms
JOIN
`missions_planets` AS mp
ON
ms.`id` = mp.`mission_id`
JOIN
`planets` AS p
ON
p.`id` = mp.`planet_id`
ORDER BY
ms.`name`, p.`name`;
List categories for a moon
SELECT
m.`name` AS moon,
c.`category`
FROM
`moons` AS m
JOIN
`moons_categories` AS mc
ON
m.`id` = mc.`moon_id`
JOIN
`categories` AS c
ON
c.`id` = mc.`category_id`
WHERE
m.`name` = 'Europa'
ORDER BY
c.`category`;
Further reading
If you wanted to select a list of all moons for each planet, you could perform the following query
SELECT
p.`name` AS planet,
m.`name` AS moon
FROM
`planets` AS p
JOIN
`moons` AS m
ON
p.`id` = m.`planet_id`
ORDER BY
p.`name`, m.`name`;
This will produce repeated planet names, once per moon
To group the moons together in a single column use GROUP_CONCAT
SELECT
p.`name` AS planet,
GROUP_CONCAT(m.`name` ORDER BY m.`name` SEPARATOR ', ') AS `moon_list`
FROM
`planets` AS p
JOIN
`moons` AS m
ON
p.`id` = m.`planet_id`
GROUP BY
p.`id`, p.`name`
ORDER BY
p.`name`;
Further learning
You can select a random entry from a MySQL table using the RAND() function
Add ORDER BY RAND() LIMIT 1 to the end of a query
SELECT * FROM `planets` ORDER BY RAND() LIMIT 1;
This is acceptable for small tables. On large tables it can be slow because a random value is computed for each row
Further learning
We have been storing dates in MySQL using the DATE and DATETIME data types
When these are retrieved from MySQL
SELECT `discovered_at` FROM `planets`;
The output is in the format YYYY-MM-DD or YYYY-MM-DD hh:mm:ss
If you want to format this you can use DATE_FORMAT
SELECT
`name`,
DATE_FORMAT(`discovered_at`, '%e/%c/%y') AS `discovered_short`
FROM
`planets`
WHERE
`discovered_at` IS NOT NULL;
Further learning
Indexes help MySQL find rows quickly
Add indexes on columns used for joins, filters or sorting, and on foreign keys. Avoid over-indexing because each extra index slows down inserts and updates
Examples already included in the schema
uniq_planet_name enforces unique planet namesuniq_moon_per_planet prevents duplicate moon names per planetFurther learning
When running SQL from application code, always use parameterised queries rather than building SQL with string concatenation. This avoids SQL injection and quoting mistakes
Further learning
Try to write the SQL queries that would satisfy the following requests
Stretch tasks
GROUP_CONCATFurther learning