Tagged: Database Normalization, Databases, PostgreSQL, Query, Relational Database, SQL, Structured Query Language
- This topic has 0 replies, 1 voice, and was last updated 1 year ago by
Oghenemarho.
- AuthorPosts
- February 28, 2020 at 1:58 pm #86820Spectator@oghenemarho
The core functionality and defining feature of relational databases is the definition and maintenance of relationships between the entities stored in the database. This is based on the relational model for database management. We will not be going in depth into describing the concept of the relational model, which was introduced in 1970 by Edgar Frank Codd. What you need to understand right now is that in databases that are based on the relational model, all data stored in it is logically structured within tables and these tables are used to demonstrate the relationship between data.
Each table in a relational database is referred to as a relation and that relation will have a name, and columns used to represent the attributes of the relation. The rows of this table are called tuples and each tuple contains a list of values, one value per attribute (column) of the table. Each table or relation is meant to represent a type of entity, like flights for example, with the columns representing the attributes of a passenger such as first name, last name, origin, destination, and so on. Following this logic, each row or tuple in the passengers table is meant to store an instance of that type of entity occurring, as shown below:
One thing to note is that each tuple or row in the table has one unique value that is not repeated in any other row in the table. These values are designated the primary key and in our example table above, the id attribute is where the primary key for each tuple is stored.
In order to reduce data redundancy in our databases, a process referred to as normalization is required. This involves organizing columns and tables in our database to ensure that dependencies are correctly enforced to improve the integrity of our data. Let’s take a look at our example table above. If that was the only table in our database, we would run into several problems. First of all, we will have a lot of duplicate data for the origin and destination once more entries are added because there can be several flights originating from any of the locations, which is not tidy. Secondly, if we needed to update any of the information relating to a particular origin or destination values (like correct the name or spelling), this process will be tedious because it will involve updating each and every instance where that value occurs on the table.
A simple way to tackle both of these problems will be create a separate table called locations that will contain all the possible flight origins and destinations, each with their own unique IDs or primary keys. But how do you now establish a relationship between these tables? That will be done through the use of foreign keys. In this context foreign keys are a set of attributes or columns in a table whose values are constrained by their existence in another table. In our example, the origin and destination columns will be replaced with origin id and destination id and the attributes will be constrained in such a way that only the location primary keys from the locations table can be accepted as values in these columns. With this we have established the relationship between the two tables.
So how do we implement this in our database management system. For this case, we will be working with PostgreSQL. First, we will create our locations table using this command:
123CREATE TABLE locations (id SERIAL PRIMARY KEY,location VARCHAR NOT NULL);Let’s also add in some data to the table, using the query below:
1INSERT INTO locations (location) VALUES (‘New York’), (‘Atlanta’), (‘Florida’), (‘Texas’), (‘Chicago’);id location 1 New York 2 Atlanta 3 Florida 4 Texas 5 Chicago Next, we create our flights table. Remember that we will need to specify the relationship of specific columns in this table to the locations table that we created earlier. The SQL query will look like this:
12345678CREATE TABLE flights (id SERIAL PRIMARY KEY,firstname VARCHAR NOT NULLlastname VARCHAR NOT NULLorigin_id INTEGER REFERENCES locations(id),destination_id INTEGER REFERENCES locations(id),time TIME NOT NULL);id firstname lastname origin_id destination_id time From the query, you can see that the origin_id and destination_id attributes of the newly created flights table are constrained to only reference values from the id column of the locations table and since these values are integers in that table, they have also been given the same datatype in the new table. What this means is that if a particular location doesn’t exist in the locations table, then it cannot be referenced and added to this table because its id value doesn’t exist in our database.
Let’s say we want to take things further and create a separate table for passengers’ information. What this will mean is that our flights table will only contain attributes related to the flight (that is flight id, origin id, destination id and time) and then we can use the flight id as a foreign key to represent which particular flight that passenger is taking. In this case, the modified flights table will be created using this query:
123456CREATE TABLE flights (id SERIAL PRIMARY KEY,origin_id INTEGER REFERENCES locations(id),destination_id INTEGER REFERENCES locations(id),time TIME NOT NULL);And we can now insert some data using the query below:
1234INSERT INTO flights (origin_id, destination_id, time)VALUES (1, 5, ‘8.00’),(2, 4, ‘14.00’),(3, 1, ‘22.00’);id origin_id destination_id time 1 1 5 8.00 2 2 4 14.00 3 3 1 22.00 Then we create a table for passengers:
123456CREATE TABLE passengers(id SERIAL PRIMARY KEY,firstname varchar NOT NULL,lastname varchar NOT NULL,flight_id INTEGER REFERENCES flights(id),);And of course, load in some data to demonstrate the relationships between all the tables:
1INSERT INTO passengers (firstname, lastname, flight_id) VALUES (‘James’, ‘Baldwin’, 1), (‘Alex’, ‘Whitman’, 2), (‘Brian’, ‘Healey’, 3);id firstname lastname flight_id 1 James Baldwin 1 2 Alex Whitman 2 3 Brian Healey 3 This table is the culmination of all the relationships we have established in our database. The first tuple with a unique id of 1, shows that a passenger named James Baldwin booked a flight with an id of 1. Referencing our flights table, we see that this particular flight has an origin id of 1, a destination id of 5, and a take-off time of 8.00 am. If we reference these new ids in our locations table, we will see that they represent New York and Chicago. Therefore, the summary is that the first record in our passengers table is for James Baldwin who had booked the 8am flight from New York to Chicago.
This may look a little more complicated than just storing all this information in one table as seen in our first example table but as we said earlier, that particular design can lead to logical inconsistencies in our database and this in turn can affect the accuracy and integrity of our data. Establishing relationships this way makes it easier for our applications to understand the data and work with it.
- AuthorPosts
- You must be logged in to reply to this topic.