Need of Normalization in RDBMS

In any database, data redundancy is a problem if it is not handled perfectly. Normalization in databases is important because they give a proper structure. In this blog, we will see what problems are faced by a developer without normalization. We will also discuss a system Single Responsibility Principle.

Introduction to Single Responsibility Principle

Before understanding the single responsibility principle, we should see the table that is shown below.

In this table, every student record is shown with his batch, course and instructor. While a database management system does not stop arranging information like this table, it is not a proper way to arrange records in a table. We can face so many issues during the update, insert and deletion time of any data in this table. For example, if we want to delete any course with student record will become also deleted. And it will lead to data inconsistency.

To make a proper table for stopping issues a single responsibility was introduced. The Single Responsibility Principle(SRP) explains that in RDBMS every entity should have a single responsibility. And every entity table would be related to another entity table.

For example, in the above table student details, course details and instructor name is mentioned. According to the single responsibility principle, the above table's data should be in three different tables student, course and instructor. So that every entity has a single responsibility like the student table only shows student details and updation, deletion in one table will not affect other details.

Now, we will look issues in the above table in detail.

Updation Anamoly

Updation anomaly occurs in these types of tables when we want to update any data in any record.

For example, let's suppose that Dhoni's course instructor is not available today due to any reason and I updated his instructor from 'Mr. Ganguly' to available instructor 'Mr. Vishwa'.

Looking carefully at both the tables, we changed the first record's course instructor. But we also deleted 'Mr. Ganguly' from the record. So the update process is also converted to deletion, because 'Mr. Ganguly' was not present anywhere else. This is called an 'updation anomaly'. For better use of databases, we should make only one entity in a table.

Deletion anomaly

Look again at this table, after the completion of the 'Digital Marketing' course, we want to delete this course.

But after the deletion of the course 'Digital Marketing' the entire student 'Rohith' record' will also be deleted. It is called a 'deletion anomaly' because one data deletion is converted to entire record deletion.

Insert anomaly

Let's look at insert anomaly, if we want to insert a new course 'IOT', we will need also the student name, student id and course instructor. So if all details of the entire record are not present, we can not insert a new course in this table. This issue is called 'Insert anomaly'.

These types of issues occur in RDBMS when data is arranged like the above table.

What is Normalization?

When data is arranged like this table, many issues occur. And it's led to a lot of data inconsistency. This type of table is called Denormalized.

The process of fixing issues of the denormalized table is called Normalization.

There are various processes to make the table normalize, we will discuss this further.

There are various stages to making a denormalized table into a normalized table. Let's see it in detail.

1NF

It is the first step to make a table normalized. It has two conditions for any table to pass.

Every cell of a table can contain only one value not more than one.
There would be one primary key in a table.

Let's see an example of a car rental company in which owners give their cars to customers as rent.

First, we pass it from the first condition of 1NF and it fails. So we have to make this table's each cell into a single value cell. If we separate each line by breaking records, then it will pass the first condition because then every cell will contain only one value.

And again, we check the second condition of 1NF. Let's consider every column as the primary key.

Cust_Id: It can not be a primary key because, in this column, the value is repeated.

Cust_Name: We know that name can be repeated, so it is also not a primary key.

Car_Num_Plate: In this table less data is available. A car may be rented another day also. So it can not be a primary key.

Car_Name: It is not fit for the primary key.

Date_of_Trans: It is also repeated.

Owner_Id: It may be that an owner gives their own car again and again. And an owner can have multiple cars. So it also can not be a primary key.

Owner_Name: It is not a primary key.

Due to the absence of a primary key, we will make a composite key to make the primary key.

Let's say we have a condition that a car can be rented only once on a given day.

If we make columns Car_Num_Plate and Date_of_Trans as composite keys then it will be a primary key for this table. And we can say that the table has passed the given condition of 1NF.

2NF

This is the second step to make a table normalized.

A table will be in 2NF, if :

It follows the conditions of 1NF.
Every column of a table should be fully functional dependent on the Primary Key of the table.

Let's see this table to make this table in 2NF.

If a table follows the conditions of 1NF, it satisfies the first condition of 2NF also. Let's check the second condition status.

In the previous section of 1NF, we created a composite key 'Car_Num_Plate+Date_of_trans' to uniquely identify each record in this table. If a column's data of this table is dependent on both of the composite keys, then the specified will be fully functional dependent otherwise partially functional dependent.

Let's check the remaining columns of this table that are fully functional dependent or not.

Car_Name: This column only depends upon Car_Num_Plate not on Date_of_trans, so it is a partially functional dependent.

Owner_Id & Owner_Name: Both only depend upon Car_Num_Plate not on Date_of_trans, so it is a partially functional dependent. Because an owner may be the same on many dates of transactions.

Cust_Id & Cust_Name: Both columns fully depend upon both of the columns of the primary key 'Car_Num_Plate not and Date_of_trans, so it is fully functionally dependent. For example, if a customer wants to gift a specific car on a specific date then he will demand a particular car on a particular date.

We have seen that every column of this table does not satisfy the second condition of 2NF because every column is not fully functional and dependent on both of the columns of the primary key.

2NF says that if every column besides the primary key is not fully functional dependent, then make a new table of partially functional dependent columns like this.

Now, a table is totally in 2NF.

3NF

A table will be 3NF if:

It is in 2NF or second normal form.
It has no transitive dependency for non-prime attributes.

For example, see these tables.

It is totally clear that these tables are in second normal form. We have to check only the transitive dependency of non-prime attributes(besides primary key columns).

Two things are called to be in transitive dependency if they are connected indirectly. For example, A boy's father is in a government job. He finds his pocket money from his father and his father finds a salary from the government. So the boy is indirectly connected to the government because if the government will not pay his salary to his father, he can not receive his pocket money from his father.

Let's check the transitive dependency of these tables.

In Table 1, Cust_Name directly depends upon Cust_Id and Cust_Id is directly dependent upon the primary key(Car_Num_Plate and Date_of_trans). So we can say that Cust_Name is indirectly dependent upon the primary key and the table has a transitive dependency.

In Table 2, Owner_Name is directly dependent upon Owner_Id and indirectly dependent upon Car_Plate_Num. So this table has also transitive dependency.

Remove Transitive dependency

To remove the transitive dependency, cut the transitive columns and create a new table like:

BCNF

The fourth step makes normalize a table is BCNF. It stands for Boyce-Codd Normal Form.

BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd.

A table will be in BCNF:

It should be in 3NF.
For every functional dependency, if X depends upon Y then Y should be a part of the super key or Y should be a part of the primary key.

Look at this table.

If we make 'Student_Id and Subject' a primary key then it satisfies the conditions of 3NF. So the first condition of BCNF is passed by this table.

When we see closer at the Subject, Maths-100 is learnt by P101 and P103, then we have to know every professor's subject. So here is a dependency of Subject upon Professor_Id. While Professor_Id is not part of the primary key hence this table does not follow the second rule of BCNF.

To make BCNF, remove the columns which does not satisfy the second condition and create a new table with it.

The Professor_Id column is the primary key of the second table.

Need of Normalization in RDBMS