Database Normalization Tutorial with example

Database Normalization Tutorial with example

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

Database normalization is the process of organizing the fields and tables of a relational database to minimize redudancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database using the defined relationships.

Normalization is a process of simplifying the relationship between the data in a record it is carried out for the following reasons.

  • To simplifying the maintenance of data through updates, insertion and deletions
  • To allow simple retrieval of data in response to query and requests
  • To avoid restructuring of data when new application requirements arises.
  • To structure the data so that any relationship can be easily represented

 

Process Of Normalization
Before getting to know the normalization techniques in detail, let us define a few building blocks which are used to define normal form.

Determinant : Attribute X can be defined as determinant if it uniquely defines the value Y in a given relationship or entity .To qualify as determinant attribute need NOT be a key attribute .Usually dependency of attribute is represented as X->Y ,which means attribute X decides attribute Y.

Example: In RESULT relation, Marks attribute may decide the grade attribute .This is represented as Marks->grade and read as Marks decides Grade.
Marks -> Grade
In the result relation, Marks attribute is not a key attribute .Hence it can be concluded that key attributes are determinants but not all the determinants are key attributes.

Functional Dependency: Yes functional dependency has definition but let’s not care about that. Let’s try to understand the concept by example. Consider the following relation :

REPORT(Student#,Course#,CourseName,IName,Room#,Marks,Grade)
Where:

  • Student#-Student Number
  • Course#-Course Number
  • CourseName -CourseName
  • IName- Name of the instructor who delivered the course
  • Room#-Room number which is assigned to respective instructor
  • Marks- Scored in Course Course# by student Student #
  • Grade –Obtained by student Student# in course Course #
  • Student#,Course#  together (called composite attribute) defines EXACTLY ONE value of marks .This can be symbolically represented as

Student#Course# Marks
This type of dependency is called functional dependency. In above example Marks is functionally dependent on Student#Course#.
Other Functional dependencies in above examples are:

  • Course# -> CourseName
  • Course#-> IName(Assuming one course is taught by one and only one instructor )
  • IName -> Room# (Assuming each instructor has his /her own and non shared room)
  • Marks ->Grade

Formally we can define functional dependency as: In a given relation R, X and Y are attributes. Attribute Y is functional dependent on attribute X if each value of X determines exactly one value of Y. This is represented as :
X->Y
However X may be composite in nature.

Full functional dependency: In above example Marks is fully functional dependent on student#Course#  and not on the sub set of Student#Course# .This means marks cannot be determined either by student # or Course# alone .It can be determined by using Student# and Course# together. Hence Marks is fully functional dependent on student#course#.

CourseName is not fully functionally dependent on student#course# because one of the subset course# determines the course name and Student# does not having role in deciding Course name .Hence CourseName is not fully functional dependent on student #Course#.

Student#
Marks
Course#
Formal Definition of full functional dependency : In a given relation R ,X and Y are attributes. Y is fully functionally dependent on attribute X only if it is not functionally dependent on sub-set of X.However X may be composite in nature.

Partial Dependency: In the above relationship CourseName,IName,Room# are partially dependent on composite attribute Student#Course# because Course# alone can defines the coursename, IName,Room#.

Room#
IName
CourseName
Course#
Student#
Formal Definition of Partial dependency: In a given relation R, X and Y are attributes .Attribute Y is partially dependent on the attribute X only if it is dependent on subset attribute X .However X may be composite in nature.

Transitive Dependency:  In above example , Room# depends on IName and in turn depends on Course# .Here Room# transitively depends on Course#.

IName
Room#
Course#
Similarly Grade depends on Marks,in turn Marks depends on Student#Course#  hence Grade
Fully transitively depends on Student#Course#.

Key attributes : In a given relationship R ,if the attribute X uniquely defines all other attributes ,then the attribute X is a key attribute which is nothing but the candidate key.

Ex: Student#Course# together is a composite key attribute which determines all attributes in relationship REPORT(student#,Course#,CourseName,IName,Room#,Marks,Grade)uniquely.Hence Student# and Course# are key attributes.

 

Forms of Normalization

There are 6 forms of Normalization. The six forms of Normalization are

1.            First Normal Form
2.            Second Normal Form
3.            Third Normal Form
4.            Boyce-Codd Normal Form
5.            Fourth Normal Form
6.            Fifth Normal Form

 

Example of Database Normalization :

Unnormalized

The Denormalized table



 

 

 

 

 

 

 

 

 

First Normal Form:

  1. You can only have one value in a column
  2. You should not create multiple columns for a one to many relationship.

(There are more rules to the first normal form, such as two rows cannot be identical etc, which are automatically enforced by the SQL server. Hence I won’t be listing them here.)

OR

A database table is said to be in 1NF if it contains no repeating fields/columns. The process of converting the UNF table into 1NF is as follows:

  1. Separate the repeating fields into new database tables along with the key from unnormalized database table.
  2. The primary key of new database tables may be a composite key

1NF of above UNF table is as follows:

First

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Second Normal Form:

  1. The table is in First Normal Form
  2. All the non primary key columns in the table should depend on the entire primary key.

“The following explanations make this more specific:

  • If the table has a one-column primary key, the attribute must depend on that key.
  • If the table has a composite primary key, the attribute must depend on the values in all its columns taken as a whole, not on one or some of them.
  • If the attribute also depends on other columns, they must be columns of a candidate key; that is, columns that are unique in every row.”

OR

A database table is said to be in 2NF if it is in 1NF and contains only those fields/columns that are functionally dependent(means the value of field is determined by the value of another field(s)) on the primary key. In 2NF we remove the partial dependencies of any non-key field.

 

The process of converting the database table into 2NF is as follows:

  1. Remove the partial dependencies(A type of functional dependency where a field is only functionally dependent on the part of primary key) of any non-key field.
  2. If field B depends on field A and vice versa. Also for a given value of B, we have only one possible value of A and vice versa, Then we put the field B in to new database table where B will be primary key and also marked as foreign key in parent table.
Second

2NF of above 1NF tables is as follows:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

In the Second Normal Form, every column is dependent on the entire primary key in that table and not part of the primary key (as was the case in the first normal form).

Third Normal Form:

  1.  Database is in Second Normal form
  2. There are no transitive dependencies (That is every non primary key is dependent directly on the primary key. )

OR

A database table is said to be in 3NF if it is in 2NF and all non keys fields should be dependent on primary key or We can also said a table to be in 3NF if it is in 2NF and no fields of the table is transitively functionally dependent on the primary key.The process of converting the table into 3NF is as follows:

  1. Remove the transitive dependecies(A type of functional dependency where a field is functionally dependent on the Field that is not the primary key.Hence its value is determined, indirectly by the primary key )
  2. Make separate table for transitive dependent Field.

3NF of above 2NF tables is as follows:

 

Third

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Third Normal form in the real world applications

What is explained above is how to lay out a normalized database with natural keys (keys that make sense to the business) . For most practical applications, once this is done, a surrogate key is introduced which replaces the natural key as the primary key. In which case the database would look like this.

UserIdSSNUserName
1332345432Amy
2666666666Kevin
3919919919Raj

 

 

UserEmployerIdUserIdEmployerId
111
222
331

 

EmployerIdEmployerNameEmployerAddress
1Google1 California drive
2Facebook22nd Street Sanfrancisco

 

UserProductIdUserIdProduct
11M
22A
32B
42C
52D
63D

 

Boyce-Codd Normal Form (BCNF) :

A database table is said to be in BCNF if it is in 3NF and contains each and every determinant as a candidate key.The process of converting the table into BCNF is as follows:

  1. Remove the non trival functional dependency.
  2. Make separate table for the determinants.

BCNF of below table is as follows:

bcnf

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Fourth  Normal Form :

A database table is said to be in 4NF if it is in BCNF and primary key has one-to-one relationship to all non keys fields or We can also said a table to be in 4NF if it is in BCNF and contains no multi-valued dependencies.The process of converting the table into 4NF is as follows:

  1. Remove the multivalued dependency.
  2. Make separate table for multivalued Fields.

4NF of below table is as follows:

Four

 

 

 

 

 

 

 

 

 

 

 

 

Fifth  Normal Form :

A database table is said to be in 5NF if it is in 4NF and contains no redundant values or We can also said a table to be in 5NF if it is in 4NF and contains no join dependencies.The process of converting the table into 5NF is as follows:

  1. Remove the join dependency.
  2. Break the database table into smaller and smaller tables to remove all data redundancy.

5NF of below table is as follows:

Fifth

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply