Tuesday, May 16, 2017

What Is Normalization: Part V

Many-to-Many Relationships

In What Is Normalization, Part IV, I discussed the three basic kinds of relationships: One-to-One, One-to-Many, and Many-to-Many. In this post, I'll concentrate on the many-to-many.

Many-to-Many (M:M)
In real life, M:M relationships are extremely common. Each customer can order many products, and each product can be ordered by many customers. In high school, each teacher has many students, and each student has many teachers. At a conference, each attendee attends many sessions, and each session has many attendees.

Returning to our Employee Database example, suppose we want to store a job description as well as the salary history. Since each employee can hold many positions over the course of his or her career, and each position can be held by many employees, we have a many-to-many relationship. So let's go back and change our original table and see how we can normalize it.

Figure 1: Expanded Employee Table (Non-Normalized)

You'll notice that I've replaced the SS# field with an EmployeeID field. SS# is a poor choice for a primary key: it is sensitive personal information, and not every employee is guaranteed to have one. You might still want to store SS#, but not as the primary key, so I've left it out of this example.
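To make the redundancy concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name EmployeeFlat, its columns, and the sample rows are hypothetical stand-ins for Figure 1, not the figure's actual contents. The two queries count how often the same employee details and the same job description are stored.

```python
import sqlite3

# In-memory database; EmployeeFlat and its rows are hypothetical stand-ins for Figure 1.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EmployeeFlat (
        EmployeeID     INTEGER,
        FirstName      TEXT,
        LastName       TEXT,
        JobDescription TEXT,
        Salary         INTEGER,
        StartDate      TEXT
    )
""")
conn.executemany(
    "INSERT INTO EmployeeFlat VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Lee",   "Clerk",   30000, "2010-01-04"),
        (1, "Ann", "Lee",   "Manager", 45000, "2013-06-01"),
        (2, "Bob", "Smith", "Clerk",   31000, "2011-03-15"),
        (2, "Bob", "Smith", "Analyst", 40000, "2014-09-01"),
        (3, "Cal", "Jones", "Clerk",   29000, "2012-02-20"),
    ],
)

# Employee details stored more than once (one copy per job held).
name_repeats = conn.execute("""
    SELECT EmployeeID, FirstName, LastName, COUNT(*) AS copies
    FROM EmployeeFlat
    GROUP BY EmployeeID, FirstName, LastName
    HAVING COUNT(*) > 1
    ORDER BY EmployeeID
""").fetchall()

# Job descriptions stored more than once (one copy per employee who held the job).
job_repeats = conn.execute("""
    SELECT JobDescription, COUNT(*) AS copies
    FROM EmployeeFlat
    GROUP BY JobDescription
    HAVING COUNT(*) > 1
    ORDER BY JobDescription
""").fetchall()

print(name_repeats)  # [(1, 'Ann', 'Lee', 2), (2, 'Bob', 'Smith', 2)]
print(job_repeats)   # [('Clerk', 3)]
```

Both kinds of repetition show up: employee names repeat once per job, and "Clerk" repeats once per employee who held it.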

Like the example in What Is Normalization, Part II, the table has redundant data, so it needs to be normalized. And just like in Part II, I'll move the redundant Employee information into its own table, like so:

Figure 2: Employee Table

Figure 3: Job History Table
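The split can be sketched in SQL, run here through Python's sqlite3 module. This is a minimal illustration under assumptions: the flat table, its columns, and its rows are hypothetical, and JobHistory is written without a space so it needs no quoting. SELECT DISTINCT keeps a single copy of each employee's details, while the job-related columns stay in the history table along with EmployeeID as the linking field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat table standing in for the non-normalized original.
conn.execute("""
    CREATE TABLE EmployeeFlat (
        EmployeeID INTEGER, FirstName TEXT, LastName TEXT,
        JobDescription TEXT, Salary INTEGER, StartDate TEXT
    )
""")
conn.executemany(
    "INSERT INTO EmployeeFlat VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Lee",   "Clerk",   30000, "2010-01-04"),
        (1, "Ann", "Lee",   "Manager", 45000, "2013-06-01"),
        (2, "Bob", "Smith", "Clerk",   31000, "2011-03-15"),
    ],
)

# Employee table: one row per employee, so name columns are stored once.
conn.execute("""
    CREATE TABLE Employee AS
    SELECT DISTINCT EmployeeID, FirstName, LastName FROM EmployeeFlat
""")

# Job History table: the remaining columns, linked back by EmployeeID.
conn.execute("""
    CREATE TABLE JobHistory AS
    SELECT EmployeeID, JobDescription, Salary, StartDate FROM EmployeeFlat
""")

employees = conn.execute("SELECT * FROM Employee ORDER BY EmployeeID").fetchall()
history = conn.execute(
    "SELECT * FROM JobHistory ORDER BY EmployeeID, StartDate"
).fetchall()

print(employees)  # [(1, 'Ann', 'Lee'), (2, 'Bob', 'Smith')]
```

Note that CREATE TABLE ... AS SELECT does not copy key constraints, so a real schema would declare EmployeeID as the Employee primary key explicitly. The history rows still repeat "Clerk", which is the redundancy the next step removes.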

But looking at the Job History Table, we can see that we still have redundant data. Job Description is repeated several times. Unfortunately, we don't have a convenient linking field as we did with the EmployeeID field in the Employee Table. So we'll have to make one up. When you make up a primary key, as opposed to using an already existing field for your primary key, it is called a surrogate key. I discuss the difference between surrogate keys and natural keys in What is a Primary Key?

So, to remove the job description information, I'll create a table called Job Types and fill it with just one instance of each value. Then I'll add a field called JobID and fill it with unique values, thus creating a surrogate primary key. Finally, I'll return to the Job History table and replace the job descriptions with the corresponding JobIDs, like this:

Figure 4: Revised Job History Table

Figure 5: Job Types Table
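The three steps above (one row per description, a made-up JobID, and swapping descriptions for IDs) can be sketched with Python's sqlite3 module. The table names JobHistory, JobTypes, and JobHistoryRevised and the sample rows are hypothetical. In SQLite, a column declared INTEGER PRIMARY KEY is filled in automatically when no value is supplied, which is one convenient way to generate a surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Job History table that still stores the descriptions.
conn.execute("""
    CREATE TABLE JobHistory (
        EmployeeID INTEGER, JobDescription TEXT, Salary INTEGER, StartDate TEXT
    )
""")
conn.executemany(
    "INSERT INTO JobHistory VALUES (?, ?, ?, ?)",
    [
        (1, "Clerk",   30000, "2010-01-04"),
        (1, "Manager", 45000, "2013-06-01"),
        (2, "Clerk",   31000, "2011-03-15"),
    ],
)

# Step 1 and 2: one row per description; SQLite assigns JobID automatically,
# giving a surrogate key. UNIQUE stops a duplicate description creeping back in.
conn.execute("""
    CREATE TABLE JobTypes (
        JobID          INTEGER PRIMARY KEY,
        JobDescription TEXT UNIQUE NOT NULL
    )
""")
conn.execute("""
    INSERT INTO JobTypes (JobDescription)
    SELECT DISTINCT JobDescription FROM JobHistory ORDER BY JobDescription
""")

# Step 3: rebuild Job History with JobID in place of the description.
conn.execute("""
    CREATE TABLE JobHistoryRevised AS
    SELECT h.EmployeeID, t.JobID, h.Salary, h.StartDate
    FROM JobHistory AS h
    JOIN JobTypes AS t ON t.JobDescription = h.JobDescription
""")

job_types = conn.execute("SELECT * FROM JobTypes ORDER BY JobID").fetchall()
revised = conn.execute(
    "SELECT * FROM JobHistoryRevised ORDER BY EmployeeID, StartDate"
).fetchall()

print(job_types)  # [(1, 'Clerk'), (2, 'Manager')]
print(revised)    # [(1, 1, 30000, '2010-01-04'), (1, 2, 45000, '2013-06-01'), (2, 1, 31000, '2011-03-15')]
```

Each description is now stored once, and the history rows carry only the small JobID value.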

Okay, so what does this have to do with Many-to-Many relationships? Well, if you look at Employee and Job Types, you can see that there is actually a many-to-many relationship between these tables. Each Employee can have many Job Types over time and each Job Type can apply to many Employees.

Unlike one-to-one and one-to-many, many-to-many relationships cannot be directly represented in a relational database. In order to create one in a database, we have to have an intersection table (also called a "linking table"). In this case, the linking table already exists: Job History.
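To see the linking table doing its job, the following sketch (Python's sqlite3 module, hypothetical table names and rows) answers the question from both directions. Neither main table has a column pointing at the other; every answer is reached by joining through Job History.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; JobHistory is the linking table.
conn.executescript("""
    CREATE TABLE Employee   (EmployeeID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE JobTypes   (JobID INTEGER PRIMARY KEY, JobDescription TEXT);
    CREATE TABLE JobHistory (EmployeeID INTEGER, JobID INTEGER, Salary INTEGER, StartDate TEXT);

    INSERT INTO Employee VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Smith');
    INSERT INTO JobTypes VALUES (1, 'Clerk'), (2, 'Manager');
    INSERT INTO JobHistory VALUES
        (1, 1, 30000, '2010-01-04'),
        (1, 2, 45000, '2013-06-01'),
        (2, 1, 31000, '2011-03-15');
""")

# One employee, many job types: Ann's career, read through the linking table.
ann_jobs = conn.execute("""
    SELECT t.JobDescription
    FROM JobHistory AS h
    JOIN JobTypes AS t ON t.JobID = h.JobID
    WHERE h.EmployeeID = 1
    ORDER BY h.StartDate
""").fetchall()

# One job type, many employees: everyone who has held JobID 1 (Clerk).
clerks = conn.execute("""
    SELECT e.FirstName
    FROM JobHistory AS h
    JOIN Employee AS e ON e.EmployeeID = h.EmployeeID
    WHERE h.JobID = 1
    ORDER BY e.FirstName
""").fetchall()

print(ann_jobs)  # [('Clerk',), ('Manager',)]
print(clerks)    # [('Ann',), ('Bob',)]
```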

To represent this M:M, you create a one-to-many relationship from each of the main tables (Employee and Job Types) to the linking table (Job History), with the "many" side on the linking table. The final relationship will look like this:

Figure 6: Many-to-Many Relationship Implemented

Notice that each primary key of the main tables is represented as a foreign key in the linking table, and while each of the foreign keys is part of the primary key of the linking table, neither is the entire primary key.
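One way to declare this structure is sketched below with Python's sqlite3 module. It is a sketch under assumptions, not the schema in Figure 6: the identifiers are hypothetical, and the third primary-key column, StartDate, is an assumed addition that lets the same employee hold the same job more than once (for example, after a break). The REFERENCES clauses create the two one-to-many relationships, and PRAGMA table_info confirms that each foreign key is part of the primary key without being all of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY,
        FirstName  TEXT,
        LastName   TEXT
    );
    CREATE TABLE JobTypes (
        JobID          INTEGER PRIMARY KEY,
        JobDescription TEXT UNIQUE NOT NULL
    );
    -- Linking table: the "many" side of both one-to-many relationships.
    CREATE TABLE JobHistory (
        EmployeeID INTEGER NOT NULL REFERENCES Employee (EmployeeID),
        JobID      INTEGER NOT NULL REFERENCES JobTypes (JobID),
        StartDate  TEXT    NOT NULL,  -- assumed extra key column
        Salary     INTEGER,
        PRIMARY KEY (EmployeeID, JobID, StartDate)
    );
""")

# table_info rows are (cid, name, type, notnull, default, pk); pk > 0 marks key columns.
pk_columns = [row[1] for row in conn.execute("PRAGMA table_info(JobHistory)") if row[5] > 0]
print(pk_columns)  # ['EmployeeID', 'JobID', 'StartDate']

conn.execute("INSERT INTO Employee VALUES (1, 'Ann', 'Lee')")
conn.execute("INSERT INTO JobTypes VALUES (1, 'Clerk')")
conn.execute("INSERT INTO JobHistory VALUES (1, 1, '2010-01-04', 30000)")

# A history row pointing at a job type that does not exist is rejected.
try:
    conn.execute("INSERT INTO JobHistory VALUES (1, 99, '2015-01-01', 50000)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

The NOT NULL markers on the key columns are deliberate: SQLite, unlike most databases, allows NULLs in non-integer primary key columns unless told otherwise.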

So far, I've approached Normalization from a particular perspective. I've put all the information into a single table, looked for redundant data, and removed the redundancy into a separate table. This approach is called "decomposition". It works very well for simple data models, but becomes unwieldy in complex models.

However, there is another approach that I have found easier for working with very complex data models: Entity-Relationship Diagramming. I'll discuss it in Entity-Relationship Diagramming: Part I.

