Tuesday, January 13, 2009

What Is Normalization: Part V

Many-to-Many Relationships

In What Is Normalization, Part IV, I discussed the three basic kinds of relationships: One-to-One, One-to-Many, and Many-to-Many. In this post, I'll concentrate on the many-to-many.

Many-to-Many (M:M)
In real life, M:M relationships are extremely common. Each customer can order many products, and each product can be ordered by many customers. In high school, each teacher has many students, and each student has many teachers. At a conference, each attendee attends many sessions, and each session has many attendees.

Returning to our Employee Database example, suppose we want to store a job description as well as the salary history. Since each employee can hold many positions over the course of his or her career, and each position can be held by many employees, we have a many-to-many relationship. So let's go back and change our original table and see how we can normalize it.

Figure 1: Expanded Employee Table (Non-Normalized)

You'll notice that I've changed the SS# field in favor of an EmployeeID field. SS# is a really bad field for a primary key. You still might want to store SS#, but not for a primary key, so I'm leaving it out of this example.

Like the example in What Is Normalization Part II, the table has redundant data, and so it needs to be normalized. And just like in Part II, I'll remove the redundant Employee information into its own table, like so:

Figure 2: Employee Table

Figure 3: Job History Table


But looking at the Job History Table, we can see that we still have redundant data. Job Description is repeated several times. Unfortunately, we don't have a convenient linking field as we did with the EmployeeID field in the Employee Table. So we'll have to make one up. When you make up a primary key, as opposed to using an already existing field for your primary key, it is called a surrogate key. I discuss the difference between surrogate keys and natural keys in What is a Primary Key?

So, in order to remove the job description information, I'll create a table called Job Description and fill it with just one instance of each value. Then I'll also add a field called JobID, and fill that will it with unique values, thus creating a surrogate primary key. Next, I'll return to the Job History table and replace the job descriptions with the corresponding JobIDs. Like this:

Figure 4: Revised Job History Table

Figure 5: Job Types Table


Okay, so what does this have to do with Many-to-Many relationships? Well, if you look at Employee and Job Types, you can see that there is actually a many-to-many relationship between these tables. Each Employee can have many Job Types over time and each Job Type can apply to many Employees.

Unlike one-to-one and one-to-many, many-to-many relationships cannot be directly represented in a relational database. In order to create one in a database, we have to have an intersection table (also called a "linking table"). In this case, the linking table already exists: Job History.

To represent this M:M, you create a relationship between each of the main tables (Employee and Job Types) to the linking table (Job History), with the "many-side" on the linking table. The final relationship will look like this:

Figure 6: Many-to-Many Relationship Implemented


Notice that each primary key of the main tables is represented as a foreign key in the linking table, and while each of the foreign keys is part of the primary key of the linking table, neither is the entire primary key.

So far, I've approached Normalization from a particular perspective. I've put all the information into a single table, looked for redundant data, and removed the redundancy into a separate table. This approach is called "decomposition". It works very well for simple data models, but becomes unwieldy in complex models.

However, there is another approach that has proven easier for me to work with very complex data models. This approach is called Entity-Relationship Diagramming. In Entity-Relationship Diagramming: Part I, I'll talk about that.

.

4 comments:

Patrick said...

Very good serie ! I have been teaching Access for a long time and your presentation of Normalization is really clear.
And the fact of chaining this with E-R is a great approach.

Fawad Anwar said...

Wow!
I never knew someone who could 'talk Access' like that.

kino said...

This is a good example of normalization. I am curious about the several forms of Normalization. Most experts only follow the first three. What is the purpose of the last four?

Roger's Access Blog said...

@ kino
The short answer is...

The first three normal forms: 1) are common to nearly all databases, 2) solve the vast majority of data anomoly problems, and 3) are the easiest to explain to newbies.

The higher normal forms are used for increasingly specialized cases, and depending on the business rules, are often better handled at the application level rather than the database level.

There's a longer answer, but I'll save that for a new post. Stay tuned.