Wednesday, September 13, 2017

The Normal Forms: Third Normal Form

Last time, in The Normal Forms: Second Normal Form, I discussed how to remove redundant data by identifying fields which are not functionally dependant on the entire primary key. Figure 1 shows the results.

Figure 1: Order table Decomposed into Orders and Order Details


This corrected some data anomaly errors in my data, however, data anomalies are still possible under 2NF. To prevent these anomalies, I need an additional rule: Third Normal Form (3NF).

Definition

A table is said to be in Third Normal Form (3NF) if:

  1. It is in Second Normal Form and
  2. If all non-key fields are mutually independent, that is, all fields are functionally dependant ONLY on the primary key field(s).

Implications

There are two main sources of data anomalies that 3NF corrects are 1) Redundant Data and 2) Calculated fields.

Redundant Data

Although I removed some of the redundant data when I split the Order table into Orders and OrderDetails, there is still some redundancy left, namely ProductNum and Item. Both of these fields are dependant on the entire primary key, so they comply with 2NF. However The ProductNum and Item fields are mutually dependant, that is, they depend upon each other. The product number determines the item description and the item description determines the product number.

Just as we saw in 2NF, redundancy can lead to inconsistent data being entered into the database or correct information being changed after the fact. Figure 2 shows some data anomalies possible under 2NF as a result of redundant data.

Figure 2: 2NF Data Anomalies Due to Redundant Data


Product A7S has two different items associated with it: either a wrench or a nail. Which is it?

Also, two product numbers (B7G and B7H) are associated with an Item called "saw". Is this the same saw or not?

Calculated Values

Mutual dependency is also an issue with storing calculated values. Suppose I had a Quantity and Price field and I decided to calculate the ExtendedPrice by multiplying the them. This is a common database error made by novices.

The problem is one of dependency. The Extended Price calculation depends on the Quantity and Price fields for its value. 3NF says that no field should depend on any field except those making up the primary key.

If I store that calculation and later go back and change one of the dependant fields (either the Quantity or the Price), my calculation will be incorrect. Figure 3 shows some calculated values anomalies.

Figure 3: Anomalies with Calculated Values


First of all, if the user is manually calculating and typing in the value of the Extended Price, the value could be anything, even a calculation from a different row. So let's assume I have an automated process, a formula in a form which calculates the value.

The problem is that you must depend on programming to maintain your data integrity, not the database itself. If the integrity is maintained at the database level, it cannot be subverted.

In the case of the table above, the first anomalous record was caused by changing the Quantity from 1 to 2 after the fact. But because I didn't have a process to re-calculate the value if Quantity changed, the Extended Price is now wrong.

In the second case, an Update Query was used to raise the price of nails was raised from $0.09 to $0.10. Unfortunately, the query did not include a new calculation, so all Extended Price calculations for nails are now wrong.

Implementation

The problem of calculated values is a simple one to solve. Don't. As a general rule, I just don't store calculations. There are minor exceptions, but in most cases, I'll be safe by just leaving them out. When I need these values, I'll calculate them as output in either a query, form, or report.

As with 2NF, the solution to redundant data is to remove it to a separate table, leaving one field to join back to the original. In this case, the ProductNum, Item, and Price fields will go into the Products table. I'll leave ProductNum in the Order Detail table to maintain the relationship. Figure 4 is the result.

Figure 4: Decomposing Order Details to Remove Redundant Data


So now I've removed as much redundant data as possible. There's still a little left. There always will be in order to maintain the relationships between tables. But none of the redundancy will result in data anomalies, so I can say with confidence that my tables are now normalized to Third Normal Form. Figure 5 shows the final design.

Figure 5: Final Design


In my next and final post: The Normal Forms: In A Nutshell, I'll wrap it all up.

.

No comments: