What I'm going to concentrate in the next few posts are specific problems with querying data stored in repeated columns. The first of these is Multiple ORs.
Others in this series:
- Multiple ORs (this post)
- Multiple Unions
- Multiple Joins
- Multiple IIFs
- Impossible Joins
- Summing
- Counting
- Averaging
One of the most common examples of repeated columns in a table come from PC Inventory databases. This is likely because they are created by Help Desk personnel who are familiar with spreadsheet software like Excel. Since repeated columns are allowed, even necessary, in a spreadsheet, it is logical to build a database table the same way. Logical, perhaps, but wrong.
An extremely simplified example of such a table might look like this:
Figure 1: PC_Inventory table with repeated columns |
Figure 2: PC_Inventory normalized |
Note: Proper normalization requires the fields PC_Num and OperatingSystem to be removed to their own table. Technically, Software should also be removed to its own table, but I'll address that later.
So suppose I want to produce a simple list of those PCs which have "Word 2010" installed. With the repeated columns table, my SQL statement would look like this:
SELECT PC_ID
FROM PC_RepeatedColumns
WHERE Software1="Word 2010" OR
Software2="Word 2010" OR
Software3="Word 2010" OR
Software4="Word 2010" OR
Software5="Word 2010";
In the Query Builder, it would look like this:
Figure 3: Query with repeated columns |
By contrast, querying the normalized tables would look like this:
SELECT PC_Num
FROM PC INNER JOIN PCInventory ON PC.PC_ID = PCInventory.PC_ID
WHERE PCInventory.Software ="Word 2010";
In the Query Builder:
The results of both queries are identical:
Figure 5: Query Results |
Adding more columns just makes matters worse. With 20 columns in your repeated columns table, you have to have 20 "OR" conditions in your Where clause. What's more, if you find you have to add another column down the road, you have to modify your query (and every query that uses that table). On the other hand, adding a new value to the normalized table requires no change to the SQL statement.
Thus, with the repeated columns table, I have to do a whole lot more work than with the normalized structure, both initially and when a new software category is added. But it gets worse. Next time I'll look at the same PC_Inventory tables with a software type field added for each software.
No comments:
Post a Comment