Monday, March 28, 2011

Querying Repeated Columns: Multiple Joins

In the first post in this series (The Problem of Repeated Columns), I defined repeated columns and talked about the data integrity problems associated with them. If you haven't read that yet, it would be worthwhile to read first. Similarly, if you are not familiar with the concept of Normalization, you should read my blog series What is Normalization?

What I'm going to concentrate in the next few post are specific problems with querying data stored in repeated columns. This time, it's Multiple Joins.

Others in this series:

Multiple Joins

In my last post, Querying Repeated Columns: Multiple Unions, I discussed the problem of repeated columns with multiple categories. That is, in addition to Software1, Software2, ... SoftwareX, the table also has a Type field for each software: Software1Type, Software2Type, ... SoftwareXType. One solution that people come up with to solve this problem is to have a separate, "look-up" table for the software. Something like this:

Figure 1: Incorrect solution to the Multiple Category problem

This solution does have some advantages from a data integrity standpoint. I can use the Software table as the Row Source in a combo box on a form, which will limit data entry errors. I also can't assign an incorrect category to a software, since it is stored in the Software table.

The downside, however, is that I have to create multiple Joins with the Software table, one for each Software column. So just to get my original table back, I have to do something like this:

        Software.Category AS Software1Type,
        Software2 ,
        Software_1.Category AS Software2Type ,
        Software3                            ,
        Software_2.Category AS Software3Type
        (( PC_RepeatedColumns
        Software ON
        PC_RepeatedColumns.Software1 = Software.Software )
        Software AS Software_1 ON
        PC_RepeatedColumns.Software2 = Software_1.Software )
        Software AS Software_2 ON
        PC_RepeatedColumns.Software3 = Software_2.Software;

Or in the Query Builder:

Figure 2: Query with multiple joins to display software types.

Unfortunately, this just gets me back to the original, repeated columns table.

But as I showed last time, it does nothing to solve the underlying problem of querying the repeated columns. I would still need to use multiple UNION queries to return a list of PCs with word processors. However, unlike last time, each query in the Union must be joined to the Software table:

SELECT PC_ID, Software1, Software.Category
FROM PC_RepeatedColumns LEFT JOIN Software
   ON PC_RepeatedColumns.Software1 = Software.Software
WHERE Category="word processor"
SELECT PC_ID, Software2, Software.Category
FROM PC_RepeatedColumns LEFT JOIN Software
   ON PC_RepeatedColumns.Software2 = Software.Software
WHERE Category="word processor"
SELECT PC_ID, Software3, Software.Category
FROM PC_RepeatedColumns LEFT JOIN Software
   ON PC_RepeatedColumns.Software3 = Software.Software
WHERE Category="word processor"

Figure 3: Query with Multiple Unions and Multiple Joins

But as I showed last time, using a truly normalized structure, like this:

Figure 4: Fully normalized with Category column added.

I need only a simple join to query the data:

SELECT PC.PC_Num, Software, Category
(PC INNER JOIN PC_Software ON PC.PC_ID = PC_Software.PCID)
ON Software.SoftwareID = PC_Software.SoftwareID
WHERE Category="word processor";

This one can be viewed in the Query Builder:

Figure 5: Word processor for each PC with normalized structure.

Once again, while the results of the queries are the same, querying repeated columns proves much more complicated and much less flexible than querying a normalized table structure.

No comments: