Monday, March 21, 2011

Querying Repeated Columns: Multiple Unions

In the first post in this series (The Problem of Repeated Columns), I defined Repeated Columns and talked about the data integrity problems associated with them. If you haven't read that yet, it would be worthwhile to read first. Similarly, if you are not familiar with the concept of Normalization, you should read my blog series What is Normalization?

What I'm going to concentrate in the next few post are specific problems with querying data stored in repeated columns. This time, it's Multiple Unions.

Others in this series:

Multiple UNIONs

Last time, in Querying Repeated Columns: Multiple ORs, I talked about the difficulties of querying data in repeated columns. But it gets worse. Suppose I want to categorize each type of software. In which case, I'd have to add a category column for each software column (Software1Type, Software2Type, ...etc). It could look something like this:

clip_image002
Figure 1: Repeated Columns with repeated category columns.

By comparison, the normalized version would look like Figure 3, with both Software and Category removed to their own table:

clip_image004
Figure 2: Fully normalized with Category column added.

Now, suppose I want to query a listing of all word processors for each PC. To create such a query with repeated columns, I'd have to resort to multiple UNIONs:

SELECT PC_ID, Software1 as Software, Software1Type as Category
FROM Worse_PC_RepeatedColumns
WHERE Software1Type="word processor"
UNION
SELECT PC_ID, Software2 as Software, Software2Type as Category
FROM Worse_PC_RepeatedColumns
WHERE Software2Type="word processor"
UNION
SELECT PC_ID, Software3 as Software, Software3Type as Category
FROM Worse_PC_RepeatedColumns
WHERE Software3Type="word processor"

Unfortunately, this must be done in SQL view since union queries cannot be viewed in the Query Builder.

clip_image006
Figure 3: Word processor for each PC with repeated columns.

By contrast, creating the query in the normalized structure looks like this:

SELECT PC.PC_Num, Software, Category
FROM Software INNER JOIN
     (PC INNER JOIN PC_Software ON PC.PC_ID = PC_Software.PCID)
         ON Software.SoftwareID = PC_Software.SoftwareID
WHERE Category="word processor";

This one can be viewed in the Query Builder:

clip_image008
Figure 4: Word processor for each PC with normalized structure.

Again, the results of both queries are identical with the exception that the query with repeated columns (UNION) is not updateable.

clip_image010
Figure 5: Results of two queries.

More columns in the repeated columns table makes the query more complex. I'd have to add another UNION for each column pair. Worse yet, there is a limit to the number of Unions you can use in a query, so if you have too many categories, your query may not just be difficult, but impossible.

By contrast, adding more values to the the normalized tables requires no change to the query at all, and there is virtually no limit on the number of categories.

No comments: