Tuesday, May 31, 2011

Normalizing Repeated Columns: Multiple Columns (Part 2)

Last time, in Normalizing Repeated Columns: Multiple Columns (Part1) I talked about Normalizing data in repeated columns from a spreadsheet, like this:



Into a structure like this:




Using a One-Time process that creates the tables and relationships in addition to copying the data.

Appending to Existing Tables

The process for normalizing repeated column data into Existing Tables is similar to the One-Time process. It does, however, require you to create the tables and relationships correctly ahead of time. Because I've just created them in the previous section, I'll simply delete the data and re-append it from the linked spreadsheet.

The two processes are very similar, but I'll be using Append queries rather than Make-Table queries.

Step 1: Remove the common PC elements.

Since I've already got a table, I'll use an Append query rather than the Make-Table query I used earlier:

INSERT INTO PC ( PC_Num, OS )
SELECT PC_RC.PC_Num, PC_RC.OS
FROM PC_RC;


Or in the Query Builder:


The resulting table looks like this:


Since I already have an autonumber primary key defined on the table, I don't need to do anything further.

Step 2: Create Software_Temp table

This step is exactly the same as Step 2 above. I'll use a Make-Table query to create a temporary table that will be used for creating the relationships later.

INSERT INTO Software_Temp ( PC_ID, Software, SoftwareType )
SELECT PC_Software_Temp.PC_ID, PC_Software_Temp.Software, PC_Software_Temp.SoftwareType
FROM (SELECT PC.PC_ID, Software1 As Software, PC_RC.Software1Type as SoftwareType
FROM PC INNER JOIN PC_RC ON PC.PC_Num = PC_RC.PC_Num
WHERE Software1 IS NOT NULL
UNION
SELECT PC.PC_ID, PC_RC.Software2, Software2Type as SoftwareType
FROM PC INNER JOIN PC_RC ON PC.PC_Num = PC_RC.PC_Num
WHERE Software2 IS NOT NULL
UNION
SELECT PC.PC_ID, PC_RC.Software3, Software3Type as SoftwareType
FROM PC INNER JOIN PC_RC ON PC.PC_Num = PC_RC.PC_Num
WHERE Software3 IS NOT NULL) AS PC_Software_Temp;


The result will be a table that looks like this:


Step 3: Remove Duplicates from Software_Temp

I need to create a query that removes the duplicate values. The DISTINCT predicate works well for that. As before, I'll use an Append write the data to my existing Software table:

INSERT INTO Category ( SoftwareType )
SELECT DISTINCT Software_Temp.SoftwareType
FROM Software_Temp;


With the resulting table:


Again, this table already has an autonumber primary key, so I don't need to create one.

Step 4: Creating the Linking Table: PC_Software

So now, I've got my two "One-Side" tables: PC and Software. Now I just need to fill the linking table.
I can do that with an Append Query and a simple join of Software and Software_Temp:

INSERT INTO Software ( Software, CategoryID )
SELECT DISTINCT Software_Temp.Software, Category.CategoryID
FROM Software_Temp INNER JOIN Category ON Software_Temp.SoftwareType = Category.SoftwareType;


Or in the Query Builder:


The final result will look like this:


It is important to note that the order in which the data is moved is vital. Data must be written into the "one-side" tables (PC and Software) first, and then data can be moved into the "linking" tables (PC_Software).

This two-part post is available as a single download (with sample database) here: http://www.rogersaccesslibrary.com/forum/normalizingrepeatingcolumns1_topic562.html.
































No comments: