Monday, December 29, 2008

What Is Normalization: Part III

Putting It Back Together

In What Is Normalization, Part I, I defined normalization and the negative consequences of not normalizing your database. In Part II , I talked about how to normalize your tables to remove as much redundant data as possible. Now, it isn't much good to split your data into separate tables without the ability to put it all back together. For that, we need to talk about relationships.

Relationships store the information about how normalized data can be put back together. They are the heart and soul of a relational database like Microsoft Access. If you don't have relationships in your database, you don't really have a relational database.

Continuing our example from Part II, we have two tables: Employee and Salary History.

Figure 1: Employee

Figure 2: Salary History


So how do we create a relationship between them? In a relational database, tables are related on common data elements -- specifically the primary key of one table and the foreign key in the other. I've defined primary key elsewhere, but foreign key is new. A foreign key is simply a field in another table that has the common data elements needed to create a relationship.

In our example, SS# is the primary key of Employee. In the Salary History table, SS# is the foreign key to the Employee table. If you've normalized your database properly, your related table should have a foreign key. Figure 3 shows how the data in our two tables will be related.

Figure 3: The tables related

So how do we create this relationship? In Access, the easiest way is to use the Relationship Window. In Access 2003 and before, it can be found on the Database Toolbar or on the menu choosing Tools > Relationships. In Access 2007, go to the Database Tools tab on the Ribbon and choose Relationships.

To create your relationship, add both tables to the Relationship window. I usually RIGHT-click on the empty relationship window and choose Add Table. Your window should look like Figure 4.

Figure 4: Relationship window with tables added.


Next, Click on the primary key (SS#) in Employee and drag it over to the corresponding field in Salary History (also SS#). When you finish, the Edit Relationships dialog box will pop up. Figure 5 shows the dialog box.

Figure 5: Edit Relationships dialog box


It will already have the fields filled in. In order to actually make this a relationship, however, you MUST click the Enforce Referential Integrity check box. If this box is not checked, it it NOT a relationship.

If you cannot click the Enforce Referential Integrity box, then there is something wrong with your relationship. More than likely, it will say Indeterminate as a Relationship Type. Usually, this means you have attempted to create a relationship without one of the fields being a primary key. Don't simply uncheck the box and continue. Go back and fix the fields.

Click Create to finish the relationship.

Figure 6: Completed Relationship


You can tell that Referential Intergity has been enabled because the relationship line has a 1 on one end and an infinity symbol on the other. This indicates it is a One-to-Many relationship. That is, each record in the Employee table is related to one or more records in the Salary History table. The 1 will be next to the table where the link field is the primary key. If you do not have those symbols on your relationship, you do not have a relationship.

Here, I showed the One-to-Many relationship, which is the most common type in a relational database. But there are two others: One-to-One and Many-to-Many. I'll talk about those next in What is Normalization: Part IV.

.

Wednesday, December 24, 2008

What Is Normalization, Part II

Break it up. Break it up.

In What Is Normalization, Part I, I discussed various ways to represent data in a tabular form. All of those methods have problems -- from not storing enough data, to storing redundant data, to allowing data anomalies.

The Relational Database Model was created by Edgar F. Codd specifically to solve those problems. The foundation of the Relational Model is a process Codd dubbed "Normalization". It is the process of grouping related data elements into separate tables and then relating those tables on common data elements.

Since an example is worth a thousand cliches, let's look again at the example from Part I.

Figure 1: Salary History
Figure 1 shows the salary history of a small company. As is, it represents the lowest level of data organization a table must have in a database. Codd called this level the First Normal Form (1NF). (Don't worry, I'm not going to explain all the normal forms.)

Another rule of Normalization says that all of the fields in a table should be about just one "thing". But a look at Salary History reveals that some of the fields are not really about the salary history at all.

How do I know? Well, we can't talk about normalization without discussing primary keys. (See What is a Primary Key for a more detailed discussion.) The way that we can tell if all of the fields are about the same "thing" is by seeing if all the fields in the record depend on the entire primary key.

In Figure 1, we can see that the fields necessary to uniquely identify the record are SS# and SalaryDate (the date on which the salary increase was given.) But do FirstName and StartDate depend on the ENTIRE primary key? No, they do not. FirstName or StartDate do not change if the SalaryDate changes.

Therefore we need to move them to a different table. What table? Well, what are the fields about? In this case, they are about the employee, so we will create an employee table composed of FullName, SS#, and StartDate. SS# is the most obvious choice for a primary key in this example. (Note: Social Security number is NOT usually the best choice for a primary key in real life.)

Do FullName and StartDate depend on SS#? Yes they do. The SS# represents a single, real-life person and that person can have only one name and one start date.

Now we are left with three fields in Salary History: SS#, SalaryDate, and Salary. SS# and SalaryDate remain the primary key. Does Salary depend the entire primary key? Again, yes. Each person can have only one Salary increase on a particular date.

So normalizing the table structure requires two tables: Employee and Salary History, like so:

Figure 2: Employee


Figure 3: Salary History


Of course, now we have to somehow put the data back together. For that, we need Relationships. We'll look at that next time in What is Normalization, Part III.


.

Thursday, December 18, 2008

What Is Normalization, Part I

Why Normalization?

What is normalization? Normalization is a methodology for removing redundant data from a database WITHOUT losing information.

There are many ways to represent data. Some of the most common are: spreadsheets, flat files, and relational databases. Each of these ways have their own advantages and disadvantages.

For instance, in a spreadsheet, it's acceptable to represent the data like this:
Figure 1
This is perfectly readable to a human. It is obvious that the row beneath Gina Fawn also belongs to her. Unfortunately, this is impossible for a database to interpret. The database must have all the information it needs on a single row. It cannot refer to another row to get the information it needs.

One way to correct this, would be to fill in the missing information.

Figure 2

Unfortunately, this requires storing a lot of redundant data. What's the big deal? It's only a couple of fields, right? But that's only in the example shown. What if we were storing all of the demographic data (name, address, phone, city, state, etc.) for a lot of people? This would waste a lot of storage capacity.


But wasted storage is not the worst problem. What if the SSN of Gina Fawn's first record was changed to 215-87-7854? Perhaps this was through operator error or maybe a programmatic update. It doesn't matter, the data has been changed. Now, which SSN is really Gina's? The database has no way of knowing. Worst still, the SSN matches Tony Jones. So, does that SSN represent Gina or Tony? Again, no way to know.

This same problem holds true for all the fields which hold redundant data. This is called a Data Anomaly error. Once you start having data anomalies, you cannot trust the integrity of your database.

One way to solve this problem is with Repeated Columns. This is a common solution in spreadsheets. With repeated columns, the non-redundant information (like row two in Figure 1) are stored as columns. We get something like this:

Figure 3

Now we don't have problems with redundancy, but we have additional problems. First of all, we have to decide how many repeated columns to create. In Figure 3, I only show one salary increase for Gina and Tony, but is that reasonable? What if Gina has five wage increases and Tony had seven? Is seven sets of columns enough? Do I cap it at the largest record? Or do I add more columns to accommodate growth? If so, how many?

Secondly, such a table structure requires a lot of manual modification and becomes untenable when you have a lot of data. Perhaps instead of just the date and salary, we are also storing the job description, pay grade, status, and so forth? The structure would be come so large and unruly that it would be impossible to maintain.

Fortunately, Normalization comes to the rescue. In What Is Normalization, Part II, I'll explain how. Thank Codd.

.

Monday, December 15, 2008

New Sample: Form_DynamicSizingOfSubform

Author: A.D. Tejpal

This sample db demonstrates run time manipulation of subform control height. It covers both views for the subform, i.e. continuous form as well as datasheet. The views can be toggled from one to the other by clicking a command button.

Depending upon the layout for a specific form, it is preferable to specify the maximum number of rows upto which the subform height should expand. This figure can be entered in text box named TxtMaxLines. Default value for max lines is set at 6. While stepping through the records on main form, if the records in linked subform happen to exceed MaxLines, further increase in the height of subform is kept in abeyance. At this stage, vertical scroll bar is inserted. As and when the subform records are again within limits, the vertical scroll bar is removed and the height of subform control adjusts suitably.

In case of datasheet view, care has been taken to dynamically adjust the column widths so that no awkward blank space is visible at right in lieu of scroll bar when it is not needed.

Two alternative ways are available for navigating through records on the main form. The user can either use the normal navigation buttons or the unbound combo box for jumping to any desired record. The combo box is suitably synchronized so that even while using normal navigation buttons, the value displayed in combo box matches the current record.

Version - Access 2000 file format

You can find this sample here: http://www.rogersaccesslibrary.com/forum/forum_posts.asp?TID=441

New Sample: Form_DynamicSizingOfSubform

Thursday, December 11, 2008

DH Query By Form

DH Query By Form

Author: Duane Hookom

The DH QBF is a complete query by form applet that can be easily integrated into any existing Access application. Typically, the functionality provided by DH QBF can replace many "canned" reports. The developer imports several forms, tables, a query, and a report from the DH_QBF.mdb, creates some master queries, and deploys.

The developer creates one or more master queries that join tables, alias field names, create calculated columns, etc. The users can then select a master query (datasource) from a drop-down and then select up to 30 fields from the master query. Users can define sorting and criteria as well as grouping and totaling. All of this "design" information is stored in two tables for re-use.

The results of the queries are displayed in a datasheet subform contained in a main form. The main form has options to send/export the records to print, Word table, Word merge, Excel, HTML, CSV, Merge to Report, or a graph. Most formats allow he user to automatically open the target application. The Word merge process will open a new Word document and link to the merge fields.

***NEW FEATURE***
An all Access mail merge feature was added Mar 6, 2004. This allows users to create simple mailmerge output without using a word processing program such as MS Word.

You can find this sample here: http://www.rogersaccesslibrary.com/forum/forum_posts.asp?TID=12

DH Query By Form

Monday, December 8, 2008

What is a Primary Key?

If I have a small company, any individual employee can be distinguished by a combination of First Name, Middle Initial, and Last Name. This combination uniquely identifies each employee.

If I have a larger company, the chances increase that I could have two employees that have the same name. So the name can no longer be used to uniquely identify a record. So instead, I could use a number like Social Security Number.

A primary key is a special kind of index (see What is an Index?) that is composed of a field (SSN) or combination of fields (First/Middle/Lastname), which uniquely identify a record.
A primary key has a number of useful properties.

First, the value in the primary key cannot be duplicated. If it is a single field, that value cannot be repeated. If it is composed of multiple fields, that combination of values cannot be duplicated. So in the samples above, having a primary key would mean that I could not put two "Roger J Carlson"s in my database.

Secondly, the primary key cannot be NULL and no portion of the primary key (in the case of a multiple field key) can be NULL. (See What does NULL mean?). The NULL means the value of the field is unknown. Obviously, if we don't know the value of the fields, we can't guarantee the value is unique. So disallowing NULLS guarantees we have a valid value in the key fields.

Thirdly, in order to create relationships, there must be a unique index on the field. Since by definition, a primary key is a unique index, a primary key makes an ideal join field. I will discuss this in a later post.

There are two basic types of primary keys: Natural Keys and Surrogate Keys.

Natural keys

A natural key is one composed of a field or fields that already exist in the table. In my examples above, both Social Security Number and Firstname/Middle/Lastname are natural keys. Natural keys can be composed of a single field or multiple fields.

It is important to note that a table can only have a *single* primary key. It is incorrect to say that a table has multiple primary keys. In the case of a multi-field primary key, it has a single primary key composed of multiple fields.

Surrogate keys

A surrogate key is an artificially created number. It has no real-world meaning, and is used mostly in relationships with other tables. In Access, you use the Autonumber datatype to create a surrogate key. This number is system-created and is guaranteed to be unique.

The disadvantage of a surrogate key is that it does not have real-world uniqueness. It would be possible to enter two records for Roger J Carlson, each with a different system-created number. To protect against that, you should also create a unique index on those fields that would otherwise create a natural key.

The advantage of a surrogate key is that it will never be affected by real-world changes to the database. It is also much more efficient to join tables on a single number than on multiple text fields.

Opinions differ, but I prefer a surrogate primary key with a separate unique index. This separates the functions of the primary key: the surrogate key for relationships and the unique index to control real-world uniqueness. In this way, if the conditions that effect the real-world uniqueness (as in the case of moving from a small business to a large business mentioned above), the table relationships will not be disturbed.

Creating a primary key in the Access User Interface is easy. Just open the table in Design View. For a surrogate or single field primary key, select a single record and click the Primary Key button.


For a multi-field primary key, just hold the Control [Crtl] key as you select the fields and then click the primary key button.

Thursday, December 4, 2008

New Sample: Report_HideGrpHdrsIfNoDetail

Report_HideGrpHdrsIfNoDetail

Author: A. D. Tejpal

This sample db demonstrates run time hiding of group headers and footers in access reports, if there are no longer any printable records for the group as a result of conditional cancellation of detail section's format event.

This functionality is achieved by using a set of two classes. The main class (C_CondHeaders_Main) determines the maximum existing number of group levels and report sections, apart from building up a list of unique group keys. This class also builds up a collection of objects conforming to sub-class (C_CondHeaders_Sub).

Each instance of the subclass represents a group header or footer section. Instead of trying to pass report section object as an argument, simply the section index is supplied, thereby circumventing the hurdle otherwise faced. Corresponding report section object (WithEvents) is generated within the sub-class.

List of unique group keys generated in ShowDetal method of main class gets pushed to the sub-class via the latter's ShowGroups property. Action for hiding the group header / footer as applicable, is carried out in the sub-class.

With the combination of two classes as outlined above, the arrangement becomes completely generic, capable of handling any number of grouping levels. At the same time, the code becomes drastically shorter and compact.

IMPORTANT:

  1. The report must have a calculated text box having [Pages] as part of its expression.
  2. Each group header section should have a text box bound to that level's ControlSource field
  3. If there are any sorting/grouping levels above the last header which do NOT have headers, then for each such level, the report must include a textbox (may be hidden) located in Detail section, bound to the field serving as group's ControlSource.
Version - Access 2000 file format

You can find it here: http://www.rogersaccesslibrary.com/forum/forum_posts.asp?TID=440

Wednesday, December 3, 2008

Access 101: What is an Index?

An index is a method of cataloging the records in a table to increase the speed and efficiency of retrieving them. You can think of it as a library card catalog.

If the stacks in a real-world library represent your table, the library card catalog is the index. The library card catalog stores just the information necessary to identify a particular book: author, title, and genre, plus a number that identifies where the book can be found in the stacks.
With this number, you can go directly to the location of your book. Without it, you would have to start at the beginning of the stacks and look at each book until you found the one you wanted. (This assumes that the books are not stored in any particular order.)

An index works the same way. It stores the value of a field or combination of field and the location of each record that match it. With this information, a query that sorts or searches on the indexed field(s) can go directly to the records. Without it, it has to start at the beginning of a table and look at each record until it finds the ones it wants.

There are a number of different kinds of indexes: simple indexes, multi-field indexes, unique indexes, and clustered indexes.

Simple indexes are indexes on a single field. They may or may not allow duplicate values. Their main use is for searching and sorting records in a table.

Multi-field indexes are indexes across multiple fields. These also may or may not allow duplicate values. This is different than having simple indexes on multiple fields. In a multi-field index, the combination of records is indexed.

Unique indexes do not allow duplicate values. They may be either simple or multi-field. If it is multi-field, it will allow duplicates in individual fields within the indexed fields, but it will not allow a duplicate across the all of the fields in the index. This is useful to make sure you don't have duplicate records in your table.

Clustered indexes control how the records are stored in the database. If you have a clustered index, the records will actually be stored in that order. A table can have only one clustered index. This makes sense since records in a table can only be physically stored in one order.

Primary Keys are a special type of index. In Access, the primary key is both a unique index and a clustered index. It has one additional property in that no field in the index can be NULL. (see What does NULL mean? ). There can be only one primary key in a table. For more on primary keys, their creation and uses see: What is a Primary Key?

How do you create an index?

For simple indexes, it's easy. Just go to the Design View of the table and select a field. In the Properties, you'll see an Indexed property. Select one of the "Yes" options. To make it a unique index by selecting Yes (No Duplicates).

For multi-field indexes, it's a little more complicated. The simplest thing to do is create a simple index on the first field in your index.


Then on the line directly below it, add another field WITHOUT giving it an Index Name. If the Index Name field is blank, Access will assume the field is part of the index directly above it.


Add as many fields to the index as you want. From the Indexes Window, you can also set the Unique, Primary, and Allow Nulls properties.

A word of warning about indexes. While they do speed up the retrieval of records, they also slow down the insertion of records. Every time a new record is added, all the indexes need to be rebuilt. If you have too many indexes and a lot of records, this can be a problem. Therefore, it is best to just index those fields that really need indexing. These would include fields that you will be sorting or searching on or fields that participate in Relationships or Joins.