Tuesday, March 28, 2017

What Is Normalization, Part II

Break it up. Break it up.

In What Is Normalization, Part I, I discussed various ways to represent data in a tabular form. All of those methods have problems -- from not storing enough data, to storing redundant data, to allowing data anomalies.

The Relational Database Model was created by Edgar F. Codd specifically to solve those problems. The foundation of the Relational Model is a process Codd dubbed "Normalization". It is the process of grouping related data elements into separate tables and then relating those tables on common data elements.

Since an example is worth a thousand cliches, let's look again at the example from Part I.

Figure 1: Salary History
Figure 1 shows the salary history of a small company. As is, it represents the lowest level of data organization a table must have in a database. Codd called this level the First Normal Form (1NF). (Don't worry, I'm not going to explain all the normal forms.)

Another rule of Normalization says that all of the fields in a table should be about just one "thing". But a look at Salary History reveals that some of the fields are not really about the salary history at all.

How do I know? Well, we can't talk about normalization without discussing primary keys. (See What is a Primary Key for a more detailed discussion.) The way that we can tell if all of the fields are about the same "thing" is by seeing if all the fields in the record depend on the entire primary key.

In Figure 1, we can see that the fields necessary to uniquely identify the record are SS# and SalaryDate (the date on which the salary increase was given.) But do FirstName and StartDate depend on the ENTIRE primary key? No, they do not. FirstName or StartDate do not change if the SalaryDate changes.

Therefore we need to move them to a different table. What table? Well, what are the fields about? In this case, they are about the employee, so we will create an employee table composed of FullName, SS#, and StartDate. SS# is the most obvious choice for a primary key in this example. (Note: Social Security number is NOT usually the best choice for a primary key in real life.)

Do FullName and StartDate depend on SS#? Yes they do. The SS# represents a single, real-life person and that person can have only one name and one start date.

Now we are left with three fields in Salary History: SS#, SalaryDate, and Salary. SS# and SalaryDate remain the primary key. Does Salary depend the entire primary key? Again, yes. Each person can have only one Salary increase on a particular date.

So normalizing the table structure requires two tables: Employee and Salary History, like so:

Figure 2: Employee


Figure 3: Salary History


Of course, now we have to somehow put the data back together. For that, we need Relationships. We'll look at that next time in What is Normalization, Part III.


.

Monday, March 20, 2017

What Is Normalization, Part II

Break it up. Break it up.

In What Is Normalization, Part I, I discussed various ways to represent data in a tabular form. All of those methods have problems -- from not storing enough data, to storing redundant data, to allowing data anomalies.

The Relational Database Model was created by Edgar F. Codd specifically to solve those problems. The foundation of the Relational Model is a process Codd dubbed "Normalization". It is the process of grouping related data elements into separate tables and then relating those tables on common data elements.

Since an example is worth a thousand cliches, let's look again at the example from Part I.

Figure 1: Salary History
Figure 1 shows the salary history of a small company. As is, it represents the lowest level of data organization a table must have in a database. Codd called this level the First Normal Form (1NF). (Don't worry, I'm not going to explain all the normal forms.)

Another rule of Normalization says that all of the fields in a table should be about just one "thing". But a look at Salary History reveals that some of the fields are not really about the salary history at all.

How do I know? Well, we can't talk about normalization without discussing primary keys. (See What is a Primary Key for a more detailed discussion.) The way that we can tell if all of the fields are about the same "thing" is by seeing if all the fields in the record depend on the entire primary key.

In Figure 1, we can see that the fields necessary to uniquely identify the record are SS# and SalaryDate (the date on which the salary increase was given.) But do FirstName and StartDate depend on the ENTIRE primary key? No, they do not. FirstName or StartDate do not change if the SalaryDate changes.

Therefore we need to move them to a different table. What table? Well, what are the fields about? In this case, they are about the employee, so we will create an employee table composed of FullName, SS#, and StartDate. SS# is the most obvious choice for a primary key in this example. (Note: Social Security number is NOT usually the best choice for a primary key in real life.)

Do FullName and StartDate depend on SS#? Yes they do. The SS# represents a single, real-life person and that person can have only one name and one start date.

Now we are left with three fields in Salary History: SS#, SalaryDate, and Salary. SS# and SalaryDate remain the primary key. Does Salary depend the entire primary key? Again, yes. Each person can have only one Salary increase on a particular date.

So normalizing the table structure requires two tables: Employee and Salary History, like so:

Figure 2: Employee


Figure 3: Salary History


Of course, now we have to somehow put the data back together. For that, we need Relationships. We'll look at that next time in What is Normalization, Part III.


.

Monday, March 13, 2017

What Is Normalization, Part I

Why Normalization?

What is normalization? Normalization is a methodology for removing redundant data from a database WITHOUT losing information.

There are many ways to represent data. Some of the most common are: spreadsheets, flat files, and relational databases. Each of these ways have their own advantages and disadvantages.

For instance, in a spreadsheet, it's acceptable to represent the data like this:

Figure 1

This is perfectly readable to a human. It is obvious that the row beneath Gina Fawn also belongs to her. Unfortunately, this is impossible for a database to interpret. The database must have all the information it needs on a single row. It cannot refer to another row to get the information it needs.

One way to correct this, would be to fill in the missing information.

Figure 2

Unfortunately, this requires storing a lot of redundant data. What's the big deal? It's only a couple of fields, right? But that's only in the example shown. What if we were storing all of the demographic data (name, address, phone, city, state, etc.) for a lot of people? This would waste a lot of storage capacity.


But wasted storage is not the worst problem. What if the SSN of Gina Fawn's first record was changed to 215-87-7854? Perhaps this was through operator error or maybe a programmatic update. It doesn't matter, the data has been changed. Now, which SSN is really Gina's? The database has no way of knowing. Worst still, the SSN matches Tony Jones. So, does that SSN represent Gina or Tony? Again, no way to know.

This same problem holds true for all the fields which hold redundant data. This is called a Data Anomaly error. Once you start having data anomalies, you cannot trust the integrity of your database.

One way to solve this problem is with Repeated Columns. This is a common solution in spreadsheets. With repeated columns, the non-redundant information (like row two in Figure 1) are stored as columns. We get something like this:

Figure 3

Now we don't have problems with redundancy, but we have additional problems. First of all, we have to decide how many repeated columns to create. In Figure 3, I only show one salary increase for Gina and Tony, but is that reasonable? What if Gina has five wage increases and Tony had seven? Is seven sets of columns enough? Do I cap it at the largest record? Or do I add more columns to accommodate growth? If so, how many?

Secondly, such a table structure requires a lot of manual modification and becomes untenable when you have a lot of data. Perhaps instead of just the date and salary, we are also storing the job description, pay grade, status, and so forth? The structure would be come so large and unruly that it would be impossible to maintain.

Fortunately, Normalization comes to the rescue. In What Is Normalization, Part II, I'll explain how. Thank Codd.

.

Friday, March 10, 2017

How Do I Decompile a Database?


In order for any program code to be run by a computer, it must be converted to machine-readable code. This code is called Object Code. The text version of this program that you and I can read is called Source Code.

In Access, the process of producing Object Code from Source Code is called "compilation". Whenever code is run for the first time in Access, the code is first compiled. (You can also compile it yourself by pushing the compile button). The only place you can make changes to code is in the Source Code, which must again be compiled into Object Code.

Occasionally, code can be deleted from the Source Code, but for some reason is never removed from the Object Code. This code is never again seen on the screen as text, but is still sitting there somewhere in the database file. This can, at times, interfere with the normal operation of the program.

To remove these stray bits of code, you can "Decompile" your database and then re-compile it. This process removes all of the compiled Object Code, then when you re-compile it, you only get Object Code that reflects the current Source Code.

To Decompile your Access database, do the following:

  1. Click the Start button on the task bar and choose Run
  2. In the Run dialog box, type the following:
    "C:\Program Files\Microsoft Office\Office\Msaccess.exe" /decompile
    where the first part (in quotes) is the complete path to your Access program. If you have the default installation, it is likely that it is just as listed here. Click OK.
    Note: if you have Windows Vista, just type the command line in the Start Search box of the Start menu.
  3. This will open Access and allow you to choose which database to open. Whichever database you open will be decompiled.
  4. Choose a database and open it. Access 97 used to give you a dialog which told you the database had been decompiled, but newer versions do not. Nevertheless, the database has been decompiled.
  5. Open any Module in design view, or any Form or Report in design view and choose View > Code from the menu. In the next screen, choose Debug > Compile from the menu bar.
    Note: In Access 2007, you can also choose Database Tools on the ribbon, then select Visual Basic. Then choose Debug > Compile.
  6. Your database has now been Decompiled and Re-compiled.

Creating a Shortcut:

I use decompile quite frequently, so instead of typing the command line into the Run box, I've created a shortcut on my desktop. There are several ways to create a shortcut, the easiest is to use the Shortcut Wizard.

Right-click anywhere on your desktop and select New > Shortcut. The wizard will give you a dialog box allowing you to browse to the file you want. It should be the same as above in Step 2. Once you've browsed there, add the "/decompile" switch to the end as shown below. Be sure to separate the decompile switch from the file path with a space. It will look as follows:


Click Next.

It will ask to name your shortcut. Choose something descriptive like Decompile Access 2003. And click Finish.

The finished shortcut will look like this:


Now when I want to decompile a database, I just click the shortcut and start with Step 3 above.


.

Tuesday, March 7, 2017

Access 101: How Do I Replace System Error Messages With Custom Messages?

Access provides a number of error messages, but many are confusing to the users. I find it sometimes useful to replace the system error messages with custom error messages. The Null primary key is a case in point.

When you try leave a record on a form without a primary key value, the form will display the following error.

But you might want to replace this with your own error message, identifying the actual field.

At it's simplest, you can use the Form_Error event to trap for this and otherrecord-level errors. Something like this:

Private Sub Form_Error(DataErr As Integer, Response As Integer)

If DataErr = 3058 Then
MsgBox "You must give Account Number a value."
Response = acDataErrContinue
Else 'msgbox DataErr
Response = acDataErrDisplay
End If

End Sub

3058 is the error number for the null primary key. The "Else" section will let any other error display the normal system error message.
If you want to know the number for another error, just un-comment the msgbox and you'll get the error number. Add any additional errors as an ElseIf.

Other errors you can trap:

Limit to List: 2237
Input Mask: 2279
Required field: 3314
Validation Rule: 2107
Bad Data Value: 2113

Wednesday, March 1, 2017

Access 101: How Do I Bypass Start Up Options?

In a previous post, ( How Do I Configure My Database Start Up? ) I discussed how to set Start Up options for Access. But what if you want the database to open without running those options?

Access provides a Bypass Key to do this. Simply hold the Shift Key while opening the database, and the start up options will not run. Depending on your security settings, you might see one or more security messages. You must continue to hold the Shift Key down while you answer the security messages.

It is also possible to disable the Bypass Key if you don't want your users to be able to bypass the options. To do this, set the AllowBypassKey property to False in a macro or VBA module.

Here are some of good, advanced resources:


But be careful. It's possible to lock yourself out if you don't know what you're doing. In general, I discourage novice users from turning off the bypass key until they know more about the implications of doing so. At the very least, test it on a copy of your database.