Tuesday, June 6, 2017

Entity-Relationship Diagramming: Part II

In my last post, Entity-Relationships Part I, I approached the idea of normalization from a different point of view. Rather than breaking a large table into smaller, normalized tables, the Entity-Relationship process starts with the individual pieces and builds it up into a properly designed data model. It uses objects (Entities and their Attributes) and looks at the real-world Relationships that exist between them. The end result of this is an Entity-Relationship Diagram.

I started the process by writing a short narrative of the database requirements and grouping all the nouns into related categories. A few could not be assigned. Figure 1 shows the resultant grid.

Figure 1: Entity-Attribute Grid

At this point, I should say that this is not a strictly linear process . That is, you can't always move smoothly from one step to the next. Sometimes you have to move back and forth between them as you discover more things about your system.

That's what I'm going to do next. Because I have an unassigned attribute, I'm going to look at the relationships between my existing entities and see if something doesn't present itself.

To look at the relationships, I'm going to ignore the attributes for a while. Attributes do not have relationships, only entities do. If you discover that an attribute does have a relationship with some other entity or attribute, that's an indication that it is really an entity and your grid must change.

First, I'll determine the relationships between existing entities. In What Is Normalization, Part IV , I describe the different types of relationships: One-to-One (1:1), One-to-Many (1:M), and Many-to-Many (M:M). Then I'll diagram them in pairs.

The diagramming technique I'll use is one of the simplest. Entities are represented by boxes . Relationships between entities are represented by lines. "Crow's feet" show a "many-side" relationship and a vertical bar shows a "one-side" relationship. Figure 2 shows how the relationship types are represented.

Figure 2: Relationship Types

Many-to-Many: Common in real life, but cannot be represented in a database.
One-to-Many: The most common relationship in a database.

One-to-One: Seldom used.


So how do I know what the relationships are for my Employee Database? For that I need to go back to the narrative. The second paragraph describes "business rules", that is, how the business actually works. I'll repeat the paragraph here.

An employee can have multiple jobs over time, (ie, Analyst, Sr. Analyst, QA Administrator). Employees can also earn certifications necessary for their job.

From this I can write out the relationships in full sentences, and I find it useful to write them in both directions. For instance, from the narrative, I can say:

Employee-JobHistory
Each Employee can have One or More Job History instance
and
Each Job History instance can be for One and Only One Employee.

This is a classic One-to-Many relationship and we will diagram it as follows:

Job-Job History
Each Job can have One or More Job History instance
and
Each Job History instance can be for One and Only One Job.

Another One-to-Many relationship diagrammed as follows: Employee-Certifications
Each Employee can attain One or More Certifications
and
Each Certification can be earned by One or More Employees

This is a Many-to-Many relationship, and this will need additional work, but I'll leave it for now.
Job-Certifications
It might seem that there is no relationship between Job and Certifications, except for the word "necessary" in the narrative. In going back to the client, I discovered that certain certifications are necessary in order to be promoted to some jobs. Thus there IS a relationship as follows:

Each Job requires One or More Certification
but
Each Certification is for One and Only One Job

Another a One-To-Many relationship Each certification is only for a single job. It is important to verify this with the client. If a certification were required for many different jobs, the model would be different, but in this case it is for only one job. The certification requirements for Analyst I are different than for Analyst II.

These sentences are extremely useful when communicating with your client (or even if you're developing the database for yourself). If the client can say that each of the sentences is strictly true, then you'll know you've designed it correctly. If any of the sentences are wrong, you have to go back and re-evaluate your design.

Unfortunately, as I said in What Is Normalization, Part V , many-to-many relationships cannot be represented directly in a relational database. So I'm going to have to add some intersection (or "linking") tables.

I'll take care of that next in Entity-Relationship Diagramming: Part III

.

No comments: