Wednesday, June 27, 2012

New Sample: Form_DragDropResizeArrange

By AD Tejpal

    This sample db demonstrates various methods for dragging / re-sizing / arranging of controls by user action on access form at run time. In each style, before closing the form, user has the option to save latest status of each control regarding its position and size. On next opening of form, controls will get depicted as per such saved status.

    For a control to become eligible for drag and/or resize action, its tag property should include the words Drag and/or ReSize. At any stage, the user can undo the last action by clicking the UnDo command button. As and when desired, all such controls can be made to revert back to their original design settings by clicking the command button at bottom left. This command button also serves to toggle between design settings and last saved user settings.

    Following alternative styles are demonstrated:

    1 - Style A - It covers the following features:

        1.1 - Drag / re-size by direct use of control's mouse events via class C_ControlsDDR. This class has an interesting feature: WithEvents functionality for different types of controls has been incorporated into a single class.

        1.2 - Grouped controls manipulation. As the user drags the mouse (with left button in pressed state) around a set of controls, a red rectangle gets drawn accordingly and all controls falling within this rectangle are treated as a group. This group of controls can then be subjected to any of the following actions as desired:
            (a) Drag the group of controls.
            (b) Align the grouped controls Left / Right / Top / Bottom.
            (c) ReSize the group as per Widest / Narrowest / Tallest / Shortest control.
            (d) Make horizontal or vertical spacing amongst grouped controls equal.

        1.3 - All controls enclosed by red rectangle drawn by the user (para 1.2 above) become subject to grouped action, even if not carrying the words Drag or ReSize in their tag values. For controls required to be kept immune to grouped action, the tag value should carry the word Exclude.

    2 - Style B:

        2.1 - It demonstrates induced manipulation of controls via drag/re-size handles - without depending upon mouse events of these controls. Three types of controls are illustrated, i.e. web browser, subform and image. Out of these, web browser and subform control do not have any mouse event of their own.

        2.2 - As the user clicks upon the target control, drag handle appears at top left while resize handle appears at bottom right. Dragging on top left handle (with left button pressed) moves the control while similar action on resize handle leads to resizing.

    3 - Style C:

        3.1 - It demonstrates a completely non-intrusive approach towards induced manipulation of controls without depending upon any of their event. While three types of controls are illustrated, i.e. web browser, subform and image, this approach is universally applicable to all types of controls (including lines which do not have any event at all).

        3.2 - As the user clicks anywhere on blank space of detail section near top left of target control, it is accompanied by  automatic detection of that control. In confirmation, a red marker appears at top left of the control. Dragging the mouse (with left button pressed) on form detail section leads to induced movement of control identified by the marker.

        3.3 - Similarly, when the user clicks anywhere on blank space of detail section near bottom right of target control, it is accompanied by  automatic detection of that control. In confirmation, a red marker appears at bottom right of the control. Dragging the mouse (with left button pressed) on form detail section leads to induced resizing of control identified by the marker.

    4 - Style D:

        4.1 - It demonstrates a completely non-intrusive approach towards induced manipulation of lines (lines do not have any event at all).

        4.2 - As the user clicks anywhere on blank space of detail section near top left of target line, it is accompanied by  automatic detection of that line. In confirmation, a red marker appears at top left of the line. Dragging the mouse (with left button pressed) on form detail section leads to induced movement of line identified by the marker.

        4.3 - Similarly, when the user clicks anywhere on blank space of detail section near bottom right of target line, it is accompanied by  automatic detection of that line. In confirmation, a red marker appears at bottom right of the line. Dragging the mouse (with left button pressed) on form detail section leads to induced resizing/rotation of line identified by the marker.

        4.4 - Rotating action on a line can be carried out in either direction (clock-wise or anti clock-wise). There is no limit to the angle of rotation. So long as left mouse button is kept pressed, the user can continue rotating the line in complete circles in either direction.

    Note: It is observed that combo box and list box do not behave in a completely satisfactory manner while being dragged or re-sized through direct use of mouse events of these controls. There is no such problem with induced drag/re-size approach, as demonstrated in styles B to D. Interestingly, styles C & D  are completely non-intrusive and do not depend upon any event of target control.

You can find the sample here: http://www.rogersaccesslibrary.com/forum/Form-dragdropresizearrange_topic596.html

Wednesday, June 13, 2012

Really Bad Design Decisions: A Case Study

By Roger Carlson

Sometimes a single design decision can have a cascade effect, which causes multiple, secondary design errors. One such error, commonly made by novice developers, is to slavishly follow a pre-existing paper form to determine their table design.

Now certainly, when creating a database to replace a paper-based system, it is vitally important to assemble all of the forms and reports used in the system. It is by a careful review of these documents that the developer determines a majority of the fields he or she will need in order to store and report the information in the system. After all, if an input form has a place for Order Date, then the database needs a field to store it. Likewise, if a report displays the Order Quantity, then this information has to be stored in the database somewhere.

But paper forms are not created with proper database design in mind. They are designed to make it easy for humans to fill out. This design can be at odds with established design principles. By blindly following the paper form to determine the database design, the developer can create a system subject to numerous data and relational integrity errors.

Case Study: OB Log

Several years ago, I ran into a database that was the poster child for this error. I was asked to create a database to automate a logbook for the Obstetrics department of a local hospital. Someone in the department, who had some experience with Access, had taken a first pass at creating a database.

Figure 1 shows the main data entry for the application.

Figure 1
Figure 1: Main entry form for the OB Log database

For the purposes of this article, we're going to ignore the numerous application design errors and concentrate on the database design errors because they're much more important and difficult to fix. Nevertheless, I'd be remiss if I didn't at least mention them. They are:

  1. Hideous use of color. With color, less is more.
  2. The controls are scattered about, making the user hunt for them.
  3. The controls are not exactly aligned, giving the form an amateur look.
  4. The labels for the controls are sunken. This confuses the user as to which as to which controls data can be entered in. In general, labels should be flat, text boxes should be sunken, and buttons should be raised. Any variation just confuses users.

But these application design issues pale in comparison to the database design problems.

Base Assumption: Paper Form Design Determines Database Design

All of the problems below stem from a single assumption: the design of the paper form is also the best design for the database. As we will see, this is not the case. Let's look at the problems resulting from this decision.

Problem 1: Single table design

Because all of the information here was on a single paper form, the developer incorrectly assumed it should all go in one table. The first thing I did when I looked at the database was to open the Relationship Window. It was blank. I knew at that point the design was in trouble. There were supplementary tables, a bewildering number, in fact, but they were just look-up tables that had no relationship to the main data table.

For instance, there was a look-up table called "Dilatation," which held the numbers 1 though 10, the acceptable values for the Dilatation field. There was a table called "Hours" which held the number 1 through 12, a table called "ZeroToThree," which held the numbers 0 through 3. There were also look-up tables for the doctors and nurses.

While many of these tables were in fact useful, there were no tables to hold information representing relationships in the data. In other words, the information that was the meat of the application was in a single table. It is rare when all the information of a complex process can be stored in a single table.

Problem 2: Multiple Yes/No fields

The reason complex processes can rarely be stored in a single table is because most of the time, the data contains One-To-Many relationships. For instance, each Delivery can have one or more Induction Indications, i.e. reasons why labor should be induced. On the paper form, these indications were represented like this:

Figure 2
Figure 2: Paper form layout for Induction Indicators

Each delivery event can have multiple reasons for inducing labor. But since you can't put multiple values in a single field, the developer chose to create 8 Yes/No fields, each representing one indication. On the form, they looked like this:

Figure 3
Figure 3: Portion of the application form implementing Induction Indications.

At first blush, this is a reasonable solution. You can easily query the table for one or more of these indications by testing for a Yes value in any of these fields. For instance:

SELECT * FROM OBDelivery WHERE PROM = TRUE OR PIH = TRUE

However, other, more complex queries are difficult or impossible with this design. For instance, what if I wanted a count of the following Indications: Elective Induction, Macrosomia, and Preterm PROM with a result like this:

Figure 4
Figure 4: Complex query counting the number of Induction Indicators.

With the single-table design, I would have to create a complex Union query like this:

SELECT "Elective Induction" AS Induction Indications,
     Count([Elective Induction]) AS [Count Of InductionIndication]
FROM [OB DeliveryOld]
WHERE [Elective Induction]=True
GROUP BY "Elective Induction"
UNION
SELECT "Macrosomia" AS Induction Indications,
     Count([Macrosomia]) AS [Count Of Induction Indication]
FROM [OB DeliveryOld]
WHERE [Elective Induction]=True
GROUP BY "Macrosomia"
UNION
SELECT "Preterm PROM" AS Induction Indications,
     Count([Preterm PROM]) AS [Count Of Induction Indication]
FROM [OB DeliveryOld]
WHERE [Preterm PROM]=True
GROUP BY "Preterm PROM";

If I wanted a count of all the Indications, I would have 8 separate Select statements unioned together, one for each Indication.

However, in a properly designed database, the query to do this would be as simple as:

SELECT InductionIndications,
     Count(II_ID) AS Count Of Induction Indications
FROM RefInductionIndications INNER JOIN tblInductionIndications ON
     RefInductionIndications.II_ID = tblInductionIndications.II_ID
WHERE InductionIndications In
    ("Elective Induction","Macrosomia","Preterm PROM")
GROUP BY InductionIndications;

So what is the proper design? The key to understanding this is to realize that you can group the individual Indication fields as values for a single field in your table. These values can be stored in a separate lookup table called refInductionIndications. If a delivery could only have one of these indications, there would be a simple One-To-Many between the Delivery table (OBDelivery) and the reference table.

However, since each delivery can have one or more induction indication, there is actually a Many-To-Many relationship. Each Delivery can have one or more InductionIndications, and each InductionIndication can be part of one or more Deliveries.

To create this relationship, you create an intersection table (sometimes called a linking table). The only two fields in this table are the primary key fields of each of the other tables. These fields become foreign keys in relationships created with the other two tables. To make sure you don’t get duplicate records in the intersection table, you make the foreign key fields a compound primary key.

In the Relationship Window, the relationship looks like figure 5:

Figure 5
Figure 5: Correct design for storing multiple Induction Indications per Delivery.

Implementing this in the application can be accomplished with a simple subform whose record source is a query based on a join tblnductionIndications and refInductionIndications. In this way, the user can select as many Induction Indications as required. The subform would look something like figure 6:

Figure 6
Figure 6: Subform for Induction Indications for properly designed application.

Another problem of having multiple Yes/No fields to represent grouped data falls under the heading of maintenance. What happens if a new Induction Indication must be added to the database?

With the single-table design, the table design itself would have to be modified, of course, but so would the form (a new check box would have to be added to an already cluttered form), and EVERY query and report using Induction Indications would have to be changed. This would require heavy, developer support.

On the other hand, with the proper database design (Figure 5), adding a new indication is as easy as adding a new value to the lookup table. The users can easily do it themselves through a maintenance form in the application, and it would require no modification of the application at all.

This problem was also repeated in the Vacuum/Forceps Indications and C-Section Indications sections. For each of these, the developer created multiple Yes/No fields when, in fact, a separate table was required.

Problem 3: Complex programming to Overcome Design Flaws

When a developer does not pay enough attention to the initial design, it often requires complex programming to overcome the flaws.

Problem 3a: Multiple Births

The design of the paper form also misled the original developer regarding birth information. He never asked the obvious question, what happens if there are multiple births for a single delivery, i.e. twins or triplets?

The original paper form had only one place for Birth information. When there were multiple births, hospital staff would simply use a second form, only fill out the birth portion, and staple it to the first form. This in itself should have told the developer that there was a One-To-Many relationship between the mother's information and birth information.

Because the developer was stuck with the single-table design, he was forced to create program code to: 1) duplicate all of the Delivery information into a new record, 2) delete the Birth information from the new record, and 3) allow the user to enter the new Birth information for each of the multiple births.

This design resulted in a huge amount of redundant data and opened the door for data integrity errors. If any of the Delivery information changed for a multiple-birth delivery, the users would have to edit multiple records, with the likelihood of data entry errors.

Of course, the correct design is to have a separate Birth table that is related to the Delivery table on the Primary Key field of the Delivery table (BirthTrackingID).

Figure 7
Figure 7: Correct design for modeling multiple Births per Delivery.

The form could be modified to have a subform for holding the Births. In this way, the Delivery information would be held only once, yet the results for each individual birth could be accurately recorded while maintaining a link to the Delivery table.

Problem 3b: Date format

On the paper form, the Birth Date/Time was displayed in a particular format: i.e. "10:15 am Tues. 05/04/99". The developer believed he needed to input the date in that format. Therefore, he created the Date/Time field as a text field and created a complex process involving numerous Toolbars, Macros, and program code to assist the user in entering the data.

Clicking the Date/Time field produced Figure 8.

Figure 8
Figure 8: Series of Toolbars pop-ups when the Date/Time field was entered, which was supposed to assist the user to enter a date into a text field. A simple input mask for a date/time field would have been better.

Problem 4: Copying Paper Form Design for Application Form

This last problem is not so much a database design problem, but an application design problem. While it is I important to follow the flow of the paper form to make data entry as easy and error-free as possible, the developer should not feel constrained to slavishly imitate the layout of the paper form to produce the application form.

As the developer, you should work with the client to develop the data entry form to make it as easy as possible for the users. This may also mean a redesign of the paper form to facilitate entry into the database.

In my case, I redesigned the database form to look like this:

Figure 9
Figure 9: Redesigned Data Entry Form

With each group of related data in its own tab on the bottom. This led to a much cleaner design, which was easier for the data entry people to use. We also redesigned the paper form to facilitate this application form.

Conclusion:

Does this mean the developer should completely ignore the layout of the paper forms? No. Assembling all of the input forms and reports is a vital part of the database development process. They will provide the majority of the fields necessary to the project. But you cannot let them determine your database design. As the developer, that's your job.

In general, when dealing with a paper input form do the following:

  1. Look for logical groupings within the fields on the form. For instance, PROM, Preterm PROM, PIH, and Macrosomia are all Induction Indications.
  2. Look for relationships in your groupings. For instance, each Delivery can have one or more Induction Indications. When one entity (Delivery) in your database can have multiple of another entity (Induction Indications), this tells you there needs to be a separate table for this entity.
  3. When there are a series of check boxes on a form, and they all represent alternatives of the same thing, you should not create multiple Yes/No fields, but instead create a single field where the acceptable values include all the labels of the check boxes. If more than one of these values can be selected, then you need an intersection or linking table to implement the Many-To-Many relationship.
  4. Regardless of the formatting on the form, all dates should be entered as Date/Time fields in the database. You can format the output with the Format function to meet the users needs.
  5. Check with the users of the system to make sure you have accurately modeled the data and relationships of the system. Also work with them to create an application form that is easy to use. You may also have to work with them to redesign the paper form.

The developer has to delicately balance the needs of the users against proper database design techniques. In this way, the developer creates a system that is not only easy for the user, but also ensures accurate data.

On-line Database Sample:

On my website, (http://www.rogersaccesslibrary.com), there is a sample database called ReallyBadDesignDecisions which contains two databases:

  • "ReallyBadDatabase.mdb", which illustrates the problems discussed here,
  • "ReallyBadDatabaseReborn.mdb", which shows how I corrected them.

Wednesday, June 6, 2012

COUNT DISTINCT in Access: Part 5

Comparison of the Methods

SQL Server has a nice built-in function called COUNT DISTINCT, which is missing in Access SQL.

Over the last four posts, I've discussed different ways to simulate it in Access :

  1. Subqueries the FROM Clause
  2. Subqueries in the Field List
  3. User-Defined Function
  4. Crosstab Query (reader submitted method)

Each of these methods have advantages and disadvantages, and as promised, I'll address them here.

Subquery in FROM Clause

The main advantage of creating a subquery in the FROM clause is ease of use, that is, it's the easiest to figure out. It's possible to approach it step-wise by first removing the duplicates from the list to be counted.  See Problem 1 in Subqueries the FROM Clause.

The main disadvantage is lack of flexibility.  As I showed in Problems 2 and 3 in Subqueries the FROM Clause, you can't easily created other levels of aggregation nor simply add a second aggregate to the same level.  The reason is you've pre-limited the values available, so you need to create extra levels of subqueries to compensate.

Over all, this method is useful for simple distinct counts, but not for more complex ones.

Subquery in Field List

The main advantage of create a subquery in the Field List is it's flexibility.  Once you've created the initial query, adding a second aggregate can be added just like in any other aggregate query, that is, just add another field to the field list with an aggregate function.

The main disadvantage is that it's a little harder to figure out in the first place.  The subquery must be a correlated subquery, which is conceptually more difficult.  A correlated subquery is evaluated for each row in the main query, so it must be tied to the main query, and that's a more difficult concept.

This method is useful for more complex queries.  It also more closely simulates the T-SQL Count Distinct, which also works at the Field List level.  This means that if you need to upsize this query to T-SQL, it's as simple as replacing the subquery with the COUNT DISTINCT.

User-Defined Function

One advantage of with user-defined function is that once created, you don't have to figure out how to do a subquery for each query.  You can simply call the function and send in the appropriate values.  It also appears to perform fairly well, but I'll address performance below.

The main disadvantage is that it's fairly easy to produce an incorrect result. The Where and Group By arguments must match the main Where and Group By clauses of the main query or the value will be be incorrect.  However, In an extremely complex query, this method may be useful to reduce the level of complexity in the main query.

Crosstab Query Method

If you're familiar with crosstab queries, this method is nice and clean.  And it performs very well (see below).

But as I've noted before, the main problem is that you can't add additional aggregates.  For instance, you can't show the sum of one field and the count distinct of another in the same query.  This makes it more limited than the other methods.

Performance

First of all, it's silly to talk about performance without discussing indexes.  On non-indexed fields, each of the methods will perform much worse.  In this case, I indexed all of the fields involved in the aggregation: OrderID (primary key), Customer, and Order_Date.

When each of the methods were run against the sample data listed, they all ran nearly instantaneously.  That table consists of 4 customers and 3 days, totaling 22 records.

To test the performance, I created a table consisting of 19 customers over the course of 30 days totaling 113,000 rows.  Then I wrote a subroutine that opens each query, recording the time when it opens and when the query completes.  The code looks like this:

Sub test()
Dim starttime As Date
Dim endtime As Date

'test From
starttime = Now
DoCmd.OpenQuery "TestFROMLarge"
endtime = Now
Debug.Print "TestFROMLarge: " & DateDiff("s", starttime, endtime)
DoCmd.Close acQuery, "TestFROMLarge"

'test UDF
starttime = Now
DoCmd.OpenQuery "TestUDF_Large"
endtime = Now
Debug.Print "TestUDF_Large: " & DateDiff("s", starttime, endtime)
DoCmd.Close acQuery, "TestUDF_Large"

'test Xtab
starttime = Now
DoCmd.OpenQuery "XTab_Prob2_Large"
endtime = Now
Debug.Print "XTab_Prob2_Large: " & DateDiff("s", starttime, endtime)
DoCmd.Close acQuery, "XTab_Prob2_Large"

'test FieldList
starttime = Now
DoCmd.OpenQuery "TestFieldListLarge"
endtime = Now
Debug.Print "TestFieldListLarge: " & DateDiff("s", starttime, endtime)
DoCmd.Close acQuery, "TestFieldListLarge"

End Sub

Running this code against the 100K table produced this:

TestFROMLarge: 1
TestUDF_Large: 1
XTab_Prob2_Large: 1
TestFieldListLarge: 8

That gives me some information, but not enough.  So I created an even larger file.  Still using the same 19 customers, but with data spanning a whole year.  This new table had nearly 1 million records.

Running my test code against the 1M table produced this:

TestFROMLarge: 10
TestUDF_Large: 4
XTab_Prob2_Large: 6
TestFieldListLarge: 74

This file gives me the granularity to see differences.  The first three still execute within similar time frames.  The Field List method takes nearly 8 times longer.

Surprisingly, the User Defined Function performs the best of all of them.  My expectation would have been that it was the slowest.  Also surprisingly, the Field List method was the slowest.  I would have thought that a correlated subquery would execute faster.

If you'd like to test this for yourself, I've bundled this whole series with the database into a sample available my my website: http://www.rogersaccesslibrary.com/forum/countdistinctmdb-intermediate_topic595.html

Because of size considerations, it does not have the 1M table, so you'd have to create that yourself.

Monday, June 4, 2012

Count Distinct In Access: Part 4

Crosstab Method (Reader Submitted)

Note: In the comments section of Part 3: User-Defined Function, Patrick mentioned another method that uses a Crosstab Query.  Even though I didn't develop it, I'm including it in this series for completeness.  So, thank you, Patrick. 

SQL Server has a nice built-in function called COUNT DISTINCT, which is missing in Access SQL.

What does COUNT DISTINCT do?  Well, there are times when you want to count distinct values in a query, that is, a count of values without duplicates.  For instance, given the following table, how many distinct customers have orders?

ORDERS

OrderID

OrderDate

Customer

Amount

1

1/17/2008

Ajax Inc.

$310.00

2

1/17/2008

Ajax Inc.

$510.50

3

1/17/2008

Ajax Inc.

$311.00

4

1/17/2008

Baker Corp.

$5,144.00

5

1/17/2008

Baker Corp.

$61.00

6

1/17/2008

Baker Corp.

$110.50

7

1/17/2008

Baker Corp.

$11.00

8

1/17/2008

Crystal & Co.

$111.85

9

1/17/2008

Crystal & Co.

$511.00

10

1/18/2008

Baker Corp.

$711.95

11

1/18/2008

Baker Corp.

$810.00

12

1/18/2008

Baker Corp.

$310.59

13

1/18/2008

Crystal & Co.

$311.00

14

1/18/2008

Crystal & Co.

$811.50

15

1/18/2008

Ajax Inc.

$512.00

16

1/18/2008

D&D LLC

$211.00

17

1/18/2008

D&D LLC

$3,311.50

18

1/19/2008

Ajax Inc.

$410.00

19

1/19/2008

Ajax Inc.

$610.50

20

1/19/2008

Baker Corp.

$4,411.00

21

1/19/2008

Baker Corp.

$511.50

22

1/19/2008

Baker Corp.

$611.50

In SQL Server, I can do this:

SELECT COUNT(DISTINCT Customer) AS CountOfCustomer FROM Orders

Which will give me the following:

CountOfCustomer

4

In Access, if I use the Distinct predicate with the count:

SELECT DISTINCT Count(Customer) AS CountOfCustomer FROM Orders;

I get:

CountOfCustomer

22

Since Access SQL does not have the Count Distinct function, what can I do? 

There are actually four different methods for simulating the Count Distinct:

  1. Subqueries the FROM Clause
  2. Subqueries in the Field List
  3. User-Defined Function
  4. Crosstab Query (this post)

Each of these methods have advantages and disadvantages, and I'll address each in turn. 

 Crosstab Query Method

A crosstab query presents aggregated data in an easy-to-understand grid. 

To create a simple Count Distinct as above, use the following SQL:

TRANSFORM Count(*) AS Cell
SELECT Count(cell) AS CountOfCustomer
FROM Orders
GROUP BY "Anything"
PIVOT Customer In (null);

To get the following result:

CountOfCustomer

<>

4

 

More Complex Queries

The Crosstab can also be modified to produce a customer count grouped by the OrderDate:

TRANSFORM Count(*) AS Cell
SELECT OrderDate, Count(cell) AS CountOfCustomer
FROM Orders
GROUP BY OrderDate
PIVOT Customer In (null);

OrderDate

CountOfCustomer

<>

1/17/2008

3

 

1/18/2008

4

 

1/19/2008

2

 

Problem: Additional Aggregation

However, in the other methods I've discussed, I was able to add additional aggregates to the query like this:

OrderDate

CountOfCustomer

CountOfOrderID

SumOfAmount

1/17/2008

3

9

$7,080.85

1/18/2008

4

8

$6,989.54

1/19/2008

2

5

$6,554.50

I can't figure out how to do that with the Crosstab query, so I'd say this method, although fast, is somewhat limited in terms of flexibility. 

Patrick insists this method has MUCH better performance than the UDF method.  Next time, I'll look at each of the methods and discuss their pros and cons, so we'll see.