Friday, October 7, 2011

How do I calculate a Median in Access?

Users of Microsoft Excel® are familiar with the MEDIAN function. Select a range of cells, push the function button, select MEDIAN under statistical functions, and you have your median value. However, no such function exists in Microsoft Access®. So if you need to figure a median in Access, what do you do? One solution is to create your own Median domain aggregate function or DMedian().

But to understand how to program a median function, you must first understand how a median is calculated.

Calculating a Median:

Almost everyone knows what an average is. Add up all the values in your list and divide that by the number of values. In general, it looks like this: (a1+a2+...+an)/n. In most cases, this gives a pretty good aggregate picture of your data. However, if you have values that fall way out of the general range of the others, it can give you a skewed picture.

This is where the median becomes important. A median works differently. To find the median, you sort your list (ascending or descending, makes no difference), count the number of values in your list, and then divide that number by two. If the list has an odd number of values, the result will be a fraction (n.5). Take the integer portion of the resultant (n) and go to the (n+1)th value in your sorted list. That number becomes the median. If the list has an even number of values, the result will be a whole number (n). To figure a median with an even number of values, you take the integer portion of the resultant, (n) and average the nth and (n+1)th values.

Example - Odd number of values:

1.5

2

2.2

3

2

4

5

Average: (1.5+2+2.2+3+2+4+5)/7 = 2.814286.

Median: First, order the list (1.5, 2, 2, 2.2, 3, 4, 5). Counting the number of values returns 7. Divide 7 by 2 and you get 3.5. Taking the integer portion of 3.5 yields 3. Go to the (3+1)th (or 4th) value in the ordered list and you have your median of 2.2.

Example - Even number of values:

1.5

2

2.2

3

2

4

5

1

Average: (1.5+2+2.2+3+2+4+5+1)/8 = 2.5875.

Median: Again, order the list (1, 1.5, 2, 2, 2.2, 3, 4, 5). There are 8 values in the list. Dividing 8 by 2 equals 4. Select the 4th and (4+1)th (that is 5th) value in the ordered list results in 2 and 2.2. Averaging these numbers results in your median value of 2.1.

As you can see, in both cases, the values of the average and median are fairly close. In the Odd case: 2.814286 vs. 2.2. In the Even case: 2.5875 vs. 2.1. But what if we had an outlier; say 100 (in place of 5), in the list?

1.5

2

2.2

3

2

4

100

1

The median stays at 2.1, but the average jumps to 14.4625. Not very representative of your data. So there are times when it is statistically more accurate to use the median rather than the average.

Calculating the Median in Access

But the question remains: how do you create a median in Microsoft Access? One answer is to create your own domain aggregate function: DMedian().

So what are domain aggregate functions? Access has a number of built-in domain aggregate functions that work similarly to the SQL aggregate functions: MAX, MIN, SUM, COUNT, AVG (and others). They are DMax(), DMin, DSum, DCount, and DAvg. These functions act like mini totals (aggregate) queries that return a single value. You can use these functions as the source for a control on a form or report, or you can use them in a query. For more information on Domain functions, see my blog series: Domain Functions Demystified.

There are three arguments you must provide to a domain function. Two are required: Fieldname, Tablename, and the third is optional: a Criteria (essentially a Where clause without the word WHERE). The domain function will then calculate the desired function (min, max, etc.) for that Field, against that Table, filtering it on the Criteria. In general: Dfunction("fieldname", "tablename", "criteria").

Before we look at creating our own DMedian() function, let's look at how the DAvg() function works.

Suppose we have the following table:

tblTimeToProcedure

PatientID

Hospital

TimeToProc

1

A

1.5

2

A

2

3

A

2.2

4

A

3

5

A

2

6

A

4

7

A

5

8

A

100

9

B

2.5

10

B

2

11

B

4.2

12

B

3

13

B

2.2

14

B

4

15

B

5

16

B

6

(TimeToProc is the number of hours between patient admission and the time a give procedure was administered.)

To calculate the average TimeToProc for the entire table with the DAvg function, you would provide the Field and Table without a Criteria. Like this:

DAvg("TimeToProc", "tblTimeToProcedure")

In this case, no Criteria required because we don't want to filter the data. However, suppose we wanted just the average of Hospital A? In that case, we would do this:

DAvg("TimeToProc", "tblTimeToProcedure", "[Hospital]='A'")

(Because Hospital is text data, you have to provide quotes (or single quotes) around it. Numeric data does not require delimiters. For more information on this, look in the Access Help at the DMin, DMax aggregate functions. Type DMin in the keywords.)

You can also use the DAvg function in a Totals Query in Access to return the average for each hospital. Like this:

SELECT Hospital, DAvg("TimeToProc", "tblTimeToProcedure",
"[Hospital] = '" & Hospital &"'") AS AverageTimeToProc
FROM tblTimeToProcedure
GROUP BY Hospital

Which will return this:

Hospital

AverageTimeToProc

A

14.9625

B

3.6125

The obvious question, of course, is why not just use the SQL aggregate function (AVG) to do the same thing:

SELECT Hospital, AVG(Hospital) AS AverageTimeToProc
FROM tblTimeToProcedure
GROUP BY Hospital

That's a good question, and in fact, it would be better to us the SQL function. But what if Access didn't have an AVG function? This is the case with the MEDIAN. No such function exists, so we have to create our own DMedian function to replace it.

Next time, I'll look at the details of the DMedian() function.

No comments: