Monday, October 10, 2011

How do I calculate a Median in Access? Part 2

The DMedian() Function:

In my last post, (How do I calculate a Median in Access? Part 1) I defined Median and described in general terms how to calculate a median in Access. This time, I'm going to describe in detail the DMedian() user-defined function.

First of all, you have to create a Public Function in a General Module. Create a new module by selecting Module in the Database Window and click the New button. Immediately click the Save button and name the module "basDMedian". Next, type the following function into the module.

Public Function DMedian(FieldName As String, _
      TableName As String, _
      Optional Criteria As Variant) As Variant

' Created by Roger J. Carlson
'
http://www.rogersaccesslibrary.com
'
Terms of use: You may use this function in any application, but
' it must include this notice.

'Returns the median of a given field in a given table.
'Returns -1 if no recordset is created

' You use this function much like the built-in Domain functions
' (DLookUp, DMax, and so on). That is, you must provide the
' 1) field name, 2) table name, and 3) a 'Where' Criteria.
' When used in an aggregate query, you MUST add each field
' in the GROUP BY clause into the into the Where Criteria
' of this function.

' See Help for more on Domain Aggregate functions.

On Error GoTo Err_Median

    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim strSQL As String
    Dim RowCount As Long
    Dim LowMedian As Double, HighMedian As Double

    'Open a recordset on the table.
    Set db = CurrentDb
    strSQL = "SELECT " & FieldName & " FROM " & TableName
    If Not IsMissing(Criteria) Then
        strSQL = strSQL & " WHERE " & Criteria & " ORDER BY " & FieldName
    Else
        strSQL = strSQL & " ORDER BY " & FieldName
    End If
    Set rs = db.OpenRecordset(strSQL, dbOpenDynaset)

    'Find the number of rows in the table.
    rs.MoveLast
    RowCount = rs.RecordCount
    rs.MoveFirst
   
    'Determine Even or Odd
    If RowCount Mod 2 = 0 Then
        'There is an even number of records. Determine the low and high
        'values in the middle and average them.
        rs.Move Int(RowCount / 2) - 1
        LowMedian = rs(FieldName)
        rs.Move 1
        HighMedian = rs(FieldName)
        'Return Median
        DMedian = (LowMedian + HighMedian) / 2
    Else
        'There is an odd number of records. Return the value exactly in
        'the middle.
        rs.Move Int(RowCount / 2)
        'Return Median
        DMedian = rs(FieldName)
    End If

Exit_Median:
    'close recordset
    rs.Close
    Exit Function

Err_Median:
    If Err.Number = 3075 Then
        DMedian = 0
        Resume Exit_Median
    ElseIf Err.Number = 3021 Then
        'EOF or BOF ie no recordset created
        DMedian = -1
        Resume Exit_Median
    Else
        MsgBox Err.Description
        Resume Exit_Median
    End If
End Function

To use this function in a query, simply use the query above that calculates the average with the DAvg, and replace DAvg with DMedian, like this:

SELECT Hospital, DMedian("TimeToProc", "tblTimeToProcedure",
    "[Hospital] = '" & Hospital &"'") AS MedianTimeToProc
FROM tblTimeToProcedure
GROUP BY Hospital

Which will return this:

Hospital

MedianTimeToProc

A

2.60

B

3.5

Caveats:

You have to be careful, however, because it is very easy to produce data that looks correct, but is not. If you're not careful to include every field contained in the Where clause and Group By clause of your main query in the Criteria of the DMedian function, your data will not be correct.

Let's look at a slightly more complex example. Suppose we want to aggregate our TimeToProc data at the Hospital and Unit level. We might have a table that looks like this:

tblTimeToProcedure

PatientID

Hospital

Unit

TimeToProc

1

A

C1

1.5

2

A

C2

2

3

A

C1

2.2

4

A

C2

3

5

A

C1

2

6

A

C1

4

7

A

C2

5

8

A

C2

100

9

B

A1

2.5

10

B

A1

2

11

B

A1

4.2

12

B

A2

3

13

B

A2

2.2

14

B

A2

4

15

B

A1

5

16

B

A1

6

Suppose I create a totals query in Access that aggregates the data on Hospital and Unit, but only provide the Hospital to the DMedian function's criteria. Like this:

SELECT Hospital, Unit, DMedian("TimeToProc","tblTimeToProcedure",
     "[Hospital] = '" & [Hospital] & "'") AS MedianTimeToProc
FROM tblTimeToProcedure
GROUP BY Hospital, Unit;

I'll get a result that looks like this:

Hospital

Unit

MedianTimeToProc

A

C1

2.60

A

C2

2.60

B

A1

3.5

B

A2

3.5

This is obviously wrong. The DMedian function is aggregating ONLY at the Hospital level, while the main query is aggregating at both Hospital and Unit.

In order to produce the correct median values, we have to supply both the Hospital and the Unit to the DMedian function.

DMedian("TimeToProc", "tblTimeToProcedure", "[Hospital] = '" & Hospital &"' AND [Unit] = '" & Unit & "'")

The finished query will look like this:

SELECT Hospital, Unit, DMedian("TimeToProc", "tblTimeToProcedure",
     "[Hospital] = '" & Hospital &"'AND [Unit] = '" & Unit & "'") AS MedianTimeToProc
FROM tblTimeToProcedure
GROUP BY Hospital, Unit;

This will produce the correct results:

Hospital

Unit

MedianTimeToProc

A

C1

2.10

A

C2

4

B

A1

4.2

B

A2

3

So the rule is that every field that participates in the Where clause or Group By clause of the main aggregate query must also be represented in the Criteria argument of the domain aggregate function.

Making Sure Your Query is Correct

One way to make sure you are aggregating the data correctly in both the main query and the domain aggregate function is to create an aggregate query using the SQL AVG function, and then create the same query using the DAvg domain aggregate function. If the two queries produce the same values, you know you have created it correctly. Then just replace DAvg with your DMedian function.

For instance, first create an AVERAGE aggregate query using the SQL AVG function. The SQL statement would look like this:

SELECT Hospital, Unit, AVG(TimeToProc) AS AvgerageTimeToProc
FROM tblTimeToProcedure
GROUP BY Hospital, Unit

Hospital

Unit

AverageTimeToProc

A

C1

2.43

A

C2

27.50

B

A1

3.94

B

A2

3.07

Now create the same query, but this time use the DAvg function.

SELECT Hospital, Unit, DAvg("TimeToProc", "tblTimeToProcedure",
    "[Hospital] = '" & Hospital &"' AND [Unit] = '" & Unit & "'") AS
    AverageTimeToProc
FROM tblTimeToProcedure
GROUP BY Hospital, Unit;

Hospital

Unit

AverageTimeToProc

A

C1

2.43

A

C2

27.50

B

A1

3.94

B

A2

3.07

Since the two queries produce identical results, we know that we have set up the domain function correctly. Now just replace DAvg with your DMedian function:

SELECT Hospital, Unit, DMedian("TimeToProc", "tblTimeToProcedure",
     "[Hospital] = '" & Hospital &"' AND [Unit] = '" & Unit & "'") AS
     MedianTimeToProc
FROM tblTimeToProcedure
GROUP BY Hospital, Unit;

Hospital

Unit

MedianTimeToProc

A

C1

2.10

A

C2

4

B

A1

4.2

B

A2

3

On-line Database Sample:

On my website, (http://www.rogersaccesslibrary.com), there is a small sample database called Median.mdb. This sample has the above example and another, more complex, example in an Access database, so you can see how the DMedian function works.

There are two versions of the DMedian function. DMedian is in the basMedianADO module, and DMedian is in the basMedianDAO. You must use the DMedian97 in an Access 97 database.

Conclusion:

Because Microsoft Access® does not have a Median aggregate function, finding the median value of a group of data can be a challenge. But with a little programming expertise, you can create your own DMedian domain-aggregate function that you can use to find the Median value for an aggregate query.

3 comments:

Mike Schneider said...

I am a novice Access user and have been searching for an easy to understand method to calculate medians in Access for some time now. I was ecstatic when I found your blog on the subject. Your explanation and examples worked perfectly. Thank you for posting this!!! Any suggestions on speeding up the process for large data-sets with multiple sub-groupings?

Anonymous said...

Thank you very much! You solved my problem I had for weeks.

Anonymous said...

Can I just say thank you so much! I didn't realize this was going to be such a difficult thing. Your code worked perfectly!