The DMedian() Function:
In my last post, (How do I calculate a Median in Access? Part 1) I defined Median and described in general terms how to calculate a median in Access. This time, I'm going to describe in detail the DMedian() user-defined function.
First of all, you have to create a Public Function in a General Module. Create a new module by selecting Module in the Database Window and click the New button. Immediately click the Save button and name the module "basDMedian". Next, type the following function into the module.
Public Function DMedian(FieldName As String, _
TableName As String, _
Optional Criteria As Variant) As Variant
' Created by Roger J. Carlson
' http://www.rogersaccesslibrary.com
' Terms of use: You may use this function in any application, but
' it must include this notice.
'Returns the median of a given field in a given table.
'Returns -1 if no recordset is created
' You use this function much like the built-in Domain functions
' (DLookUp, DMax, and so on). That is, you must provide the
' 1) field name, 2) table name, and 3) a 'Where' Criteria.
' When used in an aggregate query, you MUST add each field
' in the GROUP BY clause into the into the Where Criteria
' of this function.
' See Help for more on Domain Aggregate functions.
On Error GoTo Err_Median
Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim strSQL As String
Dim RowCount As Long
Dim LowMedian As Double, HighMedian As Double
'Open a recordset on the table.
Set db = CurrentDb
strSQL = "SELECT " & FieldName & " FROM " & TableName
If Not IsMissing(Criteria) Then
strSQL = strSQL & " WHERE " & Criteria & " ORDER BY " & FieldName
Else
strSQL = strSQL & " ORDER BY " & FieldName
End If
Set rs = db.OpenRecordset(strSQL, dbOpenDynaset)
'Find the number of rows in the table.
rs.MoveLast
RowCount = rs.RecordCount
rs.MoveFirst
'Determine Even or Odd
If RowCount Mod 2 = 0 Then
'There is an even number of records. Determine the low and high
'values in the middle and average them.
rs.Move Int(RowCount / 2) - 1
LowMedian = rs(FieldName)
rs.Move 1
HighMedian = rs(FieldName)
'Return Median
DMedian = (LowMedian + HighMedian) / 2
Else
'There is an odd number of records. Return the value exactly in
'the middle.
rs.Move Int(RowCount / 2)
'Return Median
DMedian = rs(FieldName)
End If
Exit_Median:
'close recordset
rs.Close
Exit Function
Err_Median:
If Err.Number = 3075 Then
DMedian = 0
Resume Exit_Median
ElseIf Err.Number = 3021 Then
'EOF or BOF ie no recordset created
DMedian = -1
Resume Exit_Median
Else
MsgBox Err.Description
Resume Exit_Median
End If
End Function
To use this function in a query, simply use the query above that calculates the average with the DAvg, and replace DAvg with DMedian, like this:
SELECT Hospital, DMedian("TimeToProc", "tblTimeToProcedure",
"[Hospital] = '" & Hospital &"'") AS MedianTimeToProc
FROM tblTimeToProcedure
GROUP BY Hospital
Which will return this:
Hospital | MedianTimeToProc |
A | 2.60 |
B | 3.5 |
Caveats:
You have to be careful, however, because it is very easy to produce data that looks correct, but is not. If you're not careful to include every field contained in the Where clause and Group By clause of your main query in the Criteria of the DMedian function, your data will not be correct.
Let's look at a slightly more complex example. Suppose we want to aggregate our TimeToProc data at the Hospital and Unit level. We might have a table that looks like this:
tblTimeToProcedure
PatientID | Hospital | Unit | TimeToProc |
1 | A | C1 | 1.5 |
2 | A | C2 | 2 |
3 | A | C1 | 2.2 |
4 | A | C2 | 3 |
5 | A | C1 | 2 |
6 | A | C1 | 4 |
7 | A | C2 | 5 |
8 | A | C2 | 100 |
9 | B | A1 | 2.5 |
10 | B | A1 | 2 |
11 | B | A1 | 4.2 |
12 | B | A2 | 3 |
13 | B | A2 | 2.2 |
14 | B | A2 | 4 |
15 | B | A1 | 5 |
16 | B | A1 | 6 |
Suppose I create a totals query in Access that aggregates the data on Hospital and Unit, but only provide the Hospital to the DMedian function's criteria. Like this:
SELECT Hospital, Unit, DMedian("TimeToProc","tblTimeToProcedure",
"[Hospital] = '" & [Hospital] & "'") AS MedianTimeToProc
FROM tblTimeToProcedure
GROUP BY Hospital, Unit;
I'll get a result that looks like this:
Hospital | Unit | MedianTimeToProc |
A | C1 | 2.60 |
A | C2 | 2.60 |
B | A1 | 3.5 |
B | A2 | 3.5 |
This is obviously wrong. The DMedian function is aggregating ONLY at the Hospital level, while the main query is aggregating at both Hospital and Unit.
In order to produce the correct median values, we have to supply both the Hospital and the Unit to the DMedian function.
DMedian("TimeToProc", "tblTimeToProcedure", "[Hospital] = '" & Hospital &"' AND [Unit] = '" & Unit & "'")
The finished query will look like this:
SELECT Hospital, Unit, DMedian("TimeToProc", "tblTimeToProcedure",
"[Hospital] = '" & Hospital &"'AND [Unit] = '" & Unit & "'") AS MedianTimeToProc
FROM tblTimeToProcedure
GROUP BY Hospital, Unit;
This will produce the correct results:
Hospital | Unit | MedianTimeToProc |
A | C1 | 2.10 |
A | C2 | 4 |
B | A1 | 4.2 |
B | A2 | 3 |
So the rule is that every field that participates in the Where clause or Group By clause of the main aggregate query must also be represented in the Criteria argument of the domain aggregate function.
Making Sure Your Query is Correct
One way to make sure you are aggregating the data correctly in both the main query and the domain aggregate function is to create an aggregate query using the SQL AVG function, and then create the same query using the DAvg domain aggregate function. If the two queries produce the same values, you know you have created it correctly. Then just replace DAvg with your DMedian function.
For instance, first create an AVERAGE aggregate query using the SQL AVG function. The SQL statement would look like this:
SELECT Hospital, Unit, AVG(TimeToProc) AS AvgerageTimeToProc
FROM tblTimeToProcedure
GROUP BY Hospital, Unit
Hospital | Unit | AverageTimeToProc |
A | C1 | 2.43 |
A | C2 | 27.50 |
B | A1 | 3.94 |
B | A2 | 3.07 |
Now create the same query, but this time use the DAvg function.
SELECT Hospital, Unit, DAvg("TimeToProc", "tblTimeToProcedure",
"[Hospital] = '" & Hospital &"' AND [Unit] = '" & Unit & "'") AS
AverageTimeToProc
FROM tblTimeToProcedure
GROUP BY Hospital, Unit;
Hospital | Unit | AverageTimeToProc |
A | C1 | 2.43 |
A | C2 | 27.50 |
B | A1 | 3.94 |
B | A2 | 3.07 |
Since the two queries produce identical results, we know that we have set up the domain function correctly. Now just replace DAvg with your DMedian function:
SELECT Hospital, Unit, DMedian("TimeToProc", "tblTimeToProcedure",
"[Hospital] = '" & Hospital &"' AND [Unit] = '" & Unit & "'") AS
MedianTimeToProc
FROM tblTimeToProcedure
GROUP BY Hospital, Unit;
Hospital | Unit | MedianTimeToProc |
A | C1 | 2.10 |
A | C2 | 4 |
B | A1 | 4.2 |
B | A2 | 3 |
On-line Database Sample:
On my website, (http://www.rogersaccesslibrary.com), there is a small sample database called Median.mdb. This sample has the above example and another, more complex, example in an Access database, so you can see how the DMedian function works.
There are two versions of the DMedian function. DMedian is in the basMedianADO module, and DMedian is in the basMedianDAO. You must use the DMedian97 in an Access 97 database.
Conclusion:
Because Microsoft Access® does not have a Median aggregate function, finding the median value of a group of data can be a challenge. But with a little programming expertise, you can create your own DMedian domain-aggregate function that you can use to find the Median value for an aggregate query.
3 comments:
I am a novice Access user and have been searching for an easy to understand method to calculate medians in Access for some time now. I was ecstatic when I found your blog on the subject. Your explanation and examples worked perfectly. Thank you for posting this!!! Any suggestions on speeding up the process for large data-sets with multiple sub-groupings?
Thank you very much! You solved my problem I had for weeks.
Can I just say thank you so much! I didn't realize this was going to be such a difficult thing. Your code worked perfectly!
Post a Comment