SQL Server has a nice built-in function called COUNT DISTINCT, which is missing in Access SQL.
What does COUNT DISTINCT do? Well, there are times when you want to count distinct values in a query, that is, a count of values without duplicates. For instance, given the following table, how many distinct customers have orders?
ORDERS
OrderID | OrderDate | Customer | Amount |
1 | 1/17/2008 | Ajax Inc. | $310.00 |
2 | 1/17/2008 | Ajax Inc. | $510.50 |
3 | 1/17/2008 | Ajax Inc. | $311.00 |
4 | 1/17/2008 | Baker Corp. | $5,144.00 |
5 | 1/17/2008 | Baker Corp. | $61.00 |
6 | 1/17/2008 | Baker Corp. | $110.50 |
7 | 1/17/2008 | Baker Corp. | $11.00 |
8 | 1/17/2008 | Crystal & Co. | $111.85 |
9 | 1/17/2008 | Crystal & Co. | $511.00 |
10 | 1/18/2008 | Baker Corp. | $711.95 |
11 | 1/18/2008 | Baker Corp. | $810.00 |
12 | 1/18/2008 | Baker Corp. | $310.59 |
13 | 1/18/2008 | Crystal & Co. | $311.00 |
14 | 1/18/2008 | Crystal & Co. | $811.50 |
15 | 1/18/2008 | Ajax Inc. | $512.00 |
16 | 1/18/2008 | D&D LLC | $211.00 |
17 | 1/18/2008 | D&D LLC | $3,311.50 |
18 | 1/19/2008 | Ajax Inc. | $410.00 |
19 | 1/19/2008 | Ajax Inc. | $610.50 |
20 | 1/19/2008 | Baker Corp. | $4,411.00 |
21 | 1/19/2008 | Baker Corp. | $511.50 |
22 | 1/19/2008 | Baker Corp. | $611.50 |
In SQL Server, I can do this:
SELECT COUNT(DISTINCT Customer) AS CountOfCustomer FROM Orders
Which will give me the following:
CountOfCustomer |
4 |
In Access, if I use the Distinct predicate with the count:
SELECT DISTINCT Count(Customer) AS CountOfCustomer FROM Orders;
I get:
CountOfCustomer |
22 |
Since Access SQL does not have the Count Distinct function, what can I do?
There are actually three different methods for simulating the Count Distinct:
- Subqueries in the FROM Clause
- Subqueries in the Field List
- User-Defined Function (this post)
- Crosstab Query (reader submitted method)
User-Defined Function
A user-defined function is simply a piece of VBA code that you can call from other code or from a query. In this case, I'll call it from a query.
This process works pretty much like the built-in Domain functions like DCount, DMax, DLookup, etc. If you're unfamiliar with Domain Functions, you can read more in my post: Domain Functions Demystified.
Into the function, I send the name of a field, a table (or query), and a Where clause (without the WHERE keyword), In addition, I also need to send the Group By clause (again without the GROUP BY keyword).
Then I take the query created in Subqueries in the Field List and build it in code with the addition of a WHERE clause that restricts the query to a single row matching the row being evaluated in the query. The evaluated query returns a single value.
Lastly, I open a recordset and return the value from the "countof" column from the function.
First of all, create the following Function in a General Module:
Function DistinctCount(FieldName As String, _
TableName As String, _
WhereClause As String, _
GroupByClause As String) As Long
'declare variablesDim db As DAO.Database
Dim rs As DAO.Recordset
Dim strSQL As String
On Error GoTo DistinctCount_Error
'set database variable to current databaseSet db = CurrentDb
'create SQL string with function argument valuesstrSQL = "SELECT Count(Temp." & FieldName & ") as countof " & _
" FROM (SELECT " & GroupByClause & ", Customer FROM " & _
TableName & " GROUP BY " & GroupByClause & ", " & _
FieldName & " ORDER BY " & GroupByClause & ") AS Temp " & _
" WHERE " & WhereClause & _
" GROUP BY " & GroupByClause
'open a recordset of the resultant SQL statementSet rs = db.OpenRecordset(strSQL)
'if a record is foundIf Not rs.EOF Then
'return the value DistinctCount = rs!countof
Else
'return -1 to indicate "no records" DistinctCount = -1
End If
On Error GoTo 0
'destroy object variables Set rs = Nothing
Set db = Nothing
Exit Function
DistinctCount_Error:
MsgBox "Error " & Err.Number & " (" & Err.Description & _
") in procedure DistinctCount of Module Module1"
End Function
Call the Function
Calling the function, from a query looks something like this in the Query Builder:
CountOfCustomer: DistinctCount("Customer","Orders","OrderDate =# " & [OrderDate] & "#","OrderDate")
In SQL view:
SELECT Orders.OrderDate, DistinctCount("Customer","Orders","OrderDate =# " & [OrderDate] & "#","OrderDate") AS CountOfCustomer, Count(Orders.OrderID) AS CountOfOrderID, Sum(Orders.Amount) AS SumOfAmount
FROM Orders
GROUP BY Orders.OrderDate
ORDER BY Orders.OrderDate;
Which results in the following:
OrderDate | CountOfCustomer | CountOfOrderID | SumOfAmount |
1/17/2008 | 3 | 9 | $7,080.85 |
1/18/2008 | 4 | 8 | $6,989.54 |
1/19/2008 | 2 | 5 | $6,554.50 |
How it works
The function runs for each row evaluated. For instance, in the above query, for 1/17/2008, the query that is created in the code looks like this:
SELECT Count(Temp.Customer) as countof FROM (SELECT OrderDate, Customer FROM Orders GROUP BY OrderDate, Customer ORDER BY OrderDate) AS Temp WHERE OrderDate =#1/17/2008# GROUP BY OrderDate
Which returns
countof |
3 |
The SQL for the next row is:
SELECT Count(Temp.Customer) as countof FROM (SELECT OrderDate, Customer FROM Orders GROUP BY OrderDate, Customer ORDER BY OrderDate) AS Temp WHERE OrderDate =#1/18/2008# GROUP BY OrderDate
Which returns
countof |
4 |
The SQL for the last row is:
SELECT Count(Temp.Customer) as countof FROM (SELECT OrderDate, Customer FROM Orders GROUP BY OrderDate, Customer ORDER BY OrderDate) AS Temp WHERE OrderDate =#1/19/2008# GROUP BY OrderDate
Which returns
countof |
2 |
As you can imagine, with a lot of customers, this might not be the best performer. But I'll leave that for next time, when I'll look at the methods and discuss the pros and cons of each.
5 comments:
You can get that Count Distinct MUCH more efficiently by using a Crosstab query:
TRANSFORM Count(*) AS Cell
SELECT Count(cell) AS DistinctCount
FROM Orders
GROUP BY "Anything"
PIVOT CustId In (null);
@Patrick
Thanks for the suggestion. That's a new one to me, and I've incorporated it into the series. However, you might be surprised by the performance comparison in Part 5.
Thank for posting the code for the user defined function. Note: the sql string has "Customer" instead of "FieldName".
Very interesting comparison, thanks !
If you still have the tables that you prepared, it would be interesting to benchmark again, but instead of calculating the time to do an OpenQuery, calculating the time of
a) opening the recordset b) do a rs.movelast
I'm still amazed that a function like this doesn't already exist in Access.
I continually run into the need for it specifically in reports.
Example:
I'm currently working on a report (whose data already comes through several different queries, including a rather large union query), where I have the individual records grouped by State (more specifically the field DepartState).
In the group header, I'd like to add a a counter for the number of individual counties that exists in the set of records displayed (specifically DepartCounty).
(I don't actually show the detail lines here, just the group header, which displays a line for each state).
I THINK the user-defined function is the way to go, but I'm not entirely sure.
Post a Comment