Thursday, May 24, 2012

COUNT DISTINCT In Access: Part 3

User-Defined Function

SQL Server has a nice built-in function called COUNT DISTINCT, which is missing in Access SQL.

What does COUNT DISTINCT do?  Well, there are times when you want to count distinct values in a query, that is, a count of values without duplicates.  For instance, given the following table, how many distinct customers have orders?

ORDERS

OrderID

OrderDate

Customer

Amount

1

1/17/2008

Ajax Inc.

$310.00

2

1/17/2008

Ajax Inc.

$510.50

3

1/17/2008

Ajax Inc.

$311.00

4

1/17/2008

Baker Corp.

$5,144.00

5

1/17/2008

Baker Corp.

$61.00

6

1/17/2008

Baker Corp.

$110.50

7

1/17/2008

Baker Corp.

$11.00

8

1/17/2008

Crystal & Co.

$111.85

9

1/17/2008

Crystal & Co.

$511.00

10

1/18/2008

Baker Corp.

$711.95

11

1/18/2008

Baker Corp.

$810.00

12

1/18/2008

Baker Corp.

$310.59

13

1/18/2008

Crystal & Co.

$311.00

14

1/18/2008

Crystal & Co.

$811.50

15

1/18/2008

Ajax Inc.

$512.00

16

1/18/2008

D&D LLC

$211.00

17

1/18/2008

D&D LLC

$3,311.50

18

1/19/2008

Ajax Inc.

$410.00

19

1/19/2008

Ajax Inc.

$610.50

20

1/19/2008

Baker Corp.

$4,411.00

21

1/19/2008

Baker Corp.

$511.50

22

1/19/2008

Baker Corp.

$611.50

In SQL Server, I can do this:

SELECT COUNT(DISTINCT Customer) AS CountOfCustomer FROM Orders

Which will give me the following:

CountOfCustomer

4

In Access, if I use the Distinct predicate with the count:

SELECT DISTINCT Count(Customer) AS CountOfCustomer FROM Orders;

I get:

CountOfCustomer

22

Since Access SQL does not have the Count Distinct function, what can I do? 

There are actually three different methods for simulating the Count Distinct:

  1. Subqueries in the FROM Clause
  2. Subqueries in the Field List
  3. User-Defined Function (this post)
  4. Crosstab Query (reader submitted method)

Each of these methods have advantages and disadvantages, and I'll address each in turn.  This time I'll end with a User-Defined Function.

User-Defined Function

A user-defined function is simply a piece of VBA code that you can call from other code or from a query.  In this case, I'll call it from a query.

This process works pretty much like the built-in Domain functions like DCount, DMax, DLookup, etc.  If you're unfamiliar with Domain Functions, you can read more in my post: Domain Functions Demystified.

Into the function, I send the name of a field, a table (or query), and a Where clause (without the WHERE keyword),  In addition, I also need to send the Group By clause (again without the GROUP BY keyword).

Then I take the query created in Subqueries in the Field List and build it in code with the addition of a WHERE clause that restricts the query to a single row matching the row being evaluated in the query.  The evaluated query returns a single value.  

Lastly, I open a recordset and return the value from the "countof" column from the function.

First of all, create the following Function in a General Module:

Function DistinctCount(FieldName As String, _
                        TableName As String, _
                        WhereClause As String, _
                        GroupByClause As String) As Long
'declare variables
Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim strSQL As String

On Error GoTo DistinctCount_Error

'set database variable to current database
Set db = CurrentDb

'create SQL string with function argument values
strSQL = "SELECT Count(Temp." & FieldName & ") as countof " & _
       " FROM (SELECT " & GroupByClause & ", Customer FROM " & _
       TableName & " GROUP BY " & GroupByClause & ", " & _
       FieldName & " ORDER BY " & GroupByClause & ")  AS Temp " & _
       " WHERE " & WhereClause & _
       " GROUP BY " & GroupByClause
      
'open a recordset of the resultant SQL statement
Set rs = db.OpenRecordset(strSQL)

'if a record is found
If Not rs.EOF Then
    'return the value
    DistinctCount = rs!countof
Else
    'return -1 to indicate "no records"
    DistinctCount = -1
End If

On Error GoTo 0
    'destroy object variables
    Set rs = Nothing
    Set db = Nothing

    Exit Function

DistinctCount_Error:

    MsgBox "Error " & Err.Number & " (" & Err.Description & _
    ") in procedure DistinctCount of Module Module1"

End Function

Call the Function

Calling the function, from a query looks something like this in the Query Builder:

CountOfCustomer: DistinctCount("Customer","Orders","OrderDate =# " & [OrderDate] & "#","OrderDate")

image

In SQL view:

SELECT Orders.OrderDate, DistinctCount("Customer","Orders","OrderDate =# " & [OrderDate] & "#","OrderDate") AS CountOfCustomer, Count(Orders.OrderID) AS CountOfOrderID, Sum(Orders.Amount) AS SumOfAmount
FROM Orders
GROUP BY Orders.OrderDate
ORDER BY Orders.OrderDate;

Which results in the following:

OrderDate

CountOfCustomer

CountOfOrderID

SumOfAmount

1/17/2008

3

9

$7,080.85

1/18/2008

4

8

$6,989.54

1/19/2008

2

5

$6,554.50

How it works

The function runs for each row evaluated. For instance, in the above query, for 1/17/2008, the query that is created in the code looks like this:

SELECT Count(Temp.Customer) as countof  FROM (SELECT OrderDate, Customer FROM Orders GROUP BY OrderDate, Customer ORDER BY OrderDate)  AS Temp  WHERE OrderDate =#1/17/2008# GROUP BY OrderDate

Which returns

countof

3

The SQL for the next row is:

SELECT Count(Temp.Customer) as countof  FROM (SELECT OrderDate, Customer FROM Orders GROUP BY OrderDate, Customer ORDER BY OrderDate)  AS Temp  WHERE OrderDate =#1/18/2008# GROUP BY OrderDate

Which returns

countof

4

The SQL for the last row is:

SELECT Count(Temp.Customer) as countof  FROM (SELECT OrderDate, Customer FROM Orders GROUP BY OrderDate, Customer ORDER BY OrderDate)  AS Temp  WHERE OrderDate =#1/19/2008# GROUP BY OrderDate

Which returns

countof

2

As you can imagine, with a lot of customers, this might not be the best performer.  But I'll leave that for next time, when I'll look at the methods and discuss the pros and cons of each.

Friday, May 18, 2012

Speaking at Rocky Mountain Trifecta May, 19, 2012

I'm going to be giving two presentations at the Rocky Mountain Trifecta in Denver, tomorrow, May 19, 2012.  Topics will be: