SQL Server has a nice built-in function called COUNT DISTINCT, which is missing in Access SQL.
What does COUNT DISTINCT do? Well, there are times when you want to count distinct values in a query, that is, a count of values without duplicates. For instance, given the following table, how many distinct customers have orders?
ORDERS
OrderID | OrderDate | Customer | Amount |
1 | 1/17/2008 | Ajax Inc. | $310.00 |
2 | 1/17/2008 | Ajax Inc. | $510.50 |
3 | 1/17/2008 | Ajax Inc. | $311.00 |
4 | 1/17/2008 | Baker Corp. | $5,144.00 |
5 | 1/17/2008 | Baker Corp. | $61.00 |
6 | 1/17/2008 | Baker Corp. | $110.50 |
7 | 1/17/2008 | Baker Corp. | $11.00 |
8 | 1/17/2008 | Crystal & Co. | $111.85 |
9 | 1/17/2008 | Crystal & Co. | $511.00 |
10 | 1/18/2008 | Baker Corp. | $711.95 |
11 | 1/18/2008 | Baker Corp. | $810.00 |
12 | 1/18/2008 | Baker Corp. | $310.59 |
13 | 1/18/2008 | Crystal & Co. | $311.00 |
14 | 1/18/2008 | Crystal & Co. | $811.50 |
15 | 1/18/2008 | Ajax Inc. | $512.00 |
16 | 1/18/2008 | D&D LLC | $211.00 |
17 | 1/18/2008 | D&D LLC | $3,311.50 |
18 | 1/19/2008 | Ajax Inc. | $410.00 |
19 | 1/19/2008 | Ajax Inc. | $610.50 |
20 | 1/19/2008 | Baker Corp. | $4,411.00 |
21 | 1/19/2008 | Baker Corp. | $511.50 |
22 | 1/19/2008 | Baker Corp. | $611.50 |
In SQL Server, I can do this:
SELECT COUNT(DISTINCT Customer) AS CountOfCustomer FROM Orders
Which will give me the following:
CountOfCustomer |
4 |
In Access, if I use the Distinct predicate with the count:
SELECT DISTINCT Count(Customer) AS CountOfCustomer FROM Orders;
I get:
CountOfCustomer |
22 |
Since Access SQL does not have the Count Distinct function, what can I do?
There are actually three different methods for simulating the Count Distinct:
- Subqueries in the FROM Clause
- Subqueries in the Field List (this post)
- User-Defined Function
- Crosstab Query (reader submitted method)
One problem with the FROM Clause method is its lack of flexibility, that is, it's not simple to add additional levels of grouping. That's because you're removing duplicates from the record source itself. It would be better if I could leave the record source alone and simply remove the duplicates from the count. This is exactly what the Field List method does.
Subquery in the Field List Method
With the FROM Clause method, I worked from the inside out, that is, I started with the inner queries and developed more complex queries based on the previous ones. But that won't work well with the Field List Method. With this method, I'll work from the outside in. I also can't use the Query Builder grid very well (although the final query will be viewable in it), so this will mostly be SQL.
I'll start with the query that I though should have worked but didn't:
SELECT OrdersMain.OrderDate, Count(Customer) AS CountOfCustomer
FROM Orders AS OrdersMain
GROUP BY OrdersMain.OrderDate
ORDER BY OrdersMain.OrderDate;
Notice I have used an alias (OrdersMain) in the FROM clause. This will become necessary later, so I'll put it in now. The query will produce the original incorrect results:
OrderDate | CountOfCustomer |
1/17/2008 | 9 |
1/18/2008 | 8 |
1/19/2008 | 5 |
Now, what I need to do is replace
Count(Customer) AS CountOfCustomer
with a subquery that produces the correct values for each OrderDate. I've already got such a query from my last post using the FROM clause method (Problem2). I can use it here:
SELECT Count(OrdersTemp.customer),OrdersTemp.orderdate
FROM (SELECT OrderDate, Customer FROM Orders GROUP BY OrderDate,
Customer ORDER BY OrderDate) AS OrdersTemp
GROUP BY OrdersTemp.orderdate;
As before, I've aliased the FROM source, this time as OrdersTemp. This produces the following:
CountOfcustomer | orderdate |
3 | 1/17/2008 |
4 | 1/18/2008 |
2 | 1/19/2008 |
Now, I've just got to get the two queries together. To make it more obvious what's going on, I'll show the original MainQuery in Blue, the SubQuery in Red, and additional changes necessary to merge the two in Green.
SELECT OrdersMain.OrderDate,
(SELECT Count(OrdersTemp.Customer), OrdersTemp.orderdate
FROM (SELECT OrderDate, Customer FROM Orders GROUP BY OrderDate,
Customer ORDER BY OrderDate) AS OrdersTemp
GROUP BY OrdersTemp.orderdate) AS CountOfCustomerFROM Orders AS OrdersMain
GROUP BY OrdersMain.OrderDate
ORDER BY OrdersMain.OrderDate;
Now, I do need to make a few more modifications. A subquery in the field list can only return a single column and a single row, that is, just one value. But as is, the subquery produces two columns and three rows. What I need to do is match up the orderdate fields in both the main query and sub query, which I do my adding a WHERE Clause. I also have to remove orderdate from the subquery field list.
So I get the following:
SELECT OrdersMain.OrderDate, (SELECT Count(OrdersTemp.Customer)
GROUP BY OrdersTemp.OrderDate) AS CountOfCustomerFROM Orders AS OrdersMainGROUP BY OrdersMain.OrderDate
ORDER BY OrdersMain.OrderDate;
The final query:
SELECT OrdersMain.OrderDate, (SELECT Count(OrdersTemp.Customer)
FROM (SELECT OrderDate, Customer FROM Orders GROUP BY OrderDate,
Customer ORDER BY OrderDate) AS OrdersTemp WHERE OrdersMain.OrderDate = OrdersTemp.OrderDate GROUP BY OrdersTemp.OrderDate) AS CountOfCustomerFROM Orders AS OrdersMain
GROUP BY OrdersMain.OrderDate
ORDER BY OrdersMain.OrderDate;
OrderDate | CountOfCustomer |
1/17/2008 | 3 |
1/18/2008 | 4 |
1/19/2008 | 2 |
Wait a second. This is exactly what I got with the subquery itself. Why should I bother adding it into the main query?
The reason is the flexibility I talked about earlier. If I want to add a couple of additional fields, say a count of the orders and the total amount for each date, that is this:
OrderDate | CountOfCustomer | CountOfOrderID | SumOfAmount |
1/17/2008 | 3 | 9 | $7,080.85 |
1/18/2008 | 4 | 8 | $6,989.54 |
1/19/2008 | 2 | 5 | $6,554.50 |
It's as simple as adding a new field to the Main field list (in Purple)
SELECT OrdersMain.OrderDate, (SELECT Count(OrdersTemp.Customer)
FROM (SELECT OrderDate, Customer FROM Orders GROUP BY OrderDate,
Customer ORDER BY OrderDate) AS OrdersTemp WHERE OrdersMain.OrderDate = OrdersTemp.OrderDate GROUP BY OrdersTemp.OrderDate) AS CountOfCustomer, Count(OrdersMain.OrderID) AS CountOfOrderID,
Sum(OrdersMain.Amount) AS SumOfAmountFROM Orders AS OrdersMain
GROUP BY OrdersMain.OrderDate
ORDER BY OrdersMain.OrderDate;
So depending on how complex your aggregate query is it may be simpler to use the Field List method. By comparison, adding each new column using the FROM clause method requires a complete re-working of the query.
This method also more accurately mimics the Count Distinct of T-SQL which also works at the field list level.
Next time, I'll look at a User-Defined Function.