Monday, April 30, 2012

COUNT DISTINCT In Access: Part 2

Field List Method
SQL Server has a nice built-in function called COUNT DISTINCT, which is missing in Access SQL.

What does COUNT DISTINCT do?  Well, there are times when you want to count distinct values in a query, that is, a count of values without duplicates.  For instance, given the following table, how many distinct customers have orders?

ORDERS
OrderID OrderDate Customer Amount
1 1/17/2008 Ajax Inc. $310.00
2 1/17/2008 Ajax Inc. $510.50
3 1/17/2008 Ajax Inc. $311.00
4 1/17/2008 Baker Corp. $5,144.00
5 1/17/2008 Baker Corp. $61.00
6 1/17/2008 Baker Corp. $110.50
7 1/17/2008 Baker Corp. $11.00
8 1/17/2008 Crystal & Co. $111.85
9 1/17/2008 Crystal & Co. $511.00
10 1/18/2008 Baker Corp. $711.95
11 1/18/2008 Baker Corp. $810.00
12 1/18/2008 Baker Corp. $310.59
13 1/18/2008 Crystal & Co. $311.00
14 1/18/2008 Crystal & Co. $811.50
15 1/18/2008 Ajax Inc. $512.00
16 1/18/2008 D&D LLC $211.00
17 1/18/2008 D&D LLC $3,311.50
18 1/19/2008 Ajax Inc. $410.00
19 1/19/2008 Ajax Inc. $610.50
20 1/19/2008 Baker Corp. $4,411.00
21 1/19/2008 Baker Corp. $511.50
22 1/19/2008 Baker Corp. $611.50

In SQL Server, I can do this:

SELECT COUNT(DISTINCT Customer) AS CountOfCustomer FROM Orders

Which will give me the following:


CountOfCustomer
4

In Access, if I use the Distinct predicate with the count:
SELECT DISTINCT Count(Customer) AS CountOfCustomer FROM Orders;

I get:

CountOfCustomer
22

Since Access SQL does not have the Count Distinct function, what can I do?

There are actually three different methods for simulating the Count Distinct:
  1. Subqueries in the FROM Clause
  2. Subqueries in the Field List (this post)
  3. User-Defined Function
  4. Crosstab Query (reader submitted method)
Each of these methods have advantages and disadvantages, and I'll address each in turn. This time, I'll look at a subquery in the field list.

One problem with the FROM Clause method is its lack of flexibility, that is, it's not simple to add additional levels of grouping.  That's because you're removing duplicates from the record source itself.  It would be better if I could leave the record source alone and simply remove the duplicates from the count. This is exactly what the Field List method does.

Subquery in the Field List Method

With the FROM Clause method, I worked from the inside out, that is, I started with the inner queries and developed more complex queries based on the previous ones.  But that won't work well with the Field List Method.  With this method, I'll work from the outside in.  I also can't use the Query Builder grid very well (although the final query will be viewable in it), so this will mostly be SQL.
I'll start with the query that I though should have worked but didn't:

SELECT OrdersMain.OrderDate, Count(Customer) AS CountOfCustomer
FROM Orders AS OrdersMain
GROUP BY OrdersMain.OrderDate
ORDER BY OrdersMain.OrderDate;


Notice I have used an alias (OrdersMain) in the FROM clause.  This will become necessary later, so I'll put it in now.  The query will produce the original incorrect results:

OrderDate CountOfCustomer
1/17/2008 9
1/18/2008 8
1/19/2008 5

Now, what I need to do is replace

Count(Customer) AS CountOfCustomer

with a subquery that produces the correct values for each OrderDate.  I've already got such a query from my last post using the FROM clause method (Problem2).  I can use it here:

SELECT Count(OrdersTemp.customer),OrdersTemp.orderdate
FROM (SELECT OrderDate, Customer FROM Orders GROUP BY OrderDate,
      Customer ORDER BY OrderDate)  AS OrdersTemp
GROUP BY OrdersTemp.orderdate;


As before, I've aliased the FROM source, this time as OrdersTemp.  This produces the following:

CountOfcustomer orderdate
3 1/17/2008
4 1/18/2008
2 1/19/2008

Now, I've just got to get the two queries together.  To make it more obvious what's going on, I'll show the original MainQuery in Blue, the SubQuery in Red, and additional changes necessary to merge the two in Green

SELECT OrdersMain.OrderDate,
(
SELECT Count(OrdersTemp.Customer), OrdersTemp.orderdate
      FROM (SELECT OrderDate, Customer FROM Orders GROUP BY OrderDate,
            Customer ORDER BY OrderDate)  AS OrdersTemp
      GROUP BY OrdersTemp.orderdate
) AS CountOfCustomerFROM Orders AS OrdersMain
GROUP BY OrdersMain.OrderDate
ORDER BY OrdersMain.OrderDate;


Now, I do need to make a few more modifications.  A subquery in the field list can only return a single column and a single row, that is, just one value.  But as is, the subquery produces two columns and three rows.  What I need to do is match up the orderdate fields in both the main query and sub query, which I do my adding a WHERE Clause.  I also have to remove orderdate from the subquery field list.

So I get the following:

SELECT OrdersMain.OrderDate, (SELECT Count(OrdersTemp.Customer), OrdersTemp.orderdate
       FROM (SELECT OrderDate, Customer FROM Orders GROUP BY OrderDate,               Customer ORDER BY OrderDate)  AS OrdersTemp       WHERE OrdersMain.OrderDate = OrdersTemp.OrderDate
       GROUP BY OrdersTemp.OrderDate) AS CountOfCustomer
FROM Orders AS OrdersMainGROUP BY OrdersMain.OrderDate
ORDER BY OrdersMain.OrderDate;


The final query:

SELECT OrdersMain.OrderDate, (SELECT Count(OrdersTemp.Customer)
       FROM (SELECT OrderDate, Customer FROM Orders GROUP BY OrderDate, 
             Customer ORDER BY OrderDate)  AS OrdersTemp
      
WHERE OrdersMain.OrderDate = OrdersTemp.OrderDate        GROUP BY OrdersTemp.OrderDate) AS CountOfCustomerFROM Orders AS OrdersMain
GROUP BY OrdersMain.OrderDate
ORDER BY OrdersMain.OrderDate;



OrderDate CountOfCustomer
1/17/2008 3
1/18/2008 4
1/19/2008 2

 
Wait a second.  This is exactly what I got with the subquery itself.  Why should I bother adding it into the main query?

The reason is the flexibility I talked about earlier.  If I want to add a couple of additional fields, say a count of the orders and the total amount for each date, that is this:

OrderDate CountOfCustomer CountOfOrderID SumOfAmount
1/17/2008 3 9 $7,080.85
1/18/2008 4 8 $6,989.54
1/19/2008 2 5 $6,554.50

It's as simple as adding a new field to the Main field list (in Purple)

SELECT OrdersMain.OrderDate,       (SELECT Count(OrdersTemp.Customer)
       FROM (SELECT OrderDate, Customer FROM Orders GROUP BY OrderDate, 
             Customer ORDER BY OrderDate)  AS OrdersTemp
      
WHERE OrdersMain.OrderDate = OrdersTemp.OrderDate        GROUP BY OrdersTemp.OrderDate) AS CountOfCustomer,       Count(OrdersMain.OrderID) AS CountOfOrderID,
       Sum(OrdersMain.Amount) AS SumOfAmount
FROM Orders AS OrdersMain
GROUP BY OrdersMain.OrderDate
ORDER BY OrdersMain.OrderDate;


So depending on how complex your aggregate query is it may be simpler to use the Field List method.    By comparison, adding each new column using the FROM clause method requires a complete re-working of the query.

This method also more accurately mimics the Count Distinct of T-SQL which also works at the field list level.

Next time, I'll look at a User-Defined Function.


COUNT DISTINCT in Access: Part 1

From Clause Method

SQL Server has a nice built-in function called COUNT DISTINCT, which is missing in Access SQL.

What does COUNT DISTINCT do?  Well, there are times when you want to count distinct values in a query, that is, a count of values without duplicates.  For instance, given the following table, how many distinct customers have orders?

ORDERS
OrderID OrderDate Customer Amount
1 1/17/2008 Ajax Inc. $310.00
2 1/17/2008 Ajax Inc. $510.50
3 1/17/2008 Ajax Inc. $311.00
4 1/17/2008 Baker Corp. $5,144.00
5 1/17/2008 Baker Corp. $61.00
6 1/17/2008 Baker Corp. $110.50
7 1/17/2008 Baker Corp. $11.00
8 1/17/2008 Crystal & Co. $111.85
9 1/17/2008 Crystal & Co. $511.00
10 1/18/2008 Baker Corp. $711.95
11 1/18/2008 Baker Corp. $810.00
12 1/18/2008 Baker Corp. $310.59
13 1/18/2008 Crystal & Co. $311.00
14 1/18/2008 Crystal & Co. $811.50
15 1/18/2008 Ajax Inc. $512.00
16 1/18/2008 D&D LLC $211.00
17 1/18/2008 D&D LLC $3,311.50
18 1/19/2008 Ajax Inc. $410.00
19 1/19/2008 Ajax Inc. $610.50
20 1/19/2008 Baker Corp. $4,411.00
21 1/19/2008 Baker Corp. $511.50
22 1/19/2008 Baker Corp. $611.50

In SQL Server, I can do this:

SELECT COUNT(DISTINCT Customer) AS CountOfCustomer FROM Orders
Which will give me the following:

CountOfCustomer
4

In Access, if I use the Distinct predicate with the count:

SELECT DISTINCT Count(Customer) AS CountOfCustomer FROM Orders;

I get:

CountOfCustomer
22

Since Access SQL does not have the Count Distinct function, what can I do? 
There are actually four different methods for simulating the Count Distinct:
  1. Subqueries in the FROM Clause (this post)
  2. Subqueries in the Field List
  3. User-Defined Function
  4. Crosstab Query (reader submitted method)
Each of these methods have advantages and disadvantages, and I'll address each in turn.  This time I'll start with using a subquery in the FROM clause

Subquery in the FROM Clause Method

Using a subquery in the FROM cause restricts the pool of values for aggregating in the main query.  The main advantage is that it's fairly easy to figure out.  The main disadvantage is lack of flexibility. 

Note: I have a couple of choices for using a subqueries.  They can be used either in-line or stacked.  What's the difference?  In the stacked query method, the individual subsqueries are saved as named queries and then referenced by name in later queries.  In the in-line method, they are all combined into a single SQL statement

I find it useful to create use the stacked query method to develop and combine them into a single statement once I've got it working correctly.  It also gives me a method of checking my results.

Problem 1:

So, back to my original problem.  First, I create a simple aggregate query, grouping the customers:

SELECT Customer FROM Orders GROUP BY Customer;

Result:

SimpleStep1 (saved query)

Customer
Ajax Inc.
Baker Corp.
Crystal & Co.
D&D LLC

I'll save that as SimpleStep1.  Then I'll use SimpleStep1 in the FROM clause for a second query:

SELECT Count(Customer) AS CountOfCustomer FROM SimpleStep1;

SimpleStep2

CountOfCustomer
4

Combining these into a single query:

SELECT Count(Step1.Customer) AS CountOfCustomer
FROM (SELECT Customer FROM Orders GROUP BY Customer) AS Step1;


SimpleCombined

CountOfCustomer
4

Note: the alias "Step1" could be anything at all, even "A".  I used "Step1" to illustrate how the queries are related.

More Complex Queries: Problem 2

This is all well and good, but, unfortunately, this is only useful in limited situations.  If I wanted a more complex query, say, how many customers per day:

OrderDate CountOfCustomer
1/17/2008 3
1/18/2008 4
1/19/2008 2

it is much more difficult. 

SQL Server

With COUNT DISTINCT, in SQL Server, I can just add an additional field to the Field List and Group By clauses.

SELECT OrderDate, COUNT(DISTINCT Customer) AS CountOfCustomer FROM Orders
GROUP BY OrderDate;


Access

If I try to do the same thing in Access,
SELECT OrderDate, Count(Step1.Customer) AS CountOfCustomer
FROM (SELECT Customer FROM Orders GROUP BY Customer)  AS Step1
GROUP BY OrderDate;


I get a parameter box asking for OrderDate:




In retrospect the reason for this if fairly obvious.  I'm using a GROUP BY in the subquery in the FROM clause to limit the values available and OrderDate isn't in the field list.

So to fix it, I have to add OrderDate to both the subquery and the main query:

SELECT Step1.OrderDate, Count(Step1.Customer) AS CountOfCustomer
FROM (SELECT OrderDate, Customer
      FROM Orders GROUP BY OrderDate, Customer)  AS Step1
GROUP BY Step1.OrderDate;


And I get the same result:

OrderDate CountOfCustomer
1/17/2008 3
1/18/2008 4
1/19/2008 2

Even More Complex Queries: Problem 3

But suppose I want an even more complex query.  Suppose I want to see a distinct count of the Customers AND a count of the number of orders for each date?

In SQL Server, I can simply add my additional aggregrate function (CountOfOrders) to the Field List:

SELECT OrderDate, COUNT(DISTINCT Customer) AS CountOfCustomer
       COUNT(OrderID) AS CountOfOrders
FROM Orders
GROUP BY OrderDate;


In Access, if I try to do the same thing:

SELECT Step1.OrderDate, Count(Step1.Customer) AS CountOfCustomer, 
      COUNT(OrderID) AS CountOfOrders
FROM (SELECT OrderDate, Customer
      FROM Orders GROUP BY OrderDate, Customer)  AS Step1
GROUP BY Step1.OrderDate;


once again, I'll get the parameter box but this time asking for the OrderID.



The reason is the same because OrderID is not in the record source.  So I'm going to have to modify the record source to add it.  I'll start with the stacked query method and then combine them.

StackedStep1

SELECT Orders.OrderDate, Orders.Customer
FROM Orders
GROUP BY Orders.OrderDate, Orders.Customer;


StackedStep2

SELECT OrderDate, Count(Customer) AS CountOfCustomer
FROM StackedStep1
GROUP BY OrderDate;


StackedStep3

SELECT Orders.OrderDate, Stackedstep2.CountOfCustomer,
           Count(Orders.OrderID) AS CountOfOrderID
FROM Stackedstep2 INNER JOIN Orders ON Stackedstep2.OrderDate = Orders.OrderDate
GROUP BY Orders.OrderDate, Stackedstep2.CountOfCustomer;


Which gives me the following:

OrderDate CountOfCustomer CountOfOrderID
1/17/2008 3 9
1/18/2008 4 8
1/19/2008 2 5

Combining them into a single query:

SELECT Orders.OrderDate, step2.CountOfCustomer,
        Count(Orders.OrderID) AS CountOfOrderID
FROM (SELECT OrderDate, Count(Customer) AS CountOfCustomer
      FROM (SELECT OrderDate, Customer FROM Orders
            GROUP BY OrderDate, Customer ORDER BY OrderDate)  AS step1
      GROUP BY OrderDate)  AS step2
      INNER JOIN Orders ON step2.OrderDate = Orders.OrderDate
GROUP BY Orders.OrderDate, step2.CountOfCustomer;


Fun, huh?

I think you can see that while modifying the FROM clause allows me to get the right answer, it's a lot of work and not very flexible, that is, it's not simple to add additional levels of grouping.  That's because I'm removing duplicates from the record source itself.  It would be better if I could leave the record source alone and simply remove the duplicates from the count. This is exactly what the Field List method does.

Next time, I'll look at doing the COUNT DISTINCT in the Field list, and we'll see if it's any better.



Wednesday, April 18, 2012

New Sample: Form_PivotTableSimulated

Form_PivotTableSimulated

by AD Tejpal

    This sample db demonstrates a simulated pivot table, displayed on access subform in datasheet view. It is based purely upon native access queries and VBA, getting rid of traditional dependence upon OWC i.e. Office Web Components (slated for deprecation).

    2 - Incorporation of features typical of excel pivot table:
    --------------------------------------------------------
    An attempt has been made to incorporate basic features considered typical of an excel pivot table as listed below:
    2.1 - Being a datasheet, column headings remain in perpetual view, affording convenient vertical scrolling.
    2.2 - Suitable number of grouping columns at left are frozen so as to facilitate convenient horizontal scrolling.
    2.3 - Uncluttered display (suppressing duplicate contents in  grouping columns).
    2.4 - Full choice of filter fields (equivalent to page fields in excel)
    2.5 - Super-imposition of a layer of cross tab columns in parental style, if a field is earmarked as column field.
    2.6 - Toggling of data orientation between column and row styles as per user choice.
    2.7 - Manipulation of ordinal position of columns - as desired by user - by clicking up or down command buttons in field list.
    2.8 - Run time insertion of unlimited number of calculated field columns - as desired by user. Calculated field names and the underlying expressions can be edited as and when desired. The results of any such action get displayed promptly in the pivot table.
    2.9 - Run time setting / editing of formatting for different data fields - as desired by user. The results of any such action get displayed promptly in the pivot table.
  2.10 - Run time hiding / un-hiding of grouping and data fields as desired by user.
  2.11 - Fields List and Filter Check-List can be hidden un-hidden via command button. The filter list gets displayed only if IsFilter check box against one or more fields is in selected state.
  2.12 - If desired by the user, filter field columns get displayed in the pivot table as well.

    3 - Some Interesting Extra Enhancements:
    ------------------------------------------
    While building the simulated pivot table certain interesting enhancements have been incorporated as follows, so as to make it still more versatile and user friendly:

    3.1 - Generic Data Source:
    ---------------------------
    The arrangement for depicting the pivot table is completely generic. The developer is not required to drag any fields on to the form. All that needs to be done is identification of data source (which can be the name of a table or saved query). This is done conveniently via look-up list on a combo box. On selection of new data source, corresponding pivot table gets displayed via unbound controls on the subform. Adequate number of such controls have been provided so as to suit up to 300 columns.

    3.2 - Step by step auto-expand and auto-collapse:
    ---------------------------------------------------
    Step by step auto-expand or auto-collapse of pivot table can be carried out by clicking pertinent command button.

    3.3 - Flexible Run Time Grouping By Date Type Fields:
    --------------------------------------------------------
    Source data originally entered as simple dates can be depicted in various grouping styles, e.g. ByYear, ByQuarter, ByMonth etc. The user can play with date grouping style, selectable via combo box at bottom right. The results of any such action get displayed promptly in the pivot table.

    3.4 - Color Highlights For Pivot Table Rows Depicting Sub-totals:
    ------------------------------------------------------------------
    For added convenience, sub-total rows in pivot table are highlighted in color. Wherever more than one such rows happen to be adjacent, alternate color highlights are provided - for improved legibility.

    3.5 - Color Highlights In Fields And Filter List:
    -----------------------------------------------
    In fields list, distinct color highlights are provided for fields identified for hiding, filtering or to serve as column field. In filter check list, excluded items are highlighted.

    3.6 - Reminder Content In Outer Columns:
    ----------------------------------------------------
    During vertical scrolling in traditional pivot tables, the topic displayed in outer-most column at left can go out of view, depending upon the range of contents in grouped columns at right. For better convenience, pivot table in this sample db provides for repeat display of outer column content, whenever their is change of content in last but one grouped column.

Version: Access 2000 file format.

You can find the sample here:  http://www.rogersaccesslibrary.com/forum/topic582.html