Monday, November 30, 2009

Top Query Revealed: Aggregate Values and Top Values By Group

Top Query Revealed: Aggregate Values and Top Values By Group

Last time, in Top Query Revealed: Simple Top Query, I talked about the simple top query. A simple top query returns the top values based on individual field values. I showed how this query:

SELECT TOP 10 Account, OrderNum,
[Pairs Shipped], [Total Price]
FROM Invoices
ORDER BY [Total Price] DESC;

Returned the following results:

Figure2: Results of simple Top query.

Top Query based on Aggregate Values

As useful the simple top query is, what if I wanted to find the top 5 Accounts by total sales rather than individual invoices? The solution is to use a Totals (or Aggregate) query.

SELECT TOP 5 Account,
Sum([Total Price]) AS [SumOfTotal Price]
FROM Invoices
GROUP BY Account
ORDER BY Sum([Total Price]) DESC;

Notice the addition of the Group By clause and the Sum function. The result of this query can be seen in Figure 3.

Figure 3: Result of top 5 accounts by total sales.

Another useful thing to do with a Top query is to return the top values for each of a group of data. For instance, suppose instead of the top 3 values for a particular Account, I wanted the top 3 values for all Accounts.

Top Query By Group

To do this, I need a correlated subquery. A subquery is a SELECT statement nested inside a SELECT, SELECT...INTO, INSERT...INTO, DELETE, or UPDATE statement or inside another subquery. A correlated subquery opens a separate instance of the table for each record in the main query, allowing you to compare the results of the subquery to results from the main query.

In this case, my subquery looks like this:

SELECT TOP 3 [Total Price]
FROM Invoices I2
WHERE Invoices.[Account] = I2.[Account]
ORDER BY I2.[Total Price] DESC

The key here is the From clause:

FROM Invoices I2

This creates an "alias" for the table, renaming it I2. This is important because when we embed this query within another query also based on the Invoice table, it needs to know which instance of Invoices we are asking for.

If we run the subquery as is, it will ask us for the value of Invoices.[Account] in a parameter prompt. If we give it an account number, say 391002, it will return the top 3 values for that account.

But when we embed this query in another query, instead of prompting us for the account number, it will take the account number from the each record of the main query, match it to the subquery, and return the top values for that record. The complete query looks like this:

SELECT Account, OrderNum, [Total Price]
FROM Invoices
WHERE [Total Price] In
(SELECT TOP 3 [Total Price]
FROM Invoices I2
WHERE Invoices.[Account] = I2.[Account]
ORDER BY I2.[Total Price] DESC)
ORDER BY Account, [Total Price] DESC;

And returns the values in Figure 4.

Figure 4: Shows the result of the query to show the top 3 invoices for each Account.

Notice, however that account 237001 has 4 values. Why is that? Because two records have the same value, Access cannot decide which to display, so it displays both. Duplicates can happen in both simple and group top queries, but the solution is different for each. I'll address that next in Top Query Problem: Ties.

A free sample illustrating the Top Query can be found here:



Anonymous said...

Good post and this mail helped me alot in my college assignement. Thanks you for your information.

Anonymous said...

Well I to but I about the collection should secure more info then it has.

Anonymous said...