Monday, December 7, 2009

Top Query Revealed: Top Query Problem - Ties

Top Query Problem - Ties

As I said last time in Top Query Revealed: Aggregate Values and Top Values By Group, one problem with Top queries becomes apparent when there are duplicates in the top values. When this happens, the query will return more values than asked for.

Simple Top Queries

For instance, the following query will produce four values, rather than three because there are two orders with a total price of $27.50.

SELECT TOP 3 Account, OrderNum, [Total Price]
FROM Invoices
WHERE Account=237001
ORDER BY [Total Price] DESC;

When asked for the top three values, Access can't decide which of the two duplicates is in the top three, so it includes both of them. Figure 4 shows the result.

Figure 4: Result of a query asking for the top three values when there is a tie. Four records are returned.

The solution is to add an additional field to the Order By clause.

SELECT TOP 3 Account, OrderNum, [Total Price]
FROM Invoices
WHERE Account=237001
ORDER BY [Total Price] DESC, OrderNum DESC;

In Figure 5, you'll see that by adding OrderNum to the Order By clause, it does in fact return three values. Since I used DESC in the clause, it returned the larger of the duplicated order numbers: 542724. If I had used ASC (or left it blank) it would have returned 542723 instead.

Figure 5: Top 3 query with one of the duplicates removed.

Top Query By Group

But as you can see in Figure 6, the problem of duplicates can happen in top queries by group as well.

Figure 6: Account 237001 has four records: two with $27.50.

Unfortunately, the solution is not the same. Simply adding a second field to the Order By clause of the subquery, doesn't work; not by itself anyway. However if we add to this what we learned about using a Totals query to return aggregate values, we can find a solution.

In our previous Totals query, we used the Sum aggregate function. This time, however, we'll use a different aggregate function. The Max aggregate function returns just the maximum value of a group rather than summing the group. Adding this to our main query, we get this.

SELECT Account, Max(OrderNum) AS MaxOfOrderNum, [Total Price]
FROM Invoices
GROUP BY Account, [Total Price]
HAVING [Total Price] In
(SELECT TOP 3 [Total Price]
FROM Invoices I2
WHERE Invoices.Account = I2.Account
ORDER BY I2.[Total Price] DESC, I2.OrderNum DESC)
ORDER BY Account, [Total Price] DESC;

So adding the aggregate function to find the maximum OrderNum in the main query and adding a sort on OrderNum in the subquery solves the problem and we get output like Figure 7.

Figure 7: Solution to the duplicates problem in a top query by group.

Notice that account 237001 now has only 3 records and the number of records returned is 378 rather than 403.

A free sample illustrating the "tie" problems and solutions can be found here:

Next time, I'll talk about how use a Top Query to find Random Records.


No comments: