Thursday, June 25, 2009

Select Queries Part 3: Sorting and Grouping (ORDER BY, GROUP BY)

In the first post of this series, Select Queries Part 1: Simple Queries, I talked about creating simple queries with the first two sections of a Select Query, the SELECT clause and the FROM clause. In the second post, Select Queries Part 2: Restricting Rows - the Where Clause, I discussed how to restrict rows returned with the WHERE clause. This time, I'm going to discuss how to sort your results with the ORDER BY clause and how to aggregate your data with the GROUP BY clause.

ORDER BY Clause

Tables in relational databases like Access (or SQL Server or Oracle, for that matter) do not have any intrinsic order. This means they can be returned in any order, not necessarily the order they were entered or that they appear when you open the table. So if you want them in a particular order, you have to sort them yourself. You do that in a query in the ORDER BY clause.

The ORDER BY clause has two parts: [field] [ASC/DESC] (repeat for as many fields as you need).

The [field] designates the field on which the sort will be performed and [ASC/DESC] tells how the field will be sorted. ASC means ascending (smallest to largest). DESC means descending (largest to smallest). ASC is the default, so if you don't designate an order, it will be smallest to largest.

Letters are sorted alphabetically A-Z (or Z-A if DESC). Numbers are sorted numerically. This is all very obvious until you have numbers that are stored as text. While they look like numbers, they will not sort numerically. For instance, these character strings sorted alphabetically:

CustomerNum
-----------
101
102
1100
201
3001
301

One solution is to add leading zeros to your text numbers. Numeric data will not save leading zeros, but text will:

CustomerNum
-----------
0101
0102
0201
0301
1100
3001

Later on, I'll discuss another method that does not require altering your data.

The Order By clause follows the Where clause:

SELECT ProductName, Cost, Price, [Price]-[Cost] AS Margin
FROM Products
WHERE ProductName = "hammer"
ORDER BY Cost DESC

Examples of ORDER BY clauses:

ORDER BY ProductName (Products alphabetically A-Z)

ORDER BY [Price]-[Cost]DESC (margin highest to lowest)

ORDER BY ProductName, Cost DESC

The last example shows that you can also sort on multiple fields. In this case, the table will be sorted on ProductName (ascending) and within each group of products, it will be sorted on Cost (descending).

ProductName Cost
----------- ----
Ax           $3
Hammer       $15
Hammer       $10
Wrench       $5
Wrench       $4

The fields in the ORDER BY clause do not have to be in the Field List of the SELECT clause. For instance, this will work just fine:

SELECT ProductName
FROM Products
ORDER BY Cost

In the Query Builder, simply uncheck the checkbox to do this:


You can also sort on an expression. I showed [Price]-[Cost], but you can also apply functions to your fields. For instance, earlier I showed how text numbers will sort incorrectly. I showed how leading zeros will correct it, but another solutions is to display the text numbers as they are in the field list, but to add a function to the ORDER BY clause that converts them to numeric.

SELECT [CustomerNum]
FROM Customer
ORDER BY CLng([CustomerNum]);

CLng () is a built-in function that converts Text to Long Integer.

GROUP BY and HAVING Clauses

The GROUP BY and HAVING clauses are used with "Totals" or Aggregate queries.

The GROUP BY allows you to group your data and apply a function to the grouped data. For instance:

SELECT ProductName, Cost
FROM Products

Will result in the following:

ProductName Cost
----------- ----
Ax           $3
Wrench       $4
Hammer       $15
Wrench       $5
Hammer       $10

However, if I wanted to see the average cost for each group, I could add a GROUP BY clause:

SELECT ProductName, Cost
FROM Products
GROUP BY ProductName, Avg(Cost)

ProductName Cost
----------- ----
Ax         $3
Wrench       $ 4.5
Hammer       $
12.5

Every field in the Field List MUST be represented in the GROUP BY clause either to group on or with an aggregate function. Examples of aggregate functions are: Sum, Avg, Min, Max, and Count.

The HAVING clause works like the WHERE clause, but it restricts rows after they have been grouped.

SELECT ProductName, Cost
FROM Products
GROUP BY ProductName, Avg(Cost)
HAVING Avg(Cost)

Will return:

ProductName Cost
----------- ----
Wrench       $
4.5
Hammer       $
12.5

There's a lot more to be said about Totals Queries, and I'll do that in greater depth in a later post.

Next time, in Select Queries: Part 4, I'll finish up the Select Query with some odds and ends: Top Query, Distinct Query, Crosstab Query and Parameters.

.

Monday, June 22, 2009

New Sample: Query_SplitToMultiRowsReMergeSorted

Query_SplitToMultiRowsReMergeSorted

by AD Tejpal

Explanatory Notes

This sample db demonstrates splitting of comma (,) or star (*) separated string elements into multiple rows followed by re-merging of these elements duly sorted, back into strings. It is a virtually codeless solution, but for use of two wrapper functions related to built in Split() function.

For splitting the strings into multiple rows, a single field table (T_Ref) populated with numbers 0 to 10 is used in Cartesian join with the table holding source data. The sorted output is appended to table T_Jobs_Split, which in turn is used to generate the crosstab query followed by final select query, re-merging the sorted elements back into strings.

Note:
(a) If the maximum number of elements in one string is likely to exceed 10, additional numbers can be added to table T_Ref. At the same time, pivot clause of crosstab query and calculated field of final select query (based upon crosstab query) will need o be expanded to accommodate additional columns.

(b) Use of number reference table (as compared to master job table) in Cartesian join with source table for generating multiple rows has the advantage that it eliminates the risk of inconsistent results arising out of partial match using Like operator. Moreover, it does away with the need of preparing such a table and keeping it up-to-date, apart from ensuring that there is no inadvertent discrepancy in its contents.

Version: Access 2000 file format.
Reference: DAO 3.6

You can find the sample here: http://www.rogersaccesslibrary.com/forum/forum_posts.asp?TID=459
.

Thursday, June 18, 2009

Select Queries Part 2: Restricting Rows - the Where Clause

In my last post, Select Queries Part 1: Simple Queries, I talked about the first two sections of a Select Query, the SELECT clause and the FROM clause. These allow you to display certain columns from designated tables. The next section is one of the most powerful features of a query: the WHERE clause.

WHERE Clause


The WHERE Clause is one of the most powerful features of a query. It allows you to restrict the rows returned based one or more criteria. These criteria are in the form of an expression, the general form of which is:


[Field] <comparison operator> [Value]

The field is a column in the base table(s) or a calculated column. It does not have to be in the Field List. The comparison operators are the standard math comparators: =, <, >, <=, >=, or <>; with some SQL specific ones added: IN/EXISTS, BETWEEN, LIKE, and IS NULL. The value portion can be either a hard coded value (like "hammer" or 25), or it can be another field or expression.

The WHERE clause follows the From Clause in a SQL Statement like so:

SELECT ProductName, Cost, Price, [Price]-[Cost] AS Margin
FROM Products
WHERE ProductName="hammer"

Examples of simple WHERE clauses:

WHERE ProductName="hammer"
(to show all hammers)

WHERE ProductName <> "hammer"
(to show all products EXCEPT hammers )

WHERE Cost <= 0.05

WHERE Price < Cost
(products sold below cost)

WHERE [Price]-[Cost]>2*[Cost]
(where the margin is greater than twice the cost)

LIKE is used with character data only and uses the asterisk as a wildcard symbol:

WHERE ProductName LIKE "ham*"
(returns "hammer" and "hammock")

WHERE ProductName LIKE "*nail"
(returns "10p nail" and "8p nail")

IN allows you to test if a field matches one of a list of values.

WHERE Cost IN (1, 2, 5, 8)

IN can also contain another SQL statement. The SQL statement contained in the IN must have only one field in the Field List.

WHERE ProductName IN (SELECT ProductName FROM Products)

I'll talk more about this in a later post when I discuss subqueries.

BETWEEN allows you to test for a range of values:

WHERE Cost BETWEEN 5 AND 10
WHERE BeginDate BETWEEN #1/1/2008# AND #12/31/2008#


IS NULL is a special comparator that tests whether or not a field is Null. The other comparison operators do not work with Null, so IS NULL is the only way to test for it.

WHERE ProductName IS NULL

Multiple Criteria and Logical Operators

You can also have multiple criteria by using multiple expressions joined by the Logical Operators: AND/OR. Examples of simple multiple criteria:

WHERE ProductName="10p nail" OR ProductName="8p nail"
WHERE ProductName="saw" AND Cost BETWEEN 5 AND 10
WHERE ProductName="saw" OR Cost BETWEEN 5 AND 10

Unfortunately, the Query Builder View of WHERE clauses (or Criteria as it's called in the QB) looks quite different than in the SQL View. In the QB, you do not need to repeat the field name with an OR or AND statement like you do in SQL. For instance, if I wanted to display both 10p nails and 8p nails, my WHERE clause would look like this:

WHERE ProductName="10p nail" OR ProductName="8p nail"

But the Query builder would look like this:

Figure 1

Of course, this would also work:

Figure 2

You have to be careful when creating multiple criteria in the Query Builder because sometimes they don't say exactly what you think they do. It matters which lines in the Criteria that you put your expressions. You create ORs on separate lines, while you create ANDs on the same line.

So this:
WHERE ProductName="saw" OR Cost BETWEEN 5 AND 10

Translates to:

Figure 3

(Notice the criteria are on separate lines.)

But this:

WHERE ProductName="saw" AND Cost BETWEEN 5 AND 10

Translates to:

Figure 4

(Notice here the criteria are on the same line.)

It gets even more complicated with multiple ANDs and Ors. For instance, what does this statement mean?

WHERE ProductName="saw" OR ProductName="hammer" AND Cost BETWEEN 5 AND 10

You might think it means I want "saws and hammers with a cost between 5 and 10". However, it doesn't. There is an Order of Precedence to the logical operators as there are with arithmetic symbols. AND always takes precedence over OR. So what will really be returned is "ALL saws and only those hammers that cost between 5 and 10". The AND expression will be evaluated before the OR expression.

So how do I get what I want? The answer is parentheses. If I want something evaluated out of precedence (like my OR expression) I surround it with parentheses. So it would be this:

WHERE (ProductName="saw" OR ProductName="hammer") AND Cost BETWEEN 5 AND 10

In the Query Builder, it would look like this:

Figure 5


One last thing about the Query Builder and WHERE clauses. The QB overdoes it with table names and parentheses. If I create the following query in the Query Builder:

Figure 6

I will get the following SQL statement:

SELECT Products.ProductName, Products.Price,
FROM Products
WHERE (((Products.ProductName)="saw"
Or (Products.ProductName)="hammer")
AND ((Products.Cost) In (1,2,5,7)));

Because there is only one table in the query, the table name preceding every field is not necessary. However, it also puts in parentheses, which, while it makes the query technically correct, actually makes it harder to read. Many of these parentheses can be removed, but not all. Naturally, the parentheses that surround the IN list must remain and also the OR statement that we know we want to take precedence. The query can be cleaned to look like this:

SELECT ProductName, Price
FROM Products
WHERE (ProductName="saw"
Or ProductName="hammer")
AND Cost In (1,2,5,7);

Because of the possibility of error when creating complex WHERE clauses in the Query Builder, I usually create them in the SQL View where I know I can control the parentheses.

NOT Operator

There is one more important Logical Operator, and that's the NOT operator. The NOT operator reverses an expression. It can be used with the LIKE, IN, and BETWEEN operators or entire multiple-expression WHERE clauses. It returns all rows EXCEPT those that would have been returned if you hadn't used the NOT.

WHERE ProductName NOT LIKE "ham*"
WHERE Cost NOT IN (1, 2, 5, 8)
WHERE Cost NOT BETWEEN 5 AND 10
WHERE Cost NOT IS NULL

It's important to remember that NOT also reverses both the comparison operators and the other logical operators, so you need to be really careful. For instance,

WHERE NOT(ProductName="saw" Or ProductName="hammer")

Is equivalent to:

WHERE (ProductName<>"saw" AND ProductName<>"hammer")


Next time, I'll look at sorting and groupin in: Select Queries Part 3: Sorting and Grouping (ORDER BY, GROUP BY).

.