Thursday, February 25, 2016

What Is A Join: Part 3 (Cartesian Joins)

Cartesian Joins (Cross Joins)

This is part 3 of a multi-part series on the SQL Join. Last time I talked about the most common type of join, the Equi-Join (see: What is a JOIN: Part 2 (Inner Join)). This time, I'm going to talk about the least common join type: the Cartesian or Cross Join.

Cartesian Joins are notable mostly for being avoided. A Cartesian Join matches every record in one table to every record in another. Except in very rare instances, this is something to be avoided at all costs. Since it joins all records from both tables, the resultset of a Cartesian join can quickly become huge. A Cartesian join of two tables, each holding a thousand records will result in a query displaying a million records (1000 x 1000). Cross-joining two tables with a hundred thousand records each will exceed the 2GB limit of Access.

A Cartesian join is most commonly created in the Query Builder by neglecting to add a join line between the tables.


Since Orders has 4 records and OrderDetails has 7 records, the resultset of the Cartesian join will have 28 records:

Cartesian Join

OrderNumber

OrderID

Quantity

111

1

1

222

1

1

333

1

1

444

1

1

111

2

2

222

2

2

333

2

2

444

2

2

111

3

3

222

3

3

333

3

3

444

3

3

111

1

2

222

1

2

333

1

2

444

1

2

111

1

1

222

1

1

333

1

1

444

1

1

111

3

2

222

3

2

333

3

2

444

3

2

111

3

1

222

3

1

333

3

1

444

3

1

 

As you can see, the information is fairly useless.

The corresponding SQL statement (SQL View) looks like this:

SELECT Orders.OrderNumber, OrderDetails.OrderID, OrderDetails.Quantity
FROM Orders, OrderDetails;

There is no Join clause between the tables or Join criteria, simply a comma between the tables.

But as I said, there are some rare uses for a Cartesian join.

One almost trivial use is to add a value to each record of a table. For instance, suppose I have a table called User with one record.

User

UserName

Roger

 

If I create a Cartesian join with my Orders table, I'll get:

Cartesian Join 2

OrderID

CustomerID

OrderNumber

UserName

1

1

111

Roger

2

2

222

Roger

3

3

333

Roger

4

1

444

Roger

 

Now, of course there are other ways to do this. However, the Cartesian join has the additional property of being non-updateable. If this is a desirable effect, then a Cartesian join can be useful.

But the main use of a Cartesian join is in creating joins in the Where clause, which I'll talk about in What is a Join: Part 4 (Equi-Joins in the WHERE Clause) .

[top]

Wednesday, February 17, 2016

What is a JOIN: Part 2 (Inner Join)

Equi-Join (Inner Join)

This is part 2 of a multi-part series on the SQL Join. For an introduction to the whole topic, take a look at:
What Is A Join: Part 1.

This time I'm going to talk about the Equi-Join, also called an Inner Join. An equi-join combines records from two tables which have common values in both tables and displays only those with matching records. Since an equi-join is the most common type of join, it is often simply referred to as a Join.

Consider the following Order and OrderDetail table

Orders

OrderID

OrderNumber

1

111

2

222

3

333

4

444

 

OrderDetails

OrderDetailID

OrderID

Quantity

1

1

1

2

2

2

5

3

3

6

1

2

7

1

1

8

3

2

9

3

1

Joining them on the OrderID field results in:

Single Join Query

OrderNumber

OrderID

Quantity

111

1

1

111

1

2

111

1

1

222

2

2

333

3

3

333

3

2

333

3

1


Notice that although the Orders table has a record for Order Number 444, it does not display in Single Join Query. That's because there are no matching records in OrderDetails. An equi-join shows only those records with matching values in both tables.

The SQL for this query is:

SELECT Orders.OrderNumber, OrderDetails.OrderID, OrderDetails.Quantity
FROM Orders INNER JOIN OrderDetails
ON Orders.OrderID = OrderDetails.OrderID;

In the Query Builder, it looks like this:


Notice the 1 and infinity symbol (∞) on the Join line. These indicate that a Relationship exists between these tables and that the OrderID is a Primary Key for the table on the "One-side". This is important, because if you want a query with a join of two tables to be updateable, the field or fields on the "one-side" of the join must either be a Primary Key or a Unique Index. I'll talk more about the updateability of queries in a later post.

Multiple Joins

A query is not limited to a single join. Each join is between just two tables, but the FROM clause of the query can have many joins. For instance, I can add the Products table

Products

ProductID

ProductName

Cost

Price

1

hammer

$1.00

$2.00

2

nail - 10p

$0.01

$0.02

3

saw

$5.00

$10.00

4

wrench

$6.00

$7.00

5

nail - 8p

$0.01

$0.02

6

drill

$20.00

$40.00

7

screw - 1x4

$0.02

$0.04

8

hammer

$4.00

$5.00

to the above query to get the product and price for the order.

Something like this:

MultipleJoin

OrderNumber

OrderID

Quantity

ProductName

Price

111

1

1

hammer

$2.00

222

2

2

nail - 10p

$0.02

333

3

3

saw

$10.00

111

1

2

wrench

$7.00

111

1

1

saw

$10.00

333

3

2

hammer

$2.00

333

3

1

saw

$10.00

In the Query Builder, it looks like this:


The SQL is:

SELECT Orders.OrderNumber, OrderDetails.OrderID, OrderDetails.Quantity, Products.ProductName, Products.Price
FROM Products INNER JOIN
(Orders
INNER JOIN OrderDetails
ON Orders.OrderID = OrderDetails.OrderID)
ON Products.ProductID = OrderDetails.ProductID;

Next time, I'll discuss the least common type of join, the Cartesian or Cross Join.

 

.

Wednesday, February 10, 2016

What is a Join: Part 1 (Introduction)

What is a Join: Part 1

Introduction

This is the beginning of a multi-part series devoted to Joins in Microsoft Access.

A Join connects the records of two tables based on common information in the records. In What Is Normalization: Part II, I discussed how to organize data into smaller tables to reduce redundancy. In What is Normalization: Part III, I discussed relationships and their role in defining how tables should be put back together so all of the data is retained. The Join is a SQL (Structured Query Language) implementation of a relationship. In other words, if normalization splits tables apart, the Join is how you put them back together.

In Select Queries: Part I, I talked briefly about the SQL FROM clause, where Joins are created. The general SQL syntax for the most common type of Join (Equi-Join) is:

FROM Table1 JOIN Table2 ON Table1.JoinField = Table2.JoinField

Where JoinField is a field in each table which holds the same data. So, if I had Order and OrderDetail tables, they would both have to have an OrderID field in common to relate the records in the two tables. The FROM clause for this join would be:

FROM Order JOIN OrderDetails ON Order.OrderID = OrderDetails.OrderID

Although the Equi-Join (also known as an Inner Join) is the most common, there are many different types of joins:

  1. Equi-Join (Inner Join)
  2. Cartesian Join (Cross Join)
  3. Equi-Joins in the Where Clause
  4. Outer Join (Left Join or Right Join)
    1. Unmatched Query
    2. Full Outer Join
  5. Self-Join
  6. Theta Join

In subsequent posts, I'll be discussing each of these in turn. Next up, the Equi-Join.

.

Friday, February 5, 2016

Top Query Revealed: Parameters

Top Query Revealed: Parameters

This is the fifth and last in a series on the Top Query. In previous posts, I've discussed Simple Top Queries, Aggregate and Grouping Top Queries, Problem of Ties in a Top Query and Finding Random Records.

Create a Parameter for Top Value?

One question which often comes up with regard to Top queries is if you can supply the TOP predicate as a parameter like you can with Where criteria. Unfortunately, the answer is you can't, at least not with native SQL. You can, however, use code to modify your Top query programmatically.

To do this, we need to create a subroutine called TopParameter in a General Module. Of course, I could hard code it for a specific query, but it would be far more useful to have code that will modify any Top query that I supply it. So my code will include a QueryName argument that passes in the name of the query to be modified.

Sub TopParameter(QueryName As String)
On Error GoTo Err_TopParameter

Next, I'll declare some object and scalar variables.

Dim db As DAO.Database
Dim qdf As DAO.QueryDef
Dim strSQL As String
Dim strTopVal As String
Dim strSQLTemp As String
Dim StartPos As Integer

Then I need to open the current database and set the database object variable (db) to it. The QueryDefs method of the database object lets me open the query definition of the Top query I want to modify and set it to a querydef variable (qdf).

Set db = CurrentDb
Set qdf = db.QueryDefs(QueryName)

The SQL property of the QueryDef object reads the SQL statement of the query into a string variable so I can modify it.

strSQLTemp = qdf.SQL

Since every Top query has the TOP predicate with a space before and after it, I can find the position of the first space after the existing Top value. I need to do this because I have to remove the existing Top value and replace it with the new value.

StartPos = InStr(strSQLTemp, " TOP ") + 5

It's a good idea to test whether the query is actually a Top query or not, which I can do like this:

If StartPos = 5 Then
MsgBox "Not a Top Query"
GoTo Exit_TopParameter
End If

Now I'll prompt the user for the new Top value with an inputbox.

strTopVal = InputBox("Enter TOP value:")

Next I'll read all the text after the old Top value into a variable. To do that, find the first space after the TOP predicate like this:

strSQLTemp = Mid(strSQLTemp, InStr(StartPos, strSQLTemp, " "))

Now I've got all the pieces needed to rebuild the SQL string with the inputted Top value.

strSQL = "SELECT TOP " & strTopVal & strSQLTemp

Lastly, I'll set the SQL property of the query to the new SQL string, which will save the query with the new Top value.

qdf.SQL = strSQL

To finish it off, I'll add the Exit_TopParameter label which I used earlier to exit the routine if it is not a Top query. In addition, I'll clean up the object variables and add error trapping.

Exit_TopParameter:
db.Close
qdf.Close
Set db = Nothing
Set qdf = Nothing
Exit Sub

Err_TopParameter:
MsgBox Err.Description
Resume Exit_TopParameter
End Sub

To call the Query, I need to first call this code, then open the query, like this:

Call TopParameter("MyTopQuery")
DoCmd.OpenQuery "MyTopQuery", acNormal, acEdit

And that's it. Every time the code is run, it prompts for a new Top value, modifies, and then opens the query.

A free sample illustrating this process (and, indeed, all the information in the Top Query Series) can be found here: http://www.rogersaccesslibrary.com/forum/forum_posts.asp?TID=233.

.

Thursday, January 28, 2016

Top Query Revealed: Top Query to find Random Records

Top Query to find Random Records

This is the fourth in a series on the Top Query. In previous posts, I've discussed Simple Top Queries, Aggregate and Grouping Top Queries, and the Problem of Ties in a Top Query.

One rather surprising use for a Top query is to return a resultset of randomly selected records. It's also surprisingly simple. All you really need to do is sort on a random number, or rather, generate a random number in the Order By clause of the query.

So how do you get a random number? Fortunately, Access has a built in function called Rnd that will supply one.

To use Rnd, you need to supply it with a seed or an initial value to generate a pseudorandom number. Any numeric field will do for a seed value, but I generally use an autonumber, primary key field. In the case of the Orders table, that would be OrderID, but OrderNum would have worked as well.

SELECT TOP 10 Account, OrderNum, [Pairs Shipped], [Total Price]
FROM Invoices
ORDER BY Rnd(OrderID);

Every time this query is run, it will return 10 different records, well, sort of. Used as is, the Rnd function will produce a different set of records each time. However, if you close the database and re-open it, it will run the same records in exactly the same sequence.

In order to get a truly random set of numbers each time, you have to add the Randomize command. To do that, you have to create your own function. In a global module, create the following function:

Public Function gRnd(FeedNum As Long) as Double
Randomize
gRnd = Rnd(FeedNum)
End Function

Then use your user-defined rounding function in your Order By clause:

SELECT TOP 10 Account, OrderNum, [Pairs Shipped], [Total Price]
FROM Invoices
ORDER BY gRnd(OrderID);

A free sample illustrating this process can be found here: http://www.rogersaccesslibrary.com/forum/forum_posts.asp?TID=233

Next time, I'll finish up the Top Query by looking at Creating a Parameter for Top Value.

 

Tuesday, January 19, 2016

Top Query Revealed: Top Query Problem - Ties

Top Query Problem - Ties

As I said last time in Top Query Revealed: Aggregate Values and Top Values By Group, one problem with Top queries becomes apparent when there are duplicates in the top values. When this happens, the query will return more values than asked for.

Simple Top Queries

For instance, the following query will produce four values, rather than three because there are two orders with a total price of $27.50.

SELECT TOP 3 Account, OrderNum, [Total Price]
FROM Invoices
WHERE Account=237001
ORDER BY [Total Price] DESC;

When asked for the top three values, Access can't decide which of the two duplicates is in the top three, so it includes both of them. Figure 4 shows the result.

Figure 4: Result of a query asking for the top three values when there is a tie. Four records are returned.

The solution is to add an additional field to the Order By clause.

SELECT TOP 3 Account, OrderNum, [Total Price]
FROM Invoices
WHERE Account=237001
ORDER BY [Total Price] DESC, OrderNum DESC;

In Figure 5, you'll see that by adding OrderNum to the Order By clause, it does in fact return three values. Since I used DESC in the clause, it returned the larger of the duplicated order numbers: 542724. If I had used ASC (or left it blank) it would have returned 542723 instead.

Figure 5: Top 3 query with one of the duplicates removed.

Top Query By Group

But as you can see in Figure 6, the problem of duplicates can happen in top queries by group as well.

Figure 6: Account 237001 has four records: two with $27.50.

Unfortunately, the solution is not the same. Simply adding a second field to the Order By clause of the subquery, doesn't work; not by itself anyway. However if we add to this what we learned about using a Totals query to return aggregate values, we can find a solution.

In our previous Totals query, we used the Sum aggregate function. This time, however, we'll use a different aggregate function. The Max aggregate function returns just the maximum value of a group rather than summing the group. Adding this to our main query, we get this.

SELECT Account, Max(OrderNum) AS MaxOfOrderNum, [Total Price]
FROM Invoices
GROUP BY Account, [Total Price]
HAVING [Total Price] In
(SELECT TOP 3 [Total Price]
FROM Invoices I2
WHERE Invoices.Account = I2.Account
ORDER BY I2.[Total Price] DESC, I2.OrderNum DESC)
ORDER BY Account, [Total Price] DESC;

So adding the aggregate function to find the maximum OrderNum in the main query and adding a sort on OrderNum in the subquery solves the problem and we get output like Figure 7.


Figure 7: Solution to the duplicates problem in a top query by group.

Notice that account 237001 now has only 3 records and the number of records returned is 378 rather than 403.

A free sample illustrating the "tie" problems and solutions can be found here: http://www.rogersaccesslibrary.com/forum/forum_posts.asp?TID=233

Next time, I'll talk about how use a Top Query to find Random Records.

 

.

 

Saturday, January 16, 2016

Intellectual Adventure: Using MS Access to search 250 Million Records–Part 2

Guest Post by Bob Goethe

This post is a follow up for Intellectual Adventure: Using MS Access to search 250 Million Records

Downloading the Files...from now to January 21, 2016

I have zipped all the files for the 1/4 billion-record database (comprising 1.7 GB zipped, 30 GB unzipped) and made them available for download. You have my permission to do with them as you please, including modifying, and uploading them to other locations and sharing them, without limitation.

Unless you wish to include a text file attesting to how you owe all of your success in life to my example and mentoring-from-afar, you don't need to take any particular action to attribute the files to me.

My personal ISP account has some bandwidth limitations, so rather than letting you download directly from my server, I used YouSendIt/Hightail to share the files. The good news about YouSendIt is, so far as I can tell, there are no bandwidth limitations. Every Access-L member could download a copy and there is no problem.

The bad news is that the files I upload expire a week after I upload them. This means that you can download these files without restriction for the next few days, but the files will become unavailable in the afternoon of January 21.

There are 25 backend MDB files, plus a front end. You can download them all if you choose. You can store them in any folder you wish, but you must make path notations in the table called "DataFiles" so that the frontend knows where to look for its data. The data files can be, in principle, stored in any folder either locally or on a network share. They don't have to be stored in the same location. The frontend knows where to look for each file individually, based on the DataFiles table. The frontend file is called "MS_Jet Test.mdb".

If you would like to play with the database, but don't want to download *all* the files, you can choose to download one file only, called "Files21-25+FrontEnd.zip". This will give you a database of 10,000,000 records only. Again, you will want to go into the DataFiles table in the front end and remove all references to 1.mdb up to 20.mdb if you want to do this.

Files1-4.zip

https://www.hightail.com/download/ZWJYZGVURndvQUp1a3NUQw

Files5-8.zip

https://www.hightail.com/download/ZWJYZGVURnd5UkU4RmNUQw

Files9-12.zip

https://www.hightail.com/download/ZWJYZGVURnc1UjVBSXNUQw

Files13-16.zip

https://www.hightail.com/download/ZWJYZGVURncwMEhOUjhUQw

Files17-20.zip

https://www.hightail.com/download/ZWJYZGVlYStCSWNsYzhUQw

Files21-25+FrontEnd.zip

https://www.hightail.com/download/ZWJYZGVlYStFc0pvSWNUQw

Each zip file contains a collection of further zip files. This odd structure arose as a result of my experimenting with YouSendIt, to find the largest file I could successfully send while using a free account. I mixed-and-matched a bit before I decided that packages of 4 files were a good size to upload. In any case, just keep on unzipping, and eventually you will drill down to MDB files.

If you figure out other interesting things to do with, or to, this database, do share what you learned with this list.

Thanks!

Bob Goethe