Guest Post by
Bob Goethe
I was reading the Wikipedia article on the MS Jet Database Engine (for which MS Access is essentially a frontend) and the writer commented that there was no essential difference between SQL Server and MS Jet in terms of the total volume of data they could store, since one could split the data across multiple MDB files.
So I sought to create a database backend containing 250 million records. I did this by generating a collection of 25 mdb files, each containing 10 million records, where each record consisted of:
- RecID
- AuthorID
- Quote
The database was set up as a collection of literary quotes. While doing the initial populating of the database, every record had a unique RecID, an AuthorID = 1 (i.e. Shakespeare), and a Quote (in the initial populating of the db, a quote from Julius Caesar).
The screen captures below will make this easier to grasp.
Screen Captures
Now, with any database where the backend is split across multiple files, you lose the chance to have the database engine itself enforcing relational integrity. That task becomes the programmer's job to manage via his code.
(The trendy name for a non-relational database with lots of records is to call it a NoSQL database. "NoRel" would be more accurate, but doesn't sound as cool.)
Access was capable of producing 5,500+ new records per second...but this still meant that creating 10 million records took a half hour. So creating 250 million records took several hours of computer time, spread across several days.
The code was designed to search through the first MDB file whose path was stored in a table. When it reached the end of the records in that file, it cascaded to the second MDB file. It was irrelevant whether that MDB file was on the same physical box as the first or not.
This structure would lend itself to building a server-cluster of inexpensive Network Attached Storage (NAS) devices running Linux, though in my case I wanted to keep the speed up by eliminating any Ethernet bottlenecks. So I kept all 25 MDB files (30 GB worth) on the hard drive of a single Win7 box.
This test was run using MS Access 2003.
Results of Test
The search of 1/4 billion records took 177 seconds to complete. This means that the Jet database was processing 1.4 million records per second, including the time it took to close its connection with one MDB file and open a connection to the next one in line.
Code that Did the Work
Private Sub cmdExecute_Click()
On Error GoTo Err_Routine
Dim wrkspc As Workspace
Dim dbs As Database, vAuthorName As String, vStart As Double, vEnd As Double, vTotalRecCount As Long
Dim rstSrc As Recordset, rst As Recordset, vAuthorID As Long, vSrc As String, rstTarget As Recordset, rstAuthor As Recordset
1 Set rstSrc = CurrentDb.OpenRecordset("DataFiles")
2 Set rstAuthor = CurrentDb.OpenRecordset("SELECT * FROM Authors WHERE Authors.RecID = " & Me!AuthorID & ";")
3 vAuthorName = rstAuthor!AuthorName
4 vAuthorID = Me!AuthorID
5 CurrentDb.Execute "Delete * from SearchResults"
6 Set rstTarget = CurrentDb.OpenRecordset("SearchResults")
7 Set wrkspc = CreateWorkspace("", "Admin", "")
8 Me.Caption = "Search start: " & Format(Now, "hh:nn:ss")
9 vStart = Timer
10 Do Until rstSrc.EOF
11 vSrc = rstSrc!BackendFile
' I have a field on the form to tell me which backend file is being processed this moment.
12 Me!Text10 = vSrc
'There are no linked tables in the frontend file; I establish a connection as it is required.
13 Set dbs = wrkspc.OpenDatabase(vSrc)
14 Set rst = dbs.OpenRecordset("SELECT * FROM Quotation WHERE (((Quotation.AuthorID)=" & vAuthorID & "));")
15 If rst.RecordCount > 0 Then
16 Do Until rst.EOF
17 rstTarget.AddNew
18 rstTarget!RecID = rst!RecID
19 rstTarget!Author = vAuthorName
20 rstTarget!Quote = rst!Quote
21 rstTarget.Update
22 rst.MoveNext
23 Loop
24 End If
25 If Not dbs Is Nothing Then dbs.Close: Set dbs = Nothing
26 rstSrc.MoveNext
27 Loop
28 vFinish = Timer
29 vduration = vFinish - vStart
30 Me.Caption = Me.Caption & " " & "End at " & Format(Now, "hh:nn:ss") & " (" & Int(vduration) & " seconds)"
31 Me.Requery
Exit_Routine:
Exit Sub
Err_Routine:
MsgBox "Error " & Err.Number & " on line " & Erl & ": " & Err.Description
Resume Exit_Routine
End Sub
Private Sub cmdCountRecs_Click()
On Error GoTo Err_Routine
Dim wrkspc As Workspace
Dim dbs As Database, vAuthorName As String, vStart As Double, vEnd As Double, vTotalRecCount As Long
Dim rstSrc As Recordset, rst As Recordset, vAuthorID As Long, vSrc As String, rstTarget As Recordset, rstAuthor As Recordset
1 Set rstSrc = CurrentDb.OpenRecordset("DataFiles")
2 Set rstTarget = CurrentDb.OpenRecordset("SearchResults")
3 Set wrkspc = CreateWorkspace("", "Admin", "")
4 Do Until rstSrc.EOF
5 vSrc = rstSrc!BackendFile
6 Me!Text10 = vSrc
7 Delay (0.25)
8 Set dbs = wrkspc.OpenDatabase(vSrc)
9 Set rst = dbs.OpenRecordset("Quotation")
10 rst.MoveLast
11 vTotalRecCount = vTotalRecCount + rst.RecordCount
12 If Not dbs Is Nothing Then dbs.Close: Set dbs = Nothing
13 rstSrc.MoveNext
14 Loop
15 Me!Text10 = Format(vTotalRecCount, "###,###,###") & " records searched."
Exit_Routine:
Exit Sub
Err_Routine:
MsgBox "Error " & Err.Number & " on line " & Erl & ": " & Err.Description
Resume Exit_Routine
End Sub
Things to Watch Out For
My first effort involved storing 15 million records per MDB file...but I found that once files grew to 1.5-ish GB each, unpredictable behavior sometimes occurred. E.g. compact the file, and suddenly RecID is no longer the primary key. Try and re-designate it as primary key, and you get a message that there is insufficient space in the file to complete the request.
So while the specs on Jet databases say that MDB files of up to 2 GB are supported, my experience is that it is better to treat ~1 GB as a good, working limit.
Next Step
I have never been able to compare the *performance* of an Access database directly with a SQL Server database, since you almost never have the SAME data set hosted on the SAME equipment, and accessed from the SAME frontend.
My plan is to set up this test on a SQL Server box, and then run the two database engines head to head.
My expectation is that SQL Server will perform better, but cost a good bit more than the Access solution. A cluster of multiple Windows Server Enterprise Edition machines running SQLS could cost well over $100K in hardware and licensing fees of various sorts.
The MS Jet solution, on the other hand, costs $130 for a retail copy of MS Access, a bank of NASes for less than $2000, and users can access the database with no need for additional licensing fees. All they need to do is download, at no charge, the Microsoft Access Runtime application.
What Does it All Mean?
Microsoft's problem - and it is a big one, from their own point of view - is that Access is simply too good, too powerful, too capable.
A good bit of Microsoft's marketing says that Access is just fine for homemakers to utilize in keeping track of their inventory of spices, and then printing out a shopping list to take to the Safeway store. But if you want to run a business, then you want a man's database. You want SQL Server.
Not every company has drunk the Cool Aide of Microsoft's SQL marketing campaign (though probably most 25 year old computer science graduates have). Enterprises the size of Alberta Covenant Health ($1B revenue/year) and even Alberta Health Services ($10B revenue/year) use MS Access databases in mission critical applications (much to the chagrin of some of their own SQL database admins).
If Access has a poor reputation among professional programmers, it is most likely because - since Access is included with MS Office - it is the database tool most often used by doctors or accountants who fancy themselves as computer savants. Of course, the applications created by these kinds of people are often total P.O.S., with no concept of relational data integrity or efficient data management. So sometimes a professional is called in to "tidy up the database" only to find that he has a real mess on his hands. In this situation, it will be pretty easy for him to come away with a negative perception of Access.
But clearly, in the hands of a competent programmer, MS Access is good for more than just organizing the spice rack in your kitchen.
They say that if your only tool is a hammer, every problem looks like a nail. There are certainly problems that *require* SQLS. But the database professional who adds MS Access to his toolkit expands his resources for efficient problem solving. The possibilities are worth pondering.
Sample files are available for a limited time. See this post for details:
Intellectual Adventure- Using MS Access to search 250 Million Records–Part 2
Bob Goethe
Bob is working for the Covenant Health Group of hospitals in Alberta, right now developing a seriously large document management-and-retrieval database. Because of some of the politics that swirl around government healthcare in Canada, Covenant's management would like to have this project built in MS Access rather than SQL Server. This sample database was built to test concepts that could be used in the operational document-management database.