Tuesday, October 7, 2008

What does NULL mean? How is it different than the Empty String?

One database concept given to much misunderstanding is the NULL value. What is it?

Null means that the value is unknown. This is different from the Empty String. The Empty String ("") means we know what the value is, and the value is, well, empty. That is, we know that there ISN'T a value.

Consider the case of the middle initial. Some people have a middle initial and some don't. Of those that do, we may or may not know what it is. If I leave it Null, it says that I don't know whether they have one or not. However, I can set the field to "" (Empty string) which says that they DO NOT have a middle initial.

Why is this difference important? Primary keys, for one thing.

Let's assume that my primary key is a composite key composed of FirstName, MiddleInitial, LastName. A primary key CANNOT have a Null as a part of it, thus we can set the value of MiddleInitial to "", and it will accept it. But we cannot leave the value Null.

Why can't a Primary Key have a Null value? Well aside from the fact that Access won't let it, the fact that Null means "I don't know the value" means that it cannot positively guarentee that the field is unique, which is one of the properties of the primary key.

The following four subs illustrate the differences:
Sub test()
If Null = Null Then
MsgBox "True"
Else
MsgBox "false"
End If
End Sub

Running this sub will ALWAYS evaluate to False. Why? Because we don't know what the value of NULL is, so we can't say if it's equal to NULL.

Sub test2()
If IsNull(Null) Then
MsgBox "True"
Else
MsgBox "false"
End If
End Sub

This evaluates to True because the IsNull() is specifically designed to test for NULL.

Sub test3()
If "" = "" Then
MsgBox "True"
Else
MsgBox "false"
End If
End Sub

This evaluates to True because the empty string is a known value.

Sub test4()
If IsNull("") Then
MsgBox "True"
Else
MsgBox "false"
End If
End Sub

False because "" is known, therefore it cannot be unknown.

As I said, NULL=NULL always evaluates to False. Since we don't know the value of Null, we can't say what it equals. This is why you never use NULL this way, but always use the IsNull() function.

Further, Len(Null) does evaluates to Null, as indeed do most things you *compare* with or *do to* Null, ie. Null=Null, X=Null, Null=True, cint(Null), etc.

Len("") ("" representing the 'Empty String' or 'Zero Length String') evaluates to 0 (zero). Because the length of a string of zero length is zero (duh!, my daughter would say).

The problem with explaining NULL, is all the good words are used for something else. You can't define it with itself, so saying "Null is Null" is null content (forgive the expression). You can't use "nothing", because that is an object pointer used to de-allocate object variables. You can't use "empty", because that is a constant specifically for use with a Variant variable. You can't use Zero either, because that represents a specific numeric value, rather than the absence of a value.

I guess we'll have to blame E.F. Codd who declared that any relational database implementation must have a special value called "Null" which represents the "absence of anything" or “unknown” condition for all datatypes.

The best we can do is to say that the value of Null is 'we don't know', which is not to say that we don't know what Null is ... I'd better stop there.

1 comment:

Anonymous said...

Nice, but it may also help to point out that this code:

Sub test1()
If Null <> Null Then
MsgBox "True"
Else
MsgBox "false"
End If
End Sub

also will always return a false value. Null=Null is always false, but Null<>Null is also always false. It's the "we don't know" rule. If we don't know the value of either of the fields in a comparison, the we don't know the result of that comparison, no matter what the other field is or the type of the comparison made.