There are few subjects in programming that are contentious enough to cause heated discussions whenever they come up in conversations, blog posts or Stack Overflow answers. The canonical example of one of these subjects is ‘tabs vs spaces’, but I’ve found another one. It’s something that you may not have thought about before (except occasionally when crafting a SQL query), so let me introduce today’s topic: Does null equal null?
Why is this so contentious? (See http://stackoverflow.com/questions/1843451/why-does-null-null-evaluate-to-false-in-sql-server for some examples of the ferocity with which some people will argue their case.) As usual, this largely comes down to the fact that there are multiple ways to look at the question. If you think of it from a programming (i.e. pointer reference) point of view then, yes, two null references hold the same pointer value and, since most popular languages fall back to pointer equality if no custom equality is defined, null does equal null. Case closed?
However: from an application / domain design point of view, where we are thinking of abstract concepts and have no notion of computer memory etc, if
(null == null) == true
then we quickly head down a path leading to logical absurdities. This is because null is usually used to represent domain concepts such as ‘unknown’ or ‘no data’. Really, null is not a value at all; it is the representation of a missing value. Let’s follow this through with a few examples:
Let’s imagine a system with a User entity. Every User has a Name value, and that Name itself has FirstName, MiddleName and Surname properties, but MiddleName can be null to represent the fact that this particular person does not have a middle name. Now let’s imagine two such users in our system: Andy and Jamie. Neither Andy nor Jamie has a middle name so both middle name properties are set to null, but would you say that both people have the same middle name? Maybe, maybe not; perhaps you can argue this one either way. Now that we’ve whetted our appetites for the subject, let’s explore a little further by introducing another property on our user class: address.
If one user, Sally, has no address (her address is null) and another user, Bob, also has no address (his address is also null) then we can conclude that Sally and Bob live together, because they both have the same address! Okay, now we have a problem. It’s becoming clearer that null just doesn’t mix with the notion of equality, at least not in a domain. It’s easy to come up with more absurd examples, such as web requests that use a null User property to represent unauthenticated requests (therefore: all unauthenticated requests come from the same User!) or all untagged animals at a shelter having Owner = null (therefore: all these animals belong to the same owner!).
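To make the Sally and Bob problem concrete, here is a minimal Python sketch. The User class and the live_together helper are hypothetical names invented for this illustration, and Python's None stands in for null.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    name: str
    # None stands for "no address on record" (an assumption of this sketch).
    address: Optional[str] = None


def live_together(a: User, b: User) -> bool:
    # Naive check: plain equality, so None == None counts as a match.
    return a.address == b.address


sally = User("Sally")
bob = User("Bob")

# Neither address is known, yet the naive check reports a shared address.
print(live_together(sally, bob))  # True
```

Nothing here is wrong at the language level; the absurdity only appears once None is read as a domain fact rather than as a missing one.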
How is this issue addressed?
Different programming languages deal with this in different ways. As mentioned previously, “Does null equal null?” returns true in most programming languages, but not in SQL: there, NULL = NULL evaluates to unknown (NULL), which a WHERE clause treats the same as false. SQL’s behaviour is based on Three-Valued Logic (a.k.a. 3VL: https://en.wikipedia.org/wiki/Three-valued_logic), which has truth tables for operations such as equality (==), and (∧), or (∨) and not (¬) when each value has three possible states: ‘true’, ‘false’, and ‘unknown’. In 3VL null == null returns null, which makes sense: the result of comparing anything to ‘unknown’ is also unknown.
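The 3VL truth tables are small enough to sketch in a few lines of Python. This version assumes Kleene-style tables and uses None for ‘unknown’; the function names eq3, and3, or3 and not3 are made up for the example.

```python
from typing import Optional

Tri = Optional[bool]  # True, False, or None for 'unknown'


def eq3(a: object, b: object) -> Tri:
    # Comparing anything with 'unknown' is itself 'unknown'.
    if a is None or b is None:
        return None
    return a == b


def and3(a: Tri, b: Tri) -> Tri:
    # A definite False decides the result, even next to 'unknown'.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a: Tri, b: Tri) -> Tri:
    # A definite True decides the result, even next to 'unknown'.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a: Tri) -> Tri:
    # The negation of 'unknown' is still 'unknown'.
    return None if a is None else not a


print(eq3(None, None))    # None
print(and3(False, None))  # False
print(or3(True, None))    # True
```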
The problem is that, in order to enable pragmatic programming, we have to collapse a three-valued result down to a two-valued one (i.e. a bool? to a bool) in order to use it in if-statements. In this case it seems reasonable to treat null as falsy, so that the else-branch of the if-statement executes.
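As a rough sketch of that collapse, again in Python with None standing in for null: the helper names eq3 and is_true are invented for this example, and the rule "only a definite true takes the if-branch" is the assumption described above, not the only possible choice.

```python
from typing import Optional


def eq3(a: object, b: object) -> Optional[bool]:
    # Three-valued equality: 'unknown' (None) if either side is missing.
    if a is None or b is None:
        return None
    return a == b


def is_true(value: Optional[bool]) -> bool:
    # Collapse three states to two: False and 'unknown' both become False.
    return value is True


middle_name_andy = None
middle_name_jamie = None

if is_true(eq3(middle_name_andy, middle_name_jamie)):
    branch = "same middle name"
else:
    branch = "not known to be the same"

print(branch)  # not known to be the same
```

Note that the else-branch now covers two different situations, ‘definitely different’ and ‘cannot tell’, which is exactly the information the collapse throws away.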
Perhaps the best way of dealing with this issue (assuming that we were designing a new programming language from scratch) would be to simply avoid it altogether and say that any nullable type T? does not support any type of equality. Asking “Does this value equal this other value?” when either value might be null is simply illegal. For convenience one could add new operators (perhaps =? and =!) that would allow the user to pick which type of equality they would prefer every time, but using = or == would result in a compiler error.
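Python cannot produce a compiler error for this, but a runtime approximation of the idea might look like the sketch below. Everything in it is an assumption made for illustration: the Nullable wrapper is hypothetical, eq_nulls_match stands in for =? (two nulls count as equal), and eq_nulls_differ stands in for =! (null never equals anything). The text above does not pin down what the two operators would mean, so those meanings are a guess.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class Nullable(Generic[T]):
    """Hypothetical wrapper whose plain == is deliberately unsupported."""

    def __init__(self, value: Optional[T] = None) -> None:
        self.value = value

    def __eq__(self, other: object) -> bool:
        # Runtime stand-in for the compiler error: force an explicit choice.
        raise TypeError("use eq_nulls_match or eq_nulls_differ instead of ==")

    def eq_nulls_match(self, other: "Nullable[T]") -> bool:
        # Stand-in for '=?': two missing values are treated as equal.
        if self.value is None or other.value is None:
            return self.value is None and other.value is None
        return self.value == other.value

    def eq_nulls_differ(self, other: "Nullable[T]") -> bool:
        # Stand-in for '=!': a missing value is never equal to anything.
        if self.value is None or other.value is None:
            return False
        return self.value == other.value


sally_address: Nullable[str] = Nullable()
bob_address: Nullable[str] = Nullable()

print(sally_address.eq_nulls_match(bob_address))   # True
print(sally_address.eq_nulls_differ(bob_address))  # False
```

The point of the design is that the caller has to say which meaning of equality they want at every comparison, so the Sally and Bob question can no longer be answered by accident.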
In summary: The intuition that leads many of us to initially react with “Of course null is equal to null!” is a result of being overly literal. You might even consider it a form of wordplay in the same vein as “Nothing is better than God. A ham sandwich is better than nothing. Therefore a ham sandwich is better than God!”, which uses “nothing” as a value in the same way as “‘a’ is nothing. ‘b’ is nothing. Therefore ‘a’ is equal to ‘b’.” does. Both arguments equivocate on the word “nothing”, and both are fallacious.