Get Adobe Flash player

Real World use of Database NULLs -- Part 1

Article: Real World use of Database NULLs -- Part 1
Database: MS SQL Server, MS access and MySql. Could also apply to Progress and Oracle.
Author: Robert Cunningham

You can search the web and find a lot of information on the subject of NULLs. In this article, I will be contradicting some of the information you may find. Contradicting some business best practice. That is because academic or theoretical doesn't apply to Real World usage. Real World -- don't use NULLs -- with one exception.

WHY? Here are a few benefits:

  • Developers and end users do not have to handle data differently. See below for issues
  • JOIN performance is better. The JOINS are natural one to one ratio
  • Updates are faster. If NULLs are used, most databases require more disk writes.


May cause some issues:

  • More database space will be used. Larger database requires more disk space.
  • Parent Tables need to have a record to show not assigned. See Part 2 for more details


WHAT IS A NULL To start, what is a NULL? Simply put, a NULL is the absents of data. Or put another way, value of nothing. If you look at a simple Name table:

create table Name ( NameID integer NOT NULL, LastName varchar(25) NOT NULL, FirstName varchar(25) NOT NULL, MiddleName varchar(25) NULL);

 

NameID LastName FirstName MiddleName
100 Anderson Fred John
101 Johnson Sally
102 Smith Matt NULL

In the above example, MiddleName is the only column that allows NULLs. From the academic prospective: a NULL should be used if the value is unknown. For example, Matt Smith's middle name is not known, so a NULL is used. Whereas Sally Johnson's doesn't have a middle name and it is set to an empty string. From the real world prospective: Don't use NULLs. NULLs can cause problems for programming. If programming of a Stored Procedure, Java or PHP code, you have to accommodate a NULL differently than any other data. Does it really matter if Matt Smith has a MiddleName? No! Set it to an empty string and don't use a NULL.

PROGRAMMING ISSUES There are many issues with NULLs when it comes to programming. Most programming languages don't like NULLs and the programmer has to handle data differently. SQL:

If NULLs are Allowed No NULLs -- NOTE The MiddleName Column does not have nulls
Return unknown middle names:
select * from Names where (MiddleName is not null OR MiddleName = '')

(Will return Sally and Matt)
select * from Names where MiddleName = ''
(will return Sally and Matt because they do not have middle names)

Looking at the above queries both return the same number of rows. With the first query, The problem OR is used. You also have to remember to use the (). With the second query, it is easier to write and you don't have to worry about the concept of a NULL.

If NULLs are Allowed No NULLs -- NOTE: The MiddleName Column does not have nulls
Select EmployeeName = FirstName + ' ' + MiddleName + ' ' + LastName from Names (The above query would return NULL for EmployeeName of Matt) To fix query to work with NULLs: Select EmployeeName = FirstName + ' ' + coalesce(MiddleName, '') + ' ' + LastName from Names Select EmployeeName = FirstName + ' ' + MiddleName + ' ' + LastName from Names
(This query works because NULLs are not allowed. It is simple to read. Simple to write)


Notes:
1) coalesce is a command that will return a first non-null value.
2) Remember Anything + nothing (a NULL) is nothing.
NO NULLs with Different Data Types If you are designing a new database, you can force each column not to have NULLs. For existing tables, you have two options:

  • The first is to alter each table and each column to not allow NULLs. This can be difficult and can require someone with Medium to Advanced level of databases experience.
  • The second option is easier to implement. This option just updates the existing data to a default not NULL value. See table below for examples for each type of data.


Example:

update Name Set MiddleName = '' where MiddleName is null


Optionally, To insure the data doesn't get NULLs again:

SQL Server: alter table Name add constraint dfMiddleName default '' for MiddleName
Data Type Not NULL value
Text Data -- varchar, text or char Empty string. tic tic ('') or a quote quote (""). Storage is zero bytes
Numeric -- integer, int, and etc Set to zero. If you really need to have a non-answer value, use a -1.
Date Set to the value will be 1900-01-01 00:00:00.000.
Currency, Rates, Percentage Currency or money type of columns should NEVER be null. A currency type of column should have a predefined DEFAULT of zero.

THE EXCEPTION The exception to the rule is when it comes to data relationship columns. For All database engines, a foreign key is a definition of data relationship between two tables. Most foreign keys are of numeric data type -- typically Integer. To set a NULL value to a zero would cause foreign keyviolation . So, how do you eliminate a need for a NULL. Easy! Insert a 0 ID record in the parent table. This is actually common for data warehouse databases. See Part 2 for details. [End of Part 1]