Entity integrity is one of the primary rules of effective database construction. It refers to the process of enforcing a primary key for each table in a database, where the key must be either a row or a combination of rows that are unique non-null values. By maintaining entity integrity, each record in each table of a database can be specifically identified via the primary key. This prevents duplicate records and other issues that would indirectly compromise the integrity of the database. In the absence of entity integrity, the entries in a database can accidentally overlap, as there will be no way to pluck a specific entry from the table.
When considering the first part of entity integrity, a “unique” value, the concept of a primary key is important to understand. A primary key in a database is a value used to specifically identify each individual record. To be effective, a primary key must be unique; in other words, it must prove impossible to retrieve multiple records by searching for a single primary key. A Social Security number would be a good primary key for a database focused on individual personal records in the US, since two people never have the same number. Anyone searching in the database for a specific Social Security number would, by definition, retrieve only a single record at a time.
To see the problems that arise in the absence of entity integrity through unique primary keys, one could picture a database where an individual’s last name is used as the primary key. Since more than one person can share a last name, entering “Smith” or “Jones” as the search criteria could possibly produce dozens of records. If this happens, the integrity of the database is in question because the end user no longer has a method to retrieve individual records.
The second part of entity integrity states that the value chosen as the primary key must never be a null, or empty, value for any item in the database. If the primary key value remained null for one or more records in the database, certain records would become impossible to retrieve, as a null value is insufficient to uniquely identify these records. In other words, if a Social Security number is used as the primary key and an employee is not from the US and thus does not have one, the foreign employee’s record will be impossible to identify using the primary key, invalidating the integrity of the database.