Data Integrity
By Roger Stuart
Enforcing data integrity ensures the quality of the data in a database. For example, if a product is entered with a Product_ID value of 25 in a table named Products, the database should not allow another product to have the same ID. Similarly, if a column named Product_Rating is intended to hold values ranging from 1 to 10, the database should not accept a value below 1 or above 10 for this column. Both requirements can be met by using the methods that SQL Server supports for enforcing data integrity.
SQL Server supports a number of methods that can be used to enforce data integrity. These methods include defining datatypes, NOT NULL definitions, DEFAULT definitions, IDENTITY properties, rules, constraints, triggers, and indexes.
Datatypes
A datatype is an attribute that specifies the type of data (e.g., character, integer, or binary) that can be stored in a column, parameter, or variable. SQL Server provides a set of system-supplied datatypes, which define all of the types of data that can be used with SQL Server. Users can also create user-defined datatypes based on the system-supplied datatypes. Datatypes enforce data integrity because the data entered or modified must conform to the type specified for the object. For example, a name cannot be stored in a column defined with the datetime datatype, because a datetime column accepts only date and time values.
NOT NULL Definitions
The nullability of a table column determines whether the rows in the table can contain a null value for that column. A null value is not the same as zero, a blank, or a zero-length character string such as "". Null in a column means that no data has been entered in that column, which implies that the value is either unknown or undefined.
The nullability of a column is specified in the column definition when a table is created or modified. The NULL keyword specifies that the column allows null values. The NOT NULL keyword specifies that null values are not allowed in the column.
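As a minimal sketch of the NULL and NOT NULL keywords described above, the following example uses Python's built-in sqlite3 module rather than SQL Server, so that it can be run without a server. The Products table, its Product_Name and Description columns, and the helper function are assumptions made for illustration only.

```python
import sqlite3

# SQLite stands in for SQL Server here; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products ("
    " Product_ID INTEGER NOT NULL,"    # null values are rejected
    " Product_Name TEXT NOT NULL,"     # null values are rejected
    " Description TEXT NULL"           # null values are allowed
    ")"
)

# Omitting the nullable Description column stores a null in it.
conn.execute(
    "INSERT INTO Products (Product_ID, Product_Name) VALUES (25, 'Widget')"
)


def try_insert_without_name(connection):
    """Return True if the insert succeeds, False if NOT NULL rejects it."""
    try:
        connection.execute("INSERT INTO Products (Product_ID) VALUES (26)")
        return True
    except sqlite3.IntegrityError:
        return False


null_name_accepted = try_insert_without_name(conn)
description = conn.execute(
    "SELECT Description FROM Products WHERE Product_ID = 25"
).fetchone()[0]

print(null_name_accepted)  # False: Product_Name is NOT NULL
print(description)         # None: Python's representation of a null
```

The column definitions themselves are close to what SQL Server would accept, but error handling and type names differ between the two systems.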
DEFAULT Definitions
Each column in a row must contain a value even if that value is null. However, certain situations exist when a row is inserted in a table, but the value for a column is not known or the value does not yet exist. If the column allows null values, a row with a null value for that column can be inserted in the table. In some cases, nullable columns might not be desirable. In these cases, a DEFAULT definition can be defined for the column. Defaults specify what values are automatically inserted in a column if a value is not specified for the column when inserting a row in the table. For example, it is common to specify zero as the default for numeric columns and N/A as the default for string columns.
When a row is inserted in a table that has a DEFAULT definition for a column, SQL Server is implicitly instructed to insert the specified default value if no value is supplied for that column.
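The behavior of DEFAULT definitions can be sketched as follows. The example again uses Python's sqlite3 module as a stand-in for SQL Server, and the Units_In_Stock and Category columns are hypothetical names chosen to show a numeric default of zero and a string default of N/A.

```python
import sqlite3

# SQLite stands in for SQL Server here; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products ("
    " Product_ID INTEGER NOT NULL,"
    " Units_In_Stock INTEGER NOT NULL DEFAULT 0,"
    " Category TEXT NOT NULL DEFAULT 'N/A'"
    ")"
)

# No values supplied for the defaulted columns: the defaults are used.
conn.execute("INSERT INTO Products (Product_ID) VALUES (25)")

# Explicit values override the defaults.
conn.execute(
    "INSERT INTO Products (Product_ID, Units_In_Stock, Category) "
    "VALUES (26, 12, 'Tools')"
)

rows = conn.execute(
    "SELECT Product_ID, Units_In_Stock, Category "
    "FROM Products ORDER BY Product_ID"
).fetchall()
print(rows)  # [(25, 0, 'N/A'), (26, 12, 'Tools')]
```

Because both defaulted columns are also NOT NULL, the first insert succeeds only because the defaults fill in the missing values.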
IDENTITY Properties
The IDENTITY property is used to define a column as an identifier column. An identifier column contains system-generated sequential values that uniquely identify each row in the table. A table can have only one identifier column. Identifier columns usually contain values that are unique only within the table for which they have been defined. In other words, an identifier column in one table can contain the same identity values as an identifier column in another table. This is rarely a problem, because identifier values are typically used only within the context of a single table, and identifier columns in different tables do not relate to one another.
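The following sketch illustrates identifier columns in two tables generating overlapping values. It runs on SQLite through Python's sqlite3 module, where INTEGER PRIMARY KEY AUTOINCREMENT is used as the closest analogue of a SQL Server IDENTITY column; this substitution, along with the Suppliers table, is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite analogue of an IDENTITY column (assumption: SQL Server would
# declare the column with the IDENTITY property instead).
conn.execute(
    "CREATE TABLE Products ("
    " Product_ID INTEGER PRIMARY KEY AUTOINCREMENT,"
    " Product_Name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE Suppliers ("
    " Supplier_ID INTEGER PRIMARY KEY AUTOINCREMENT,"
    " Supplier_Name TEXT NOT NULL)"
)

# The identifier values are never supplied; the database generates them.
for name in ("Widget", "Gadget", "Sprocket"):
    conn.execute("INSERT INTO Products (Product_Name) VALUES (?)", (name,))
for name in ("Acme", "Globex"):
    conn.execute("INSERT INTO Suppliers (Supplier_Name) VALUES (?)", (name,))

product_ids = [
    row[0]
    for row in conn.execute("SELECT Product_ID FROM Products ORDER BY Product_ID")
]
supplier_ids = [
    row[0]
    for row in conn.execute("SELECT Supplier_ID FROM Suppliers ORDER BY Supplier_ID")
]

print(product_ids)   # [1, 2, 3]
print(supplier_ids)  # [1, 2]
```

The values 1 and 2 appear in both tables, which is harmless because each value is meaningful only within its own table.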
Constraints
Constraints are used to define the way that SQL Server automatically enforces the integrity of a database. A constraint is a property assigned to a table or column within a table that prevents invalid data values from being entered in the specified column(s). For example, a PRIMARY KEY or UNIQUE constraint on a column prevents a duplicate value from being inserted into the column. A CHECK constraint on a column prevents the column from accepting a value that does not meet the specified condition. Moreover, a FOREIGN KEY constraint establishes a link between data in two tables.
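As a rough illustration of PRIMARY KEY, UNIQUE, and CHECK constraints, the sketch below reuses the Product_ID and Product_Rating columns from the introduction. It runs on SQLite through Python's sqlite3 module rather than on SQL Server, and the Product_Code column and the is_rejected helper are hypothetical.

```python
import sqlite3

# SQLite stands in for SQL Server here; Product_Code is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products ("
    " Product_ID INTEGER PRIMARY KEY,"
    " Product_Code TEXT UNIQUE,"
    " Product_Rating INTEGER CHECK (Product_Rating BETWEEN 1 AND 10)"
    ")"
)
conn.execute("INSERT INTO Products VALUES (25, 'W-100', 7)")


def is_rejected(connection, sql):
    """Return True if the statement violates a constraint."""
    try:
        connection.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    # Same Product_ID as the existing row.
    "duplicate primary key": is_rejected(
        conn, "INSERT INTO Products VALUES (25, 'G-200', 5)"
    ),
    # Same Product_Code as the existing row.
    "duplicate unique value": is_rejected(
        conn, "INSERT INTO Products VALUES (26, 'W-100', 5)"
    ),
    # Rating outside the range 1 to 10.
    "rating out of range": is_rejected(
        conn, "INSERT INTO Products VALUES (27, 'S-300', 11)"
    ),
    # Satisfies every constraint, so it is accepted.
    "valid row": is_rejected(
        conn, "INSERT INTO Products VALUES (28, 'S-300', 10)"
    ),
}
print(results)
```

Only the last statement is accepted; the other three are refused by the database itself, without any checking in the application code.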
Rules
Rules perform some of the same functions as CHECK constraints, but they are provided only for backward compatibility, and CHECK constraints are preferred. CHECK constraints are more concise than rules. A column can have only one rule applied to it, whereas multiple CHECK constraints can be applied to a column. CHECK constraints are specified as part of the table definition, whereas rules are created as separate objects and are then bound to the column.
The CREATE RULE statement is used to create a rule. Once a rule has been created, it can be bound to a column or a user-defined datatype by using the sp_bindrule system stored procedure.
Triggers
Triggers are special types of stored procedures that are defined to execute automatically when an UPDATE, INSERT, or DELETE statement is issued against a table or view. Triggers can be used to enforce business rules automatically when data is modified. Triggers can also be used to extend the integrity checking logic of constraints, defaults, and rules. However, it is recommended that constraints and defaults be used instead of triggers whenever they provide all of the needed functionality.
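The sketch below shows a trigger enforcing a business rule that compares the old and new values of a row, something a CHECK constraint cannot express. It runs on SQLite through Python's sqlite3 module, whose trigger syntax (OLD and NEW row references, RAISE) differs from that of SQL Server. The Price column, the 50 percent rule, and the trigger name are all assumptions made for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server here; the rule itself is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products ("
    " Product_ID INTEGER PRIMARY KEY,"
    " Price REAL NOT NULL)"
)

# Business rule: a single update may not cut the price by more than half.
conn.execute(
    "CREATE TRIGGER trg_limit_price_cut "
    "AFTER UPDATE OF Price ON Products "
    "FOR EACH ROW WHEN NEW.Price < OLD.Price * 0.5 "
    "BEGIN SELECT RAISE(ABORT, 'price cut exceeds 50 percent'); END"
)
conn.execute("INSERT INTO Products VALUES (25, 100.0)")


def update_price(connection, product_id, new_price):
    """Return True if the update is accepted, False if the trigger aborts it."""
    try:
        connection.execute(
            "UPDATE Products SET Price = ? WHERE Product_ID = ?",
            (new_price, product_id),
        )
        return True
    except sqlite3.DatabaseError:
        return False


small_cut_ok = update_price(conn, 25, 80.0)  # 100 to 80: within the limit
large_cut_ok = update_price(conn, 25, 30.0)  # 80 to 30: aborted by the trigger
final_price = conn.execute(
    "SELECT Price FROM Products WHERE Product_ID = 25"
).fetchone()[0]

print(small_cut_ok, large_cut_ok, final_price)  # True False 80.0
```

The aborted update leaves the previous price in place, so the table never holds a value that breaks the rule.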
Indexes
An index is a database object that orders the values of one or more columns in a table. An index provides pointers to the data values stored in specified columns of the table and orders the pointers in the specified order. When rows are requested from an indexed table, the database searches the index to find a particular value and then follows the pointer to the row containing that value. An index that is defined as unique also enforces integrity, because it rejects duplicate values in the indexed column(s).
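Indexes appear in the list of integrity methods mainly because an index can be declared unique. The sketch below shows a unique index rejecting a duplicate value on a column that has no other constraint. It runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server, and the index name IX_Products_Product_ID is hypothetical.

```python
import sqlite3

# SQLite stands in for SQL Server here; the index name is hypothetical.
conn = sqlite3.connect(":memory:")

# The column carries no PRIMARY KEY or UNIQUE constraint of its own.
conn.execute("CREATE TABLE Products (Product_ID INTEGER, Product_Name TEXT)")
conn.execute(
    "CREATE UNIQUE INDEX IX_Products_Product_ID ON Products (Product_ID)"
)
conn.execute("INSERT INTO Products VALUES (25, 'Widget')")

try:
    # A second row with Product_ID 25 violates the unique index.
    conn.execute("INSERT INTO Products VALUES (25, 'Gadget')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Products").fetchone()[0]
print(duplicate_rejected, row_count)  # True 1
```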
Types of Data Integrity
SQL Server supports the following four types of data integrity:
1. Entity Integrity
Entity integrity defines a row as a unique entity for a particular table. Entity integrity enforces the integrity of the identifier column(s) or the primary key of a table (through indexes, UNIQUE constraints, PRIMARY KEY constraints, or IDENTITY properties).
2. Domain Integrity
Domain integrity validates the entries for a given column. Domain integrity can be enforced by restricting the type (through data types), the format (through CHECK constraints and rules), or the range of possible values (through FOREIGN KEY and CHECK constraints, DEFAULT definitions, NOT NULL definitions, and rules).
3. Referential Integrity
Referential integrity maintains the defined relationships between tables when records are entered into or deleted from the tables. In SQL Server 2000, referential integrity is based on relationships between foreign keys and primary keys or between foreign keys and unique keys (through FOREIGN KEY and CHECK constraints). Referential integrity ensures that key values are consistent across the related tables. When referential integrity is enforced, SQL Server prevents users from adding records to a related table if there is no associated record in the primary table. Users are also prevented from changing values in a primary table or deleting records from the primary table if there are related records in the related table.
4. User-Defined Integrity
User-defined integrity is used to define specific business rules that do not fall into any of the other integrity categories. All of the integrity categories support user-defined integrity. All column-level and table-level constraints defined in CREATE TABLE, stored procedures, and triggers are examples of user-defined integrity.
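The referential integrity behavior described under item 3 above can be sketched with a FOREIGN KEY constraint. The example runs on SQLite through Python's sqlite3 module rather than on SQL Server; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, which has no SQL Server equivalent. The Categories table and the is_rejected helper are hypothetical.

```python
import sqlite3

# SQLite stands in for SQL Server here; the Categories table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

# Primary table.
conn.execute(
    "CREATE TABLE Categories ("
    " Category_ID INTEGER PRIMARY KEY,"
    " Category_Name TEXT NOT NULL)"
)
# Related table: each product must refer to an existing category.
conn.execute(
    "CREATE TABLE Products ("
    " Product_ID INTEGER PRIMARY KEY,"
    " Category_ID INTEGER NOT NULL REFERENCES Categories (Category_ID))"
)
conn.execute("INSERT INTO Categories VALUES (1, 'Tools')")
conn.execute("INSERT INTO Products VALUES (25, 1)")


def is_rejected(connection, sql):
    """Return True if the statement violates referential integrity."""
    try:
        connection.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Adding a related record with no associated record in the primary table.
orphan_insert_rejected = is_rejected(
    conn, "INSERT INTO Products VALUES (26, 99)"
)
# Deleting a primary record that still has related records.
parent_delete_rejected = is_rejected(
    conn, "DELETE FROM Categories WHERE Category_ID = 1"
)
# Changing a primary key value that related records depend on.
parent_key_change_rejected = is_rejected(
    conn, "UPDATE Categories SET Category_ID = 2 WHERE Category_ID = 1"
)

print(orphan_insert_rejected, parent_delete_rejected, parent_key_change_rejected)
# True True True
```

All three statements are refused, which matches the three cases listed under referential integrity: adding an orphaned record, deleting a referenced record, and changing a referenced key value.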