Data Integrity

Enforcing data integrity guarantees the quality of the data in the database. For example, if an employee is entered with an employee ID value of 123, the database should not permit another employee to have an ID with the same value. If you have an employee_rating column intended to have values ranging from 1 to 5, the database should not accept a value outside that range. If the table has a dept_id column that stores the department number for the employee, the database should permit only values that are valid for the department numbers in the company.

Two important steps in planning tables are to identify valid values for a column and to decide how to enforce the integrity of the data in the column. Data integrity falls into the following categories:

Entity integrity
Domain integrity
Referential integrity
User-defined integrity

Entity Integrity

Entity integrity defines a row as a unique entity for a particular table. Entity integrity enforces the integrity of the identifier columns or the primary key of a table, through UNIQUE indexes, UNIQUE constraints or PRIMARY KEY constraints.
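As a rough illustration of entity integrity, the sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (the constraint syntax is similar but not identical). The employee table, its email column, and the try_insert helper are assumptions made up for this example.

```python
import sqlite3

# In-memory database in autocommit mode; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,  -- identifies each row as a unique entity
        email       TEXT UNIQUE,          -- hypothetical alternate key, also unique
        name        TEXT
    )
""")
conn.execute("INSERT INTO employee VALUES (123, 'ann@example.com', 'Ann')")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_id = try_insert((123, "bob@example.com", "Bob"))     # same primary key
duplicate_email = try_insert((124, "ann@example.com", "Bob"))  # same unique value
new_employee = try_insert((124, "bob@example.com", "Bob"))     # no conflict
print(duplicate_id, duplicate_email, new_employee)
```

Only the third insert is accepted; the first two collide with the existing row on employee_id and email respectively.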

Domain Integrity

Domain integrity is the validity of entries for a specific column. You can enforce domain integrity to restrict the type by using data types, restrict the format by using CHECK constraints and rules, or restrict the range of possible values by using FOREIGN KEY constraints, CHECK constraints, DEFAULT definitions, NOT NULL definitions, and rules.
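The following minimal sketch shows how NOT NULL, DEFAULT, and CHECK definitions can restrict a column's domain. It again assumes SQLite (through Python's sqlite3 module) rather than SQL Server, and the table layout, the default rating of 3, and the accepted helper are illustrative choices, not part of any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
    CREATE TABLE employee (
        employee_id     INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,               -- a value is required
        employee_rating INTEGER NOT NULL DEFAULT 3   -- assumed default for the example
                        CHECK (employee_rating BETWEEN 1 AND 5)
    )
""")


def accepted(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


in_range = accepted("INSERT INTO employee VALUES (1, 'Ann', 5)")
out_of_range = accepted("INSERT INTO employee VALUES (2, 'Bob', 9)")
missing_name = accepted("INSERT INTO employee (employee_id, name) VALUES (3, NULL)")
defaulted = accepted("INSERT INTO employee (employee_id, name) VALUES (4, 'Cy')")
default_rating = conn.execute(
    "SELECT employee_rating FROM employee WHERE employee_id = 4"
).fetchone()[0]
print(in_range, out_of_range, missing_name, defaulted, default_rating)
```

The out-of-range rating and the missing name are rejected, while the row inserted without a rating receives the default value.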

Referential Integrity

Referential integrity preserves the defined relationships between tables when rows are entered or deleted. In SQL Server, referential integrity is based on relationships between foreign keys and primary keys or between foreign keys and unique keys, through FOREIGN KEY and CHECK constraints. Referential integrity makes sure that key values are consistent across tables. This kind of consistency requires that there are no references to nonexistent values and that if a key value changes, all references to it change consistently throughout the database.

When you enforce referential integrity, SQL Server prevents users from doing the following:

Adding or changing rows in a related table if there is no associated row in the primary table.
Changing values in a primary table in a way that causes orphaned rows in a related table.
Deleting rows from a primary table if there are matching related rows.

For example, with the Sales.SalesOrderDetail and Production.Product tables in the AdventureWorks2008R2 database, referential integrity is based on the relationship between the foreign key (ProductID) in the Sales.SalesOrderDetail table and the primary key (ProductID) in the Production.Product table. This relationship makes sure that a sales order can never reference a product that does not exist in the Production.Product table.

Figure: Referential integrity using foreign/primary keys
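The three prevented actions listed above can be sketched with a foreign key. This example assumes SQLite via Python's sqlite3 module, not SQL Server, and the product and sales_order_detail tables are simplified, hypothetical stand-ins for Production.Product and Sales.SalesOrderDetail. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE sales_order_detail (
        detail_id  INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES product (product_id)
    )
""")
conn.execute("INSERT INTO product VALUES (1, 'Widget')")
conn.execute("INSERT INTO sales_order_detail VALUES (10, 1)")


def allowed(sql):
    """Return True if the statement succeeds, False if the foreign key blocks it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Add a related row with no matching row in the primary table.
orphan_insert = allowed("INSERT INTO sales_order_detail VALUES (11, 999)")
# Change a key value in the primary table, which would orphan row 10.
key_change = allowed("UPDATE product SET product_id = 2 WHERE product_id = 1")
# Delete a primary row that still has matching related rows.
parent_delete = allowed("DELETE FROM product WHERE product_id = 1")
print(orphan_insert, key_change, parent_delete)
```

All three statements are rejected, so the order detail row keeps pointing at an existing product.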

User-Defined Integrity

User-defined integrity lets you define specific business rules that do not fall into one of the other integrity categories. All the integrity categories support user-defined integrity. This includes all column-level and table-level constraints in CREATE TABLE, stored procedures, and triggers.
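A trigger is one way to express a business rule that the built-in constraint types do not cover. The sketch below assumes SQLite through Python's sqlite3 module (SQL Server trigger syntax differs), and the rule itself, that a salary may not decrease, along with the no_pay_cut trigger name, is invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, salary INTEGER NOT NULL)"
)
# Hypothetical business rule: reject any update that lowers a salary.
conn.execute("""
    CREATE TRIGGER no_pay_cut
    BEFORE UPDATE OF salary ON employee
    WHEN NEW.salary < OLD.salary
    BEGIN
        SELECT RAISE(ABORT, 'salary may not decrease');
    END
""")
conn.execute("INSERT INTO employee VALUES (123, 50000)")


def try_update(new_salary):
    """Return True if the update is applied, False if the trigger aborts it."""
    try:
        conn.execute(
            "UPDATE employee SET salary = ? WHERE employee_id = 123", (new_salary,)
        )
        return True
    except sqlite3.DatabaseError:
        return False


raise_applied = try_update(55000)
cut_applied = try_update(40000)
final_salary = conn.execute(
    "SELECT salary FROM employee WHERE employee_id = 123"
).fetchone()[0]
print(raise_applied, cut_applied, final_salary)
```

The raise is applied and the cut is aborted, so the stored salary stays at the raised value.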

What is Data Integrity? Data Protection 101

Data integrity defined, data integrity vs. data security, and more in our series on the fundamentals of data protection.

A Definition of Data Integrity

Data integrity refers to the accuracy and consistency (validity) of data over its lifecycle. Compromised data, after all, is of little use to enterprises, not to mention the dangers presented by sensitive data loss. For this reason, maintaining data integrity is a core focus of many enterprise security solutions.

Data integrity can be compromised in a number of ways. Each time data is replicated or transferred, it should remain intact and unaltered between updates. Error checking methods and validation procedures are typically relied on to ensure the integrity of data that is transferred or reproduced without the intention of alteration.
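One common error checking method is to compare a cryptographic digest computed before and after a transfer. The sketch below uses SHA-256 from Python's hashlib; the function names and the sample payload are assumptions for illustration, and a real system would also need to protect the digest itself from tampering.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_transfer(received: bytes, expected_digest: str) -> bool:
    """Return True if the received bytes match the digest taken at the source."""
    return sha256_hex(received) == expected_digest


original = b"employee_id=123,employee_rating=4"
digest = sha256_hex(original)  # computed by the sender before the transfer

intact = verify_transfer(b"employee_id=123,employee_rating=4", digest)
corrupted = verify_transfer(b"employee_id=123,employee_rating=5", digest)
print(intact, corrupted)
```

The unaltered copy verifies, while the copy with a single changed character does not.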

Data Integrity as a Process and as a State

The term data integrity can cause confusion because it may refer either to a state or to a process. Data integrity as a state describes a data set that is both valid and accurate. Data integrity as a process, on the other hand, describes the measures used to ensure the validity and accuracy of a data set or of all data contained in a database or other construct. For instance, error checking and validation methods may be referred to as data integrity processes.

A Case for Data Integrity

Maintaining data integrity is important for several reasons. For one, data integrity ensures recoverability and searchability, traceability (to origin), and connectivity. Protecting the validity and accuracy of data also increases stability and performance while improving reusability and maintainability.

Data increasingly drives enterprise decision-making, but it must pass through a variety of transformations on its way from raw form to formats that are practical for identifying relationships and making informed decisions. Each of those steps is an opportunity for errors to creep in, which is why data integrity is a top priority for modern enterprises.

Data integrity can be compromised in a variety of ways, making data integrity practices an essential component of effective enterprise security protocols. Data integrity may be compromised through:

Human action, whether malicious or unintentional
Transfer errors, including unintended alterations or data compromise during transfer from one device to another
Bugs, viruses/malware, hacking, and other cyber threats
Compromised hardware, such as a device or disk crash
Physical compromise to devices

As only some of these compromises may be adequately prevented through data security, the case for data backup and duplication becomes critical for ensuring data integrity. Other data integrity best practices include input validation to preclude the entering of invalid data, error detection/data validation to identify errors in data transmission, and security measures such as data loss prevention, access control, data encryption, and more.
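Input validation can also be applied in application code before data ever reaches the database. The sketch below is a minimal, hypothetical validator that reuses the employee_rating and dept_id examples from the start of this article; the function name, the record layout, and the set of department numbers are assumptions.

```python
def validate_employee(record: dict, valid_dept_ids: set) -> list:
    """Return a list of validation errors; an empty list means the record is valid."""
    errors = []
    rating = record.get("employee_rating")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(rating, int) or isinstance(rating, bool) or not 1 <= rating <= 5:
        errors.append("employee_rating must be an integer from 1 to 5")
    if record.get("dept_id") not in valid_dept_ids:
        errors.append("dept_id is not a known department")
    return errors


departments = {10, 20, 30}  # assumed list of valid department numbers
good = validate_employee({"employee_rating": 4, "dept_id": 20}, departments)
bad = validate_employee({"employee_rating": 9, "dept_id": 99}, departments)
print(good)
print(bad)
```

Checks like these complement database constraints rather than replace them, since data may also arrive through paths that bypass the application.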

Data Integrity vs. Data Security

Data integrity and data security are related terms, each playing an important role in the successful achievement of the other. Data security refers to the protection of data against unauthorized access or corruption and is necessary to ensure data integrity.

That said, data integrity is a desired result of data security, but the term data integrity refers only to the validity and accuracy of data rather than the act of protecting data. Data security, in other words, is one of several measures which can be employed to maintain data integrity, as unauthorized access to sensitive data can lead to corruption or modification of records and data loss. Whether it's a case of malicious intent or accidental compromise, data security plays an important role in maintaining data integrity.

For modern enterprises, data integrity is essential for the accuracy and efficiency of business processes as well as decision making. It’s also a central focus of many data security programs. Achieved through a variety of data protection methods, including backup and replication, database integrity constraints, validation processes, and other systems and protocols, data integrity is critical yet manageable for organizations today.

Admin
