What is Data Integrity?

Data integrity is the process of maintaining the accuracy and completeness of data over its entire life cycle, covering both the data itself and how the data is applied. Maintaining integrity requires careful planning and continuous monitoring.

Why is Data Integrity important?

Data integrity is important because it helps to ensure the usability and reliability of data. Over time, data sets can become corrupted due to hardware or software failures, human error, or malicious intent. 

The protection of data, and data security in particular, can deteriorate the longer the data is held. Measuring integrity is therefore a safeguard against the risk of loss, corruption or inaccuracy of the data.

What is an example of data integrity?

It’s important to remember that physical and logical integrity are two different things, although both are vital parts of data integrity as a whole.

  • Physical integrity relates to how the data is stored, located and protected. Think of data centers subject to natural disaster, hacking or outage. Data integrity isn’t the same as data security, but physical integrity is one facet of data integrity as a whole. Data security on its own does not extend to the processes and procedures needed to maintain integrity over time.
  • Logical integrity is concerned with the structure of the data itself and how it is to be used. It’s also at risk from hacking or human error, but rather than “did we lose the data, or access to the data?” business leaders might be asking “is the data unchanged between systems for its common use?” This is distinct from data quality, though integrity is a measure that contributes to a data quality assessment. Data quality covers adjacent measurements such as duplication, consistency, timeliness and accuracy. Many of the terms in data management are overlapping and contingent on one another, and for good reason! Data management underpins an entire enterprise, rather than being the sole preserve of one department.
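
The "is the data unchanged between systems?" question can be sketched as a simple reconciliation check. This is only an illustration: the two systems (`crm` and `warehouse`), their record layout and the `reconcile` helper are hypothetical, and a real comparison would usually read from the systems themselves rather than from in-memory dictionaries.

```python
def reconcile(source: dict, target: dict) -> dict:
    """Compare two record sets keyed by ID and report where they disagree."""
    shared = source.keys() & target.keys()
    return {
        # IDs present in the source system but absent from the target
        "missing": sorted(source.keys() - target.keys()),
        # IDs present in the target system but absent from the source
        "unexpected": sorted(target.keys() - source.keys()),
        # IDs present in both systems whose values differ
        "changed": sorted(k for k in shared if source[k] != target[k]),
    }


# Hypothetical extracts of the same customer data held in two systems.
crm = {1: ("Smith", "Belfast"), 2: ("Okafor", "Leeds"), 3: ("Ng", "Cork")}
warehouse = {1: ("Smith", "Belfast"), 2: ("Okafor", "Leeds "), 4: ("Diaz", "Bath")}

report = reconcile(crm, warehouse)
print(report)
```

Here record 2 differs only by a trailing space, which is the kind of silent change a record-by-record comparison is meant to surface.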

An example of a data integrity problem is a customer’s last name being misspelled in their address book entry. To maintain integrity, the last name must be corrected both in the customer’s record and in any other records that reference it, such as invoices or shipping labels.
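
One way to make such a correction reach every referencing record is to store the name once and have other records point to it by ID. The sketch below uses Python's built-in `sqlite3` module; the `customer` and `invoice` tables, their columns and the sample values are assumptions made for illustration. Copies that live outside the database, such as labels already printed, would still need correcting separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    last_name   TEXT NOT NULL
);
CREATE TABLE invoice (
    invoice_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
INSERT INTO customer VALUES (1, 'Smtih');   -- misspelled on entry
INSERT INTO invoice VALUES (100, 1, 250.0);
INSERT INTO invoice VALUES (101, 1, 75.5);
""")


def invoice_names():
    """Return (invoice_id, last_name) pairs, reading the name via the customer record."""
    rows = conn.execute("""
        SELECT i.invoice_id, c.last_name
        FROM invoice AS i
        JOIN customer AS c ON c.customer_id = i.customer_id
        ORDER BY i.invoice_id
    """)
    return rows.fetchall()


before = invoice_names()

# Correct the name once, in the customer record only.
conn.execute("UPDATE customer SET last_name = 'Smith' WHERE customer_id = 1")
conn.commit()

after = invoice_names()
print(before)
print(after)
```

Because the invoices hold only the customer ID, both invoices show the corrected name after a single update.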

What are the four types of data integrity?

Within logical integrity there are four types of data integrity: referential, entity, domain, and user-defined. Each type has its own rules and best practices that should be followed in order to maintain accurate and complete data sets.

  • Referential integrity, for example, is a database concept that requires every foreign key value to match a valid primary key value in another table. This helps to ensure that the changes that do occur to the data are only those which are permitted under agreed rules set by the business (see data governance).
  • Entity integrity is another database concept that requires every record to have a unique identifier. These unique identifiers are known as primary keys: values assigned to each record to ensure that it isn’t listed more than once, and that the identifying field is never empty (or null).
  • Domain integrity is an approach that limits the types of value that are acceptable within a column in a dataset. This could mean, for example, that the values in a column are limited to integers between 1 and 10.
  • User-defined integrity is a sort of catch-all that overlays business-specific rules relating uniquely to the business or user’s domain. It could be applied, for instance, to a specific regulatory requirement, or reflect local legislation that is not universally present elsewhere.
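
The four types above can be sketched as database constraints. The example below uses Python's built-in `sqlite3` module. The `customer` and `review` tables, the 1 to 10 `score` range and the rule that EU records need consent are all hypothetical, chosen only to give each type one concrete constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,               -- entity integrity
    last_name   TEXT NOT NULL
);
CREATE TABLE review (
    review_id   INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customer(customer_id),  -- referential integrity
    score       INTEGER NOT NULL
                CHECK (score BETWEEN 1 AND 10),    -- domain integrity
    region      TEXT NOT NULL,
    consent     INTEGER NOT NULL,
    CHECK (region <> 'EU' OR consent = 1)          -- user-defined rule (hypothetical)
);
INSERT INTO customer VALUES (1, 'Okafor');
""")


def rejected(sql: str) -> bool:
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    # Same primary key used twice
    "entity": rejected("INSERT INTO customer VALUES (1, 'Duplicate')"),
    # Review points at a customer that does not exist
    "referential": rejected("INSERT INTO review VALUES (1, 99, 5, 'US', 0)"),
    # Score outside the permitted 1 to 10 range
    "domain": rejected("INSERT INTO review VALUES (2, 1, 11, 'US', 0)"),
    # EU record without consent breaks the business rule
    "user_defined": rejected("INSERT INTO review VALUES (3, 1, 5, 'EU', 0)"),
}
valid_row_accepted = not rejected("INSERT INTO review VALUES (4, 1, 5, 'EU', 1)")
print(results, valid_row_accepted)
```

Declaring the rules in the schema means every application writing to the tables is held to them, rather than each one having to re-implement the checks.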

What are the risks? 

Risks can be caused by human error, malicious intent, or system errors. 

  • Human error can occur when data is inputted incorrectly, data is not updated correctly, or data is deleted unintentionally. 
  • Malicious intent can occur when data is intentionally inputted incorrectly in order to cause harm or when data is deleted in order to prevent others from using it. 
  • System errors can occur when data is not backed up correctly, data is corrupted during storage or transmission, or data is accessed by unauthorized individuals. 

Data integrity risks can have serious consequences including financial loss, reputational damage, and legal liability. It is important for organizations to take steps to reduce these risks by implementing policies and procedures for data entry and updates, backing up data regularly, and encrypting data during storage and transmission.
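
Corruption during storage or transmission can be detected by recording a checksum when data is written and comparing it when the data is read back. The sketch below uses SHA-256 from Python's standard `hashlib` module; the sample payload and the `fingerprint` and `is_intact` helpers are illustrative assumptions. A plain hash of this kind catches accidental changes, but it does not by itself stop someone who can alter both the data and the stored digest.

```python
import hashlib
import hmac


def fingerprint(payload: bytes) -> str:
    """Return the SHA-256 digest of the payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def is_intact(payload: bytes, expected_digest: str) -> bool:
    """Check that the payload still matches the digest recorded earlier."""
    return hmac.compare_digest(fingerprint(payload), expected_digest)


# Hypothetical file contents, with the digest recorded at write time.
original = b"customer_id,last_name\n1,Smith\n"
stored_digest = fingerprint(original)

# Simulate a single value being altered in storage or in transit.
corrupted = original.replace(b"Smith", b"Smyth")

print(is_intact(original, stored_digest))
print(is_intact(corrupted, stored_digest))
```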
