What is Data Profiling?
Data profiling is the process of reviewing data and its source to produce useful summaries of that data, including any potential data quality issues.
Data profiling is an essential component in maintaining data quality. By running a diagnosis of the data, including its sources and metadata, errors and inconsistencies can be picked up and corrected before the data is turned into actionable intelligence.
Data profiling can be broken down into three types:
Structure Discovery: Examining the data to ensure consistent formatting.
Content Discovery: A closer look at the contents of the data, checking for gaps or anomalies in the data itself.
Relationship Discovery: Investigating how the data relates to other datasets.
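As a rough illustration of the three types, here is a minimal Python sketch using only the standard library. The sample records, column names and helper functions (pattern_of, structure_profile, content_profile, orphan_keys) are all invented for this example; they are assumptions and do not represent how any particular profiling tool works.

```python
import re
from collections import Counter

# Hypothetical sample data, invented for illustration only.
customers = [
    {"id": "C001", "email": "ann@example.com", "joined": "2021-03-14"},
    {"id": "C002", "email": "", "joined": "14/03/2021"},
    {"id": "C003", "email": "cara@example.com", "joined": "2022-11-02"},
]
orders = [
    {"order_id": "O1", "customer_id": "C001"},
    {"order_id": "O2", "customer_id": "C009"},
]


def pattern_of(value):
    """Reduce a value to its shape: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def structure_profile(rows, column):
    """Structure discovery: count the distinct formats seen in a column."""
    return Counter(pattern_of(row[column]) for row in rows)


def content_profile(rows, column):
    """Content discovery: count empty values in a column."""
    missing = sum(1 for row in rows if not row[column].strip())
    return {"rows": len(rows), "missing": missing}


def orphan_keys(child_rows, child_col, parent_rows, parent_col):
    """Relationship discovery: child keys with no matching parent key."""
    parent_keys = {row[parent_col] for row in parent_rows}
    return sorted({row[child_col] for row in child_rows} - parent_keys)


date_formats = structure_profile(customers, "joined")
email_gaps = content_profile(customers, "email")
orphans = orphan_keys(orders, "customer_id", customers, "id")
print(date_formats, email_gaps, orphans)
```

In this toy data the date column shows two competing formats, one email is missing, and one order points at a customer that does not exist, which is the kind of finding each type of discovery is meant to surface.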
Through data profiling, organisations can prevent costly data quality errors, which Gartner describes as duplication and a lack of consistency, accuracy and completeness. Profiling also feeds into data migration, as data needs to be profiled and cleansed before it’s moved to a new location.
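Two of these error types, duplication and incompleteness, are simple to measure. The sketch below is one possible way to do so in Python; the records and the function names (duplicate_count, completeness) are hypothetical, and it only detects exact duplicates, so near-matches such as differently spelled company names would need a fuzzier approach.

```python
from collections import Counter

# Hypothetical records, invented for illustration only.
records = [
    {"name": "Acme Ltd", "country": "UK"},
    {"name": "Acme Ltd", "country": "UK"},
    {"name": "Birch plc", "country": ""},
    {"name": "Cedar Inc", "country": "US"},
]


def duplicate_count(rows):
    """Number of rows that exactly repeat another row."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(n - 1 for n in counts.values())


def completeness(rows, column):
    """Share of rows with a non-empty value in the given column."""
    filled = sum(1 for row in rows if row[column].strip())
    return filled / len(rows)


dupes = duplicate_count(records)
country_completeness = completeness(records, "country")
print(dupes, country_completeness)
```

Figures like these give a baseline to compare against after cleansing, for example before and after a migration.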
How can Datactics help with data profiling?
Profiling is a core component of Datactics’ Self-Service Data Quality platform. Every project, programme or tactical exercise starts with data profiling to shed light on exactly what data our clients are dealing with. Because many data leaders admit to not knowing what’s in their data, or even where it is, profiling is an essential step in supporting our customers to deliver end-to-end perfected data across the enterprise.