Uncovering the Root Causes of Data Quality Issues

We all know data quality issues when we see them. They can often impair an organization's ability to work efficiently and comply with regulations, and they make it harder to generate any real business value from messy data.

Rather than simply measuring and patching up issues, we help our clients understand why issues are surfacing by identifying the root cause and fixing it at the source.

To some, this concept may seem like a pipe dream. But many of our clients are recognizing the true value that this approach brings.

Recently we have been exploring industry opinion on this, with Kieran Seaward taking to the stage at FIMA US in Boston earlier this year to host two roundtables on the topic: “Uncovering the Root Causes of Data Quality Issues and Moving Beyond Measurement to Action”.

During these roundtable discussions, data management practitioners from diverse backgrounds and industries (and with considerable experience in the field) shared their insights on dealing with poor data quality. Participants had the opportunity to learn from each other’s experiences and explore the actions they have taken to address this challenge.

We were grateful for such candid and open conversation around what is a challenging topic. We wanted to share some of the key themes and insights that resonated with us during the sessions to help you get started:

1. Proactive (not reactive) Data Quality Management

Historically, data quality management has been viewed as a reactive measure for fixing bad data that has already had a negative impact on a report or a decision. Now, with advances in capabilities and technology, firms should look to become proactive and try to prevent issues from occurring in the first place. This helps limit the downstream impact on critical data elements.

In other words, prevent the fire from starting rather than stopping the spread.

But how can you achieve this? There are a number of key steps.

Firstly, define data quality metrics, set a target for each metric, and establish baseline measurements by implementing a monitoring process.
Then, conduct regular assessments to measure progress against those baselines and develop improvement processes.
Finally, implement reporting and visualization mechanisms to communicate data quality measurements. This is important for highlighting the impact (and ROI) to business teams and senior leadership, and it can be continuously iterated and refined as necessary.
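The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular product: the `Metric` class, the `assess` helper, the metric names, the targets, and the sample records are all hypothetical. Each metric is treated as a pass/fail check with a target pass rate, and the first assessment is stored as the baseline that later batches are compared against.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


@dataclass
class Metric:
    name: str
    check: Callable[[Record], bool]  # returns True when a record passes
    target: float                    # minimum acceptable pass rate, 0.0 to 1.0


def pass_rate(records: List[Record], check: Callable[[Record], bool]) -> float:
    """Share of records passing the check (an empty batch counts as passing)."""
    if not records:
        return 1.0
    return sum(1 for r in records if check(r)) / len(records)


def assess(records: List[Record], metrics: List[Metric],
           baseline: Optional[Dict[str, float]] = None) -> Dict[str, dict]:
    """Score each metric and compare it with its target and, if given, the baseline."""
    report = {}
    for m in metrics:
        score = pass_rate(records, m.check)
        report[m.name] = {
            "score": round(score, 3),
            "meets_target": score >= m.target,
            "change_vs_baseline": (
                None if baseline is None else round(score - baseline[m.name], 3)
            ),
        }
    return report


# Hypothetical metrics and records, for illustration only
metrics = [
    Metric("issuer_id_completeness", lambda r: bool(r.get("issuer_id")), target=0.99),
    Metric("currency_validity", lambda r: r.get("currency") in {"USD", "EUR", "GBP"}, target=0.98),
]
first_batch = [
    {"issuer_id": "A1", "currency": "USD"},
    {"issuer_id": "", "currency": "EUR"},
    {"issuer_id": "A3", "currency": "usd"},
    {"issuer_id": "A4", "currency": "GBP"},
]

# The first assessment becomes the baseline for later runs
baseline = {name: r["score"] for name, r in assess(first_batch, metrics).items()}

later_batch = [
    {"issuer_id": "B1", "currency": "EUR"},
    {"issuer_id": "B2", "currency": "GBP"},
]
later_report = assess(later_batch, metrics, baseline)
```

In practice a report like `later_report` would feed the reporting and visualization layer; a pass-rate metric is only one common choice among many.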

2. Automation of manual processes

Automation plays a vital role in modernizing approaches to data quality management. Gone are the days when data scientists had to spend 80% of their time wrangling data to ensure it is fit for purpose. By using advanced techniques such as artificial intelligence, machine learning, and statistical modeling, practitioners can reduce the manual effort spent on repetitive tasks and become more proactive in how they manage data quality.

Some technologies on the market offer automated profiling and recommend data quality rules for validation, cleansing, and deduplication based on both the column headers (metadata) and the underlying values. These tasks are often performed by writing complicated scripts, which do not scale and can take considerable time to build. By automating this process, technical resources can be reallocated to more value-adding activities.
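To make the idea concrete, here is a deliberately simple Python sketch of rule recommendation driven by both the header and the values. It is not how any particular product works: the keyword hints in `NAME_HINTS`, the 5% completeness threshold, and the function names are hypothetical, and real tools use far richer profiling than these hand-written heuristics.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical header keywords mapped to suggested rules (metadata-based hints)
NAME_HINTS = {
    "email": "validity: value matches an email pattern",
    "date": "validity: value parses as an ISO date (YYYY-MM-DD)",
    "id": "uniqueness: no duplicate values",
}
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
COMPLETENESS_THRESHOLD = 0.05  # hypothetical: suggest a not-blank rule below this null rate


def profile_column(values: List[object]) -> Dict[str, object]:
    """Basic column profile: null rate, distinct count, duplicates, date shape."""
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": len(counts),
        "duplicates": sum(c - 1 for c in counts.values() if c > 1),
        "all_iso_dates": bool(non_null) and all(ISO_DATE.match(str(v)) for v in non_null),
    }


def recommend_rules(column: str, values: List[object]) -> Tuple[Dict[str, object], List[str]]:
    """Suggest candidate rules from the column header and from the profiled values."""
    profile = profile_column(values)
    # Split the header into tokens so "id" matches "customer_id" but not "valid_from"
    tokens = set(re.split(r"[^a-z0-9]+", column.lower()))
    rules = [rule for keyword, rule in NAME_HINTS.items() if keyword in tokens]
    # Value-based hints, independent of the header
    if profile["null_rate"] < COMPLETENESS_THRESHOLD:
        rules.append("completeness: value must not be blank")
    if profile["all_iso_dates"] and NAME_HINTS["date"] not in rules:
        rules.append(NAME_HINTS["date"])
    if profile["duplicates"] and "id" in tokens:
        rules.append("deduplication: review repeated identifiers")
    return profile, rules


# Hypothetical sample columns
columns = {
    "customer_id": ["C1", "C2", "C2", "C4"],
    "email": ["a@example.com", "b@example.com", "", "d@example.com"],
    "onboard_date": ["2023-01-05", "2023-02-11", "2023-03-02", "2023-03-09"],
}
suggestions = {name: recommend_rules(name, vals)[1] for name, vals in columns.items()}
```

The suggested rules are only candidates; a data steward would still review them before they are switched on.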

3. Root cause analysis of Data Quality issues

With an effective data quality measurement and monitoring process in place (which is by no means a trivial exercise to implement), you can start to identify trends in data quality breaks and act upon them.

As a reference point, it’s helpful to consider the Five Ws:

What Data Quality break has occurred?

Where has the Data Quality issue occurred or surfaced? Has the DQ issue occurred at multiple points in the journey or propagated through other systems?

When is the break occurring?

Who is responsible for this element of information? Who is the data steward or data owner?

Why is it occurring? Hopefully, the previous four questions have shed some light on the reasons for the issue.

If you can answer each of these accurately, you are in a good position to resolve, or fix, that data quality issue.
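One way to put the Five Ws into practice is to record every break in a structure that makes unanswered questions visible. The Python sketch below is a hypothetical illustration; the `DataQualityBreak` class, its fields, and the example system names are assumptions, not part of any specific tool. `where` holds the systems in data-lineage order, so the first entry points towards where the break originated, and more than one entry suggests it has propagated.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

FIVE_WS = ("what", "where", "when", "who", "why")


@dataclass
class DataQualityBreak:
    what: str                                        # the rule or check that failed
    where: List[str] = field(default_factory=list)   # systems where it surfaced, in lineage order
    when: Optional[datetime] = None                  # when the break was detected
    who: str = ""                                    # accountable data owner or steward
    why: str = ""                                    # root cause, once investigated

    def unanswered(self) -> List[str]:
        """Which of the Five Ws are still missing."""
        return [w for w in FIVE_WS if not getattr(self, w)]

    def origin(self) -> Optional[str]:
        """Earliest system in the data journey where the break was observed."""
        return self.where[0] if self.where else None

    def propagated(self) -> bool:
        """True if the break surfaced in more than one system."""
        return len(self.where) > 1


# Hypothetical example break
brk = DataQualityBreak(
    what="trade currency code not in the approved list",
    where=["booking_system", "risk_warehouse", "regulatory_report"],
    when=datetime(2023, 6, 1, 9, 30),
    who="trade data steward",
)
```

Here `brk.unanswered()` would report that only "why" is still open, which is the question the first four answers are meant to help with.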

AI can also help users continuously monitor data quality breaks over time. Doing so generates a rich set of statistics that supports analysis of data quality breaks and helps identify relationships between issues. This helps users predict future breaks, predict break resolution times, and understand the downstream impact of breaks.
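The kind of statistics described above can be illustrated without any AI at all. The Python sketch below is a hypothetical, deliberately simple stand-in for ML-based monitoring: given a made-up history of breaks (rule name, detection date, days to resolve), it counts breaks per rule, flags rules that tend to break on the same day (a possible shared root cause), and uses the historical mean as a naive resolution-time forecast.

```python
from collections import Counter, defaultdict
from datetime import date
from itertools import combinations
from statistics import mean

# Hypothetical break history: (rule, date detected, days to resolve)
history = [
    ("missing_issuer_id", date(2023, 5, 1), 2),
    ("invalid_currency", date(2023, 5, 1), 1),
    ("missing_issuer_id", date(2023, 5, 8), 4),
    ("invalid_currency", date(2023, 5, 8), 1),
    ("stale_price", date(2023, 5, 9), 3),
]


def break_counts(history):
    """How often each rule has broken."""
    return Counter(rule for rule, _, _ in history)


def mean_resolution_days(history):
    """Average days to resolve, per rule."""
    by_rule = defaultdict(list)
    for rule, _, days in history:
        by_rule[rule].append(days)
    return {rule: mean(days) for rule, days in by_rule.items()}


def co_occurrence(history):
    """Count how often two rules break on the same day (a hint of a shared cause)."""
    by_day = defaultdict(set)
    for rule, day, _ in history:
        by_day[day].add(rule)
    pairs = Counter()
    for rules in by_day.values():
        for pair in combinations(sorted(rules), 2):
            pairs[pair] += 1
    return pairs


def predict_resolution_days(rule, history):
    """Naive forecast: the historical mean for this rule, or None if never seen."""
    return mean_resolution_days(history).get(rule)
```

A production approach would likely replace the mean with a trained model and use lineage to estimate downstream impact, but it would still depend on a consistent history of breaks recorded over time.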

4. Remediation

Remediation is uniquely important in the data management process because it does something about the problems being reported. With a comprehensive understanding of where and why breaks are occurring, you have the opportunity to put out that fire and fix your broken data.

Some people are understandably hesitant about fixing data, but without acting, the rest of the process remains passive.

We do not believe in handing off responsibility to another team or system; instead, we believe in taking action to deal with and fix the breaks that have surfaced.

We are currently working with a customer to fix those breaks at source, using a federated approach to solving data quality issues across the business that draws on subject matter experts' (SMEs') knowledge of what good data looks like.
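A federated remediation loop can be sketched as follows. This Python example is hypothetical throughout: `SME_CORRECTIONS` stands in for each domain's own definition of what good looks like, and `source` stands in for the system of record, which a real implementation would update through its own interfaces and with an audit trail. Breaks that have an SME-approved correction are fixed at source; anything else is routed to the owning domain's review queue rather than handed off elsewhere.

```python
from collections import defaultdict

# Hypothetical: each business domain's SMEs own a map of known-bad -> approved values
SME_CORRECTIONS = {
    "trading": {"currency": {"usd": "USD", "US$": "USD", "Euro": "EUR"}},
    "reference": {"country": {"UK": "GB", "U.K.": "GB"}},
}


def remediate(breaks, source):
    """Apply SME-approved fixes to source records; queue the rest for the owning domain.

    breaks: list of dicts with keys "domain", "record_id", "field", "value"
    source: dict mapping record_id -> record (a dict), updated in place
    """
    review_queues = defaultdict(list)
    fixed = 0
    for b in breaks:
        approved = SME_CORRECTIONS.get(b["domain"], {}).get(b["field"], {})
        if b["value"] in approved:
            # Fix at source using the domain's own definition of "good"
            source[b["record_id"]][b["field"]] = approved[b["value"]]
            fixed += 1
        else:
            # No approved correction yet: route to the owning domain's SMEs
            review_queues[b["domain"]].append(b)
    return fixed, dict(review_queues)


# Hypothetical source records and detected breaks
source = {
    "T1": {"currency": "usd"},
    "T2": {"currency": "YEN"},
    "C1": {"country": "U.K."},
}
breaks = [
    {"domain": "trading", "record_id": "T1", "field": "currency", "value": "usd"},
    {"domain": "trading", "record_id": "T2", "field": "currency", "value": "YEN"},
    {"domain": "reference", "record_id": "C1", "field": "country", "value": "U.K."},
]
fixed, queues = remediate(breaks, source)
```

Keeping the corrections with each domain, rather than in a central team, reflects the federated idea: the people who know what good looks like decide what the fix is.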

This part of the story, where you are doing something proactive about making the data better, is the element that is often missing from solutions or processes that spend all their time noticing breaks or passively monitoring systems.

Our recent engagement with industry experts at FIMA US in Boston reinforced the significance of proactive data quality management. With advancements in capabilities and technology, firms can now take a proactive approach. By defining data quality metrics, automating manual processes, conducting root cause analysis, and implementing remediation strategies, businesses can enhance the quality of their data and maximize its impact.

We believe that taking ownership of data quality and embracing a proactive approach is the key to harnessing the full potential of your data for business success. In a world where data is a critical asset, it is essential to move beyond merely noticing data quality breaks and to start actively working towards making data better.

Kieran will be discussing data remediation at the Data Management Summit in New York on 28th September, speaking on a panel entitled ‘How to scale enterprise data quality with AI and ML’. You can also contact us to learn more.
