Data Cleansing
Dirty data, meaning data that is inaccurate, irrelevant or inconsistent, is highly costly to industry. It can result in lost sales and poor customer relations.
Even good product data gets dirty at nearly 30% per annum¹.
Cleanliness is next to... profitability.
If your data does not do what it is supposed to, does not do the same thing each time it is used, or is the cause of avoidable and potentially costly mistakes, it’s time to consider data cleansing. After all, you invest enormous resource in building up data – why let it work against you?
Data cleansing can be split into three areas:
- Data validity
- Data consistency
- Data accuracy and completeness
Data cleansing will:
- Check all your records, whether in a single set or multiple sets, and remove extraneous spaces, punctuation, symbols and code.
- Validate records against a reference list.
- Ensure product data is in the correct column (so ‘red’, for example, is the product color and not part of the product description).
- Standardize terminology and abbreviations.
- Eliminate duplication.
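The steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Datactics' implementation: the reference list `VALID_COLORS`, the abbreviation map `ABBREVIATIONS`, the record fields `description` and `color`, and all function names are hypothetical.

```python
import re

# Hypothetical reference data, for illustration only.
VALID_COLORS = {"red", "blue", "green"}
ABBREVIATIONS = {"lg": "large", "sm": "small", "blk": "black"}


def clean_text(value: str) -> str:
    """Replace stray punctuation/symbols with spaces, collapse whitespace, lowercase."""
    value = re.sub(r"[^\w\s-]", " ", value)
    return re.sub(r"\s+", " ", value).strip().lower()


def standardize(value: str) -> str:
    """Expand known abbreviations word by word."""
    return " ".join(ABBREVIATIONS.get(word, word) for word in value.split())


def split_color(description: str):
    """Move a known color out of the description into its own field."""
    words = description.split()
    color = next((w for w in words if w in VALID_COLORS), None)
    rest = " ".join(w for w in words if w != color)
    return rest, color


def cleanse(records):
    """Clean, validate, standardize and de-duplicate a list of product records."""
    seen = set()
    cleaned = []
    for rec in records:
        desc = standardize(clean_text(rec["description"]))
        raw_color = rec.get("color")
        color = clean_text(raw_color) if raw_color else None
        if color is None:
            # Color may be hiding in the description column.
            desc, color = split_color(desc)
        if color not in VALID_COLORS:
            color = None  # failed validation against the reference list
        key = (desc, color)
        if key in seen:
            continue  # exact duplicate after cleaning
        seen.add(key)
        cleaned.append({"description": desc, "color": color})
    return cleaned


records = [
    {"description": "  Red  jacket, LG!! "},
    {"description": "jacket large", "color": "RED"},
    {"description": "blue hat"},
]
result = cleanse(records)
```

Here the first two records collapse into one once spacing, punctuation, abbreviations and the misplaced color are normalized. Exact-match de-duplication is the simplest choice; real data sets usually need the fuzzier matching described further on.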
Data may require cleansing because users entering information use different definitions, make simple mistakes when inputting data, or misunderstand what is required in each field. Data can also become corrupted during transmission. For example, during data migration projects, certain technologies can automatically change the data structure into an invalid or unexpected format. Users can also interpret the fields differently and load data accordingly.
Datactics recommends an initial data cleansing exercise to purge your data of inaccuracies and inconsistencies. Because data degrades at nearly 30% per annum¹, regular (even real-time) data cleansing is also recommended.
You can elect for strict data cleansing (e.g. an address without a correct postcode will be rejected) or fuzzy data cleansing (records that match other known records within a specified tolerance will be corrected and retained in the data set).
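The difference between the two modes can be illustrated with a short Python sketch. It rests on several assumptions: the postcode pattern is a simplified UK format check rather than a full validation, the reference list `KNOWN_CITIES` and the field names are hypothetical, and the tolerance is modelled with the similarity cutoff of the standard library's `difflib.get_close_matches`. Leaving unmatched records unchanged in fuzzy mode is also an assumption made for the example.

```python
import re
from difflib import get_close_matches

# Simplified UK postcode shape (assumption: format check only, not a lookup).
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")

# Hypothetical reference list of known values.
KNOWN_CITIES = ["Belfast", "London", "Dublin"]


def strict_cleanse(record):
    """Strict mode: reject (return None) any record without a well-formed postcode."""
    postcode = record.get("postcode", "").strip().upper()
    if not POSTCODE_RE.match(postcode):
        return None
    return {**record, "postcode": postcode}


def fuzzy_cleanse(record, tolerance=0.8):
    """Fuzzy mode: correct a value that is close enough to a known one and keep the record."""
    matches = get_close_matches(record["city"], KNOWN_CITIES, n=1, cutoff=tolerance)
    if matches:
        return {**record, "city": matches[0]}
    return record  # no match within tolerance: left unchanged in this sketch


accepted = strict_cleanse({"postcode": "bt1 5gs"})
rejected = strict_cleanse({"postcode": "12345"})
corrected = fuzzy_cleanse({"city": "Belfats"})
```

Raising `tolerance` towards 1.0 makes fuzzy mode behave more like strict mode, while lowering it corrects more records at the risk of false matches.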
If data cleansing is of interest to you, please look at the following case studies, which highlight the real results achieved by organizations that overcame this challenge:
- Increased Response is a database marketing consultancy that had over 200 million records, merged from 5 different sources. Datactics imported, enhanced and de-duplicated the consumer data, finding that 53% of all records had some form of duplication or inaccuracy.
- Snow and Rock are the leading specialist winter sports and outdoor retailer in the UK. They combine shops and catalog sales. Datactics cleansed, standardized, matched and enriched the name and address database. Records were matched and merged to provide the ‘golden record’.
- KPMG is a leading provider of professional services, which include audit, tax, financial and risk advice. Datactics technology was deployed to effectively analyze, re-engineer and match over 14 million records for a major KPMG client.
- John H Lunns (Jewellers) Ltd is a family business with over 50 years' experience selling fine diamonds, beautiful jewellery and luxury watches. Datactics cleansed, matched and merged their 40,000 customer records effortlessly.
- The Bank of Ireland offers a broad range of services to its customers. It had a need to create a ‘single view of customer’ by combining data from many internal systems. Using Datactics they cleansed the data, removing inaccuracies and duplications, and created a ‘golden record’ for each customer.
¹ PricewaterhouseCoopers





