Data Cleansing
Dirty data (data that is inaccurate, irrelevant or inconsistent) is highly costly to industry. It can result in lost sales and poor customer relations.
Even good product data degrades at a rate of nearly 30% per annum [1].
Cleanliness is next to... profitability.
If your data does not do what it is supposed to, does not do the same thing each time it is used, or is the cause of avoidable and potentially costly mistakes, it is time to consider data cleansing. After all, you invest enormous resources in building up data, so why let it work against you?
Data cleansing can be split into three areas:
- Data validity
- Data consistency
- Data accuracy and completeness
Data cleansing will:
- Check all your records, whether in a single set or multiple sets, and remove extraneous spaces, punctuation, symbols and code.
- Validate records against a reference list.
- Ensure product data is in the correct column (so ‘red’, for example, is the product color and not part of the product description).
- Standardize terminology and abbreviations.
- Eliminate duplication.
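The steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not Datactics' implementation: the field names (`sku`, `description`, `color`), the reference list `KNOWN_COLORS` and the abbreviation map `ABBREVIATIONS` are all hypothetical, and a real product catalogue would need far richer rules.

```python
import re

# Hypothetical reference list and abbreviation map, for illustration only.
KNOWN_COLORS = {"red", "blue", "green"}
ABBREVIATIONS = {"blk": "black", "lg": "large", "sm": "small"}


def strip_noise(value):
    """Replace stray punctuation and symbols, then collapse extra spaces."""
    value = re.sub(r"[^\w\s.-]", " ", value)
    return re.sub(r"\s+", " ", value).strip()


def standardize(value):
    """Lowercase each word and expand any known abbreviation."""
    words = (w.lower() for w in value.split())
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def move_color(record):
    """If the color column is empty, move a known color out of the description."""
    words = record["description"].split()
    if not record["color"]:
        for word in words:
            if word in KNOWN_COLORS:
                record["color"] = word
                words.remove(word)
                break
    record["description"] = " ".join(words)
    return record


def cleanse(records):
    """Return (cleaned, rejected) lists of records."""
    seen, cleaned, rejected = set(), [], []
    for raw in records:
        rec = {key: strip_noise(value) for key, value in raw.items()}
        rec["description"] = standardize(rec["description"])
        rec["color"] = standardize(rec["color"])
        rec = move_color(rec)
        # Validate against the reference list.
        if rec["color"] not in KNOWN_COLORS:
            rejected.append(rec)
            continue
        # Eliminate duplicates that are identical after cleaning.
        fingerprint = (rec["sku"], rec["description"], rec["color"])
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(rec)
    return cleaned, rejected


if __name__ == "__main__":
    sample = [
        {"sku": " A1 ", "description": "  Red  cotton shirt!! ", "color": ""},
        {"sku": "A1", "description": "cotton shirt", "color": "RED"},
    ]
    print(cleanse(sample))
```

In this sketch the two sample records collapse into one, because they only become identical once the noise is stripped and 'red' is moved into its own column. That is why de-duplication runs last.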
Data may require cleansing because the people entering information use different definitions, have made errors when inputting data, or have misunderstood what is required in each field. Data can also become corrupted during transmission. For example, during data migration projects, certain technologies can automatically change the data structure into an invalid or unexpected format.
Datactics recommends an initial data cleansing exercise to purge your data of inaccuracies and inconsistencies. Because data degrades at nearly 30% per annum [1], regular (even real-time) data cleansing is also recommended.
You can elect for strict data cleansing (e.g. an address without a correct postcode will be rejected) or fuzzy data cleansing (records that match other known records within a specified tolerance will be corrected and retained in the data set).
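The difference between the two modes can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular product: the postcode pattern is a deliberately simplified UK-style format, `KNOWN_CITIES` is a hypothetical reference list, and the similarity measure is the standard library's `difflib`, chosen only because it needs no extra dependencies.

```python
import difflib
import re

# Simplified UK-style postcode pattern (an assumption for illustration;
# real postcode validation has more rules than this).
POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")

# Hypothetical list of known-good values to match against.
KNOWN_CITIES = ["Belfast", "London", "Chicago"]


def strict_cleanse(record):
    """Strict mode: reject (return None) any record with an invalid postcode."""
    if POSTCODE.match(record["postcode"]):
        return record
    return None


def fuzzy_cleanse(record, tolerance=0.8):
    """Fuzzy mode: correct the city if it is close enough to a known one.

    `tolerance` is the minimum similarity ratio (0 to 1) for a match.
    Assumption: a record with no close match is returned unchanged.
    """
    matches = difflib.get_close_matches(
        record["city"], KNOWN_CITIES, n=1, cutoff=tolerance
    )
    if matches:
        return {**record, "city": matches[0]}
    return record


if __name__ == "__main__":
    print(strict_cleanse({"city": "Belfast", "postcode": "12345"}))
    print(fuzzy_cleanse({"city": "Belfsat", "postcode": "BT1 1AA"}))
```

Raising `tolerance` towards 1 makes the fuzzy mode behave more like the strict one, since fewer near-misses qualify for correction.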
[1] PricewaterhouseCoopers
