Automated cleansing and deduplication for data with mixed character sets


The Datactics solution was selected by the Hong Kong Trade Development Council (HKTDC) to replace a labour-intensive manual process of maintaining data.

Datactics was chosen in part because of its unique ability to handle all variations of Chinese characters within the same data set, and in part because of its strong reporting and its ability to track and review changes made to the data, which keeps risk low. The solution has reduced the time taken to make data available from months to days.

The Client

The Trade Development Council organises 30 world-class international trade fairs annually, attracting some 500,000 visitors. It maintains a databank of nearly 9 million records collected from various countries. The data is used for general marketing purposes and to support specific business matching activities. It is critical that the information captured in the databank be up to date and maintained to a high quality standard.


The major source of company information is the 30 world-class international trade fairs organised annually by the Trade Development Council. Speedy and accurate capture of visitors’ registration information from the trade fairs, and of data from other sources, is critical to making the collected company information available for marketing and business matching.

HKTDC recognised that improving its data quality processes could add revenue to its bottom line. Equipping its data analysts with a tool to automate cleansing and deduplication would save time and resources and provide a better service to its clients.

We needed a solution that would yield immediate benefits in accelerating the accessibility of accurate data. The team at Datactics was not fazed by handling different character sets and their solution will significantly reduce the time we need to make essential information available very quickly.

- , Trade Development Council


Datactics provided a highly configurable deduplication solution with sophisticated logic to automate the whole data cleansing process, providing accurate data quickly and cost-effectively.
Datactics effectively handled mixed data sets, including Chinese and other Asian languages, and gave HKTDC the ability to track changes to the data and manually review lower-confidence matches through the Datactics DQM.
The solution has reduced the time taken to make accurate data available from months to days.
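
As an illustration only, the idea of automatically merging high-confidence matches while sending lower-confidence matches to manual review can be sketched in a few lines of Python. This is not the Datactics implementation: the function names, the thresholds and the choice of similarity measure are all assumptions made for the example. Unicode NFKC normalisation is used here to fold full-width and half-width forms of the same character; it does not map between traditional and simplified Chinese, which would need a separate mapping table.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical thresholds, chosen for illustration only.
AUTO_MERGE_THRESHOLD = 0.95
REVIEW_THRESHOLD = 0.80


def normalize(name: str) -> str:
    """Fold width and case variants so equivalent names compare equal."""
    # NFKC maps full-width Latin letters and the ideographic space
    # to their ordinary forms; casefold() removes case differences.
    folded = unicodedata.normalize("NFKC", name).casefold()
    # Collapse runs of whitespace into single spaces.
    return " ".join(folded.split())


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def route(a: str, b: str) -> str:
    """Decide what to do with a candidate pair of company names."""
    score = similarity(a, b)
    if score >= AUTO_MERGE_THRESHOLD:
        return "auto-merge"
    if score >= REVIEW_THRESHOLD:
        return "manual-review"
    return "distinct"


if __name__ == "__main__":
    print(route("ＡＢＣ Trading Ltd", "abc trading ltd"))
    print(route("ABC Trading Ltd", "ABC Trading Limited"))
    print(route("香港貿易公司", "Pacific Metals Inc"))
```

In a sketch like this, the two thresholds control how much work reaches the reviewers: raising the lower threshold reduces the manual queue at the cost of missing more true duplicates.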


  • Provides a user-friendly web interface with good response times to speed up the deduplication process where human intervention is unavoidable.
  • Provides improved data security.
  • Reduces the turnaround time needed to deduplicate new company data from external sources from weeks to days.
  • Automates the importing of new company data into the database, releasing existing skilled resources for higher value work.
  • Supports white lists and black lists via a Master Record Manager module, ensuring the right data is retained or deduplicated.