Datactics DQE (Data Quality Extensions)
Datactics DQE is a data cleansing tool for MySQL users. The following is a brief overview of the Datactics’ Clean, Extract and Validate engines which have been embedded within DB2 running on Windows and Linux. Z-O/S is due shortly.
These Data Quality Extensions offer the following benefits:
• Profile whole data sets to reveal all invalid records
• Parse data to identify attributes and move to specific field
• Adjustable fuzzy matching more effective than SQL
• Standardise units of measure, abbreviations, colours, descriptions etc
• Embedded in the database, no integration issues
Clean
The Clean function enables dynamic cleansing and standardization of the data being returned by a SQL statement, function or stored procedure. Using reference tables, it allows the user to apply changes to the data held in a specified column. For example to remove punctuation, standardize abbreviations or correct misspellings.
Extract
The Extract function uses reference tables containing keywords, patterns and regular expressions to extract data elements from within a specified column. For example to parse a name or address into appropriate fields, or to isolate individual attributes from a product description.
Validate
The Validate function can be used to profile data in one or more columns and verify the content for completeness, accuracy and validity. Again reference tables are used to specify what values, data types and patterns are allowed. The data can then be flagged or scored appropriately, which can then be used to produce exception reports of invalid data, or to ascertain which information is preferred.
Fuzzy matching
Datactics’ fuzzy matching algorithms can be applied to the reference data for each of the above functions to allow a pre-defined number of errors.
Example of fuzzy matching applied to Clean function: If we want to standardize “Apartment” to “Apt.”, allowing 1 error will enable us to compensate for possible spelling mistakes such as “Appartment” and “Aprtment” and therefore standardize these also.
Value Proposition
The advantage of using these pre-built functions is that they achieve much more than can be done using SQL queries alone yet they run in the same way and can be built into a SQL query by anyone who is familiar with SQL.
The profiling capability alone brings big advantages over the use of small samples and when combined with the adjustable fuzzy matching will give unparalleled accuracy.
This means that wherever there is a need for cleansing data such as during system migrations or dashboarding of data quality for governance and compliance better results are available at a lower cost and shorter time than trying to handcraft SQL queries.




