Data anonymization solution

Anonymization solutions for free form data

A national reporting agency for hospital safety had a primary goal: to provide accurate, anonymous reports on patient safety incidents. The agency exists to improve patient safety by enabling hospitals nationwide to learn from these incidents. It identifies and reduces risks to patients receiving care and leads national initiatives to improve patient safety.

Through its national reporting system, the agency collects confidential reports on patient safety incidents from healthcare staff across the country. Clinicians and safety experts help analyze these reports to identify common risks and opportunities to improve patient safety. Feedback and guidance are provided to healthcare organizations to improve patient safety. These include alerts to address specific safety risks, tools to build a strong safety culture and national initiatives in specific areas such as hand hygiene, design, nutrition and cleaning.



The reporting data arrives as unstructured free-form text. This makes it particularly difficult to extract identifying attributes so that the information can be made truly anonymous, which is a prerequisite for open reporting. For example, to make an incident report anonymous, the statistician has to trawl through all the text (often many paragraphs) to find every occurrence of the name of the patient, staff member or ward, and in some cases details of the injury that may also need to be made anonymous.

The agency was using a system that was manual, labor-intensive and costly to run. If the agency needed to ‘white list’ a term, that is, add a new word to be made anonymous in the data, it had to buy professional services from the existing data quality vendor to add the term to the code. This was very costly, so to get the best value for money, changes to the white list were stored up and implemented in bulk.

This meant that the agency maintained a separate list of pending terms until the next bulk update was implemented in the code through the vendor (roughly every 6-9 months). In the meantime, staff had to look for these pending terms by hand. For example, they would know that ‘Martin Ward’ was not yet included in the code, so they would have to look for it manually in every report received.
The agency was looking for a solution that would reduce the cost and labor involved in creating truly anonymous reports.


Datactics data quality software reviews unstructured free-form text, identifies words that need to be made anonymous and builds a master reference library of these terms. Once set up, the reference file can be modified by the statistician at any point. New white-listed terms can be added to the project without any laborious updating of code. The agency can handle unstructured free-form data with fewer staff and at lower cost, and can update its own reference library of terms to be made anonymous.
The agency can now manage and maintain the entire process itself, cutting out all professional services costs. Changes made to the reference library lists can be reviewed and tracked, so the process is fully controlled, with no knock-on effects or time delays.
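
To make the idea of a reference library concrete, the following is a minimal sketch of applying a list of terms to free text. It is not the Datactics implementation: the function names `build_pattern` and `redact`, the `[REDACTED]` mask and the sample terms are all assumptions chosen for illustration.

```python
import re

def build_pattern(terms):
    """Compile one regex that matches any reference term as a whole word."""
    # Longest terms first, so "Martin Ward" is tried before a shorter term such as "Ward".
    ordered = sorted({t.strip() for t in terms if t.strip()}, key=len, reverse=True)
    if not ordered:
        return None
    alternatives = "|".join(re.escape(t) for t in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def redact(text, terms, mask="[REDACTED]"):
    """Replace every occurrence of a reference term with the mask."""
    pattern = build_pattern(terms)
    return text if pattern is None else pattern.sub(mask, text)

# Hypothetical reference list and report text, for illustration only.
reference_terms = ["Martin Ward", "Mary Rose Hospital"]
report = "Martin Ward slipped in the corridor at Mary Rose Hospital."
redacted = redact(report, reference_terms)
print(redacted)  # [REDACTED] slipped in the corridor at [REDACTED].
```

Because the terms live in data rather than in code, adding a new term is a change to the list only; the matching logic stays untouched.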


Additions to the reference list
A patient’s name is information that must be made anonymous. But if a patient’s first or last name is also an ordinary word, the rules applied might let it through rather than blot it out. This is a risk that must be removed as completely and efficiently as possible.

For example, you may have a patient called Mr Leg, who fell and hit his shoulder on Ward 3 at Mary Rose Hospital on 12th December 2009. The term ‘leg’ in this sentence does not, of course, refer to the man’s leg. It is his name, and it will need to be added to the ‘white list’ as a term to be made anonymous. The Datactics solution will, for example, identify words starting with capital letters to facilitate this process.
Similarly, a man might be called Martin Ward. It might appear that Martin Ward is the name of a hospital ward, when in fact it is the patient’s name. The same could happen with other ambiguous terms such as ‘head’ and ‘hall’.
Other types of terms that may be added to the reference list are abbreviations for physicians or doctors, or names of medical instruments that may be used to treat a patient involved in an accident. These instruments or treatments are often named after the person who developed them, such as the Heimlich manoeuvre or ‘Owen Mumford’ which is a screening device.
Once all the business rules were in place and the white-listed reference files had been produced, the agency was able to manage and update them internally. This improved the service, reduced the time taken and increased efficiency.
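
The capital-letter check mentioned above can be sketched roughly as follows. This is an illustrative heuristic only, not a description of how the Datactics product works: the function name `candidate_terms`, the sentence-splitting rule and the `known_terms` parameter are assumptions. It flags capitalized words that do not start a sentence, so a reviewer can decide whether each one belongs in the reference list.

```python
import re

def candidate_terms(text, known_terms=()):
    """Return capitalized mid-sentence words that are not already known."""
    known = {t.lower() for t in known_terms}
    candidates = []
    # Rough sentence split: ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        # Skip the first word: it is capitalized by position, not because it is a name.
        for word in words[1:]:
            if word[0].isupper() and word.lower() not in known and word not in candidates:
                candidates.append(word)
    return candidates

# Hypothetical report text; "Mr" and "Dr" are treated as already-known titles.
report = "The patient Mr Leg fell and hit his shoulder on Ward 3. He was seen by Dr Hall."
found = candidate_terms(report, known_terms=["Mr", "Dr"])
print(found)  # ['Leg', 'Ward', 'Hall']
```

The output is a list for human review rather than an automatic decision: ‘Ward’ here really is a hospital ward, while ‘Leg’ and ‘Hall’ are names, and only a reviewer with context can tell them apart.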


  • New terms are added to the reference files easily and are managed using Excel rather than coding; this means that the data owner can manage the process without the involvement of their own IT department or an external vendor.
  • All changes can be tracked and reviewed with tiered approval, reducing the risk of adverse consequences.
  • With Datactics software, increasing the number of hospitals to be reviewed and reported on will not add cost to the process of anonymous incident reporting.
  • Datactics provides a complete product-based solution that offers lower cost of ownership, faster implementation, greater responsiveness and higher accuracy for the user.
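
As an illustration of how a spreadsheet-managed reference list could be reviewed before approval, here is a minimal sketch that compares two versions of the list exported as CSV. The single `term` column, the function names `load_terms` and `diff_terms`, and the sample data are assumptions; the actual file layout and approval workflow are not described in this case study.

```python
import csv
import io

def load_terms(csv_text):
    """Read the 'term' column of a CSV export into a set of non-empty terms."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["term"].strip() for row in reader if row["term"].strip()}

def diff_terms(old, new):
    """Summarize what a proposed list adds to or removes from the approved one."""
    return {"added": sorted(new - old), "removed": sorted(old - new)}

# Hypothetical approved and proposed versions of the reference list.
approved_csv = "term\nMartin Ward\nLeg\n"
proposed_csv = "term\nMartin Ward\nLeg\nHall\n"
changes = diff_terms(load_terms(approved_csv), load_terms(proposed_csv))
print(changes)  # {'added': ['Hall'], 'removed': []}
```

A reviewer would see exactly which terms a proposed update introduces or drops before it is applied to incoming reports.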