Anonymous and pseudonymous data

When collecting data from people, it is always best to minimise the amount of personal data you collect from your participants. If possible, you should not collect directly identifiable information unless it is required to answer your research question. This helps protect the privacy of your participants and decreases the likelihood of re-identification.

If personal data must be collected, it should be de-identified as much as possible. Two terms which are used to describe de-identification are anonymous and pseudonymous. They do not mean the same thing.

Anonymisation does not equal pseudonymisation

Anonymous data means a participant can never be re-identified from the data contained in a dataset, even if the data is merged with another dataset. Full anonymity is very difficult to achieve, and usually cannot be reached without removing the information that makes the data useful for research.

Examples of anonymous data:

Be aware: small sample sizes and unanimous answers can undermine the anonymity of a dataset. Consider both when evaluating whether you can classify your data as truly anonymous.
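
As a rough illustration of the warning above, the following Python sketch flags groups of participants that are small, or whose answers are all the same. It is a minimal example under stated assumptions, not an official test for anonymity: the column names, the sample records and the threshold `k` are all hypothetical, and passing this check does not by itself make a dataset anonymous.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return combinations of indirect identifiers shared by fewer than k records.

    The threshold k is an assumption for illustration; choose a value that
    fits your own study and institutional guidance.
    """
    counts = Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


def unanimous_groups(records, quasi_identifiers, sensitive):
    """Return groups in which every record gives the same sensitive answer."""
    seen = {}
    for record in records:
        combo = tuple(record[column] for column in quasi_identifiers)
        seen.setdefault(combo, set()).add(record[sensitive])
    return [combo for combo, answers in seen.items() if len(answers) == 1]


# Hypothetical survey responses, invented for illustration only.
responses = [
    {"age_band": "30-39", "department": "Physics", "answer": "yes"},
    {"age_band": "30-39", "department": "Physics", "answer": "yes"},
    {"age_band": "30-39", "department": "Physics", "answer": "no"},
    {"age_band": "60-69", "department": "History", "answer": "yes"},
]

risky = small_groups(responses, ["age_band", "department"], k=3)
unanimous = unanimous_groups(responses, ["age_band", "department"], "answer")
print("Groups smaller than k:", risky)
print("Groups with a single shared answer:", unanimous)
```

In this invented sample, the single respondent from History in the 60-69 age band stands out on both checks: anyone who knows that person took part also learns their answer.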

Pseudonymous data means a participant can still be re-identified, but only with the help of some additional information that is not part of the dataset itself.

Techniques for pseudonymisation:

If you use a key file to store the link between identifiable information and the associated pseudonym (key), the key file should be stored in a separate location from the rest of your research data. This increases the security of your data and reduces the risk of unauthorised re-identification of participants if your research data is compromised.
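
The key-file approach described above could be sketched in Python as follows. This is one possible way to do it, not a prescribed method: the function name `pseudonymise`, the `participant_code` column, the file names and the sample records are all hypothetical, and the temporary folders merely stand in for two genuinely separate storage locations. The sketch assigns random codes rather than deriving them from the identifier, so a code on its own reveals nothing about the person.

```python
import csv
import secrets
import tempfile
from pathlib import Path


def pseudonymise(rows, id_column, data_path, key_path):
    """Replace id_column with a random code; write data and key to separate files."""
    if not rows:
        raise ValueError("no rows to pseudonymise")

    key = {}      # identifier -> participant code
    used = set()  # codes already handed out, to guarantee uniqueness
    cleaned_rows = []
    for row in rows:
        identifier = row[id_column]
        if identifier not in key:
            code = "P-" + secrets.token_hex(4)
            while code in used:
                code = "P-" + secrets.token_hex(4)
            used.add(code)
            key[identifier] = code
        # Keep every column except the identifying one.
        rest = {column: value for column, value in row.items() if column != id_column}
        cleaned_rows.append({"participant_code": key[identifier], **rest})

    # Research data: contains codes only, no direct identifiers.
    with open(data_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(cleaned_rows[0]))
        writer.writeheader()
        writer.writerows(cleaned_rows)

    # Key file: the only place where identifier and code appear together.
    with open(key_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([id_column, "participant_code"])
        writer.writerows(key.items())
    return key


# Hypothetical records, invented for illustration only.
participants = [
    {"email": "a@example.org", "score": "12"},
    {"email": "b@example.org", "score": "17"},
    {"email": "a@example.org", "score": "14"},
]

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp, "research_data")
    key_dir = Path(tmp, "restricted_keys")  # stand-in for a separate, access-controlled location
    data_dir.mkdir()
    key_dir.mkdir()
    key = pseudonymise(participants, "email",
                       data_dir / "responses.csv", key_dir / "key.csv")
    data_text = (data_dir / "responses.csv").read_text()

print(data_text)
```

The same participant always receives the same code, so repeated measurements can still be linked, while the research data file no longer contains the e-mail address. In practice the two paths would point to different storage locations with different access rights.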

The table below summarises the differences between these terms.

De-identification table