Anonymous and pseudonymous data

When collecting data from individuals, it is important to minimise the amount of personal information gathered. Wherever possible, avoid collecting directly identifiable data, such as names, addresses, or contact details, unless it is essential to address your research question. This approach helps protect participants' privacy and reduces the risk of re-identification.

If personal data must be collected, it should be de-identified to the greatest extent possible. Two commonly used terms related to de-identification are anonymous and pseudonymous, but they refer to different concepts.

Anonymisation does not equal pseudonymisation

Anonymous data means that a participant can never be (re-)identified from the data in a dataset, even if that data is merged with another dataset. Full anonymity is very difficult to achieve and takes time and careful consideration; always consider the likelihood of re-identification when evaluating whether your data is truly anonymous. If you are unsure, reach out for support.

Examples of anonymous data:

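Be aware: small sample sizes and unanimous answers can undermine the anonymity of a dataset. In a small group, a combination of characteristics may point to a single person, and if everyone in a group gives the same answer, each member's answer is revealed. Consider this when evaluating whether you can classify your data as truly anonymous.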
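As a rough illustration of this check, the sketch below flags groups that are small or that share a single answer. The function name `risky_groups`, the field names, and the threshold of 5 are assumptions made for the example, not an official standard; choose a threshold that suits your own data and seek support if in doubt.

```python
def risky_groups(records, group_fields, answer_field, min_size=5):
    """Flag groups that are small or whose members all gave the same answer.

    min_size=5 is an illustrative threshold, not a fixed rule.
    """
    # Collect the answers given within each group of participants.
    groups = {}
    for record in records:
        group = tuple(record[field] for field in group_fields)
        groups.setdefault(group, []).append(record[answer_field])

    flagged = {}
    for group, answers in groups.items():
        reasons = []
        if len(answers) < min_size:
            reasons.append("small group")
        if len(set(answers)) == 1:
            reasons.append("unanimous answer")
        if reasons:
            flagged[group] = reasons
    return flagged


# Hypothetical survey responses, grouped by department.
records = (
    [{"department": "A", "answer": "yes"}] * 3
    + [{"department": "A", "answer": "no"}] * 2
    + [{"department": "B", "answer": "no"}] * 2
    + [{"department": "C", "answer": "yes"}] * 6
)

flagged = risky_groups(records, ["department"], "answer")
```

Here department A is not flagged (five mixed answers), department B is flagged on both counts, and department C is flagged because all six answers are identical. A clean result from a check like this does not prove that the data is anonymous; it only highlights obvious risks.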

Pseudonymous data means a participant can still be re-identified, but only by combining the dataset with additional information, such as a key file that links random IDs to identifiable information.

Techniques for pseudonymisation:

If you use a key file to store the link between identifiable information and the associated random IDs, store the key file in a separate location from the rest of your research data. This increases the security of your data and reduces the risk of unauthorised re-identification of participants if your research data is compromised.
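The key-file approach can be sketched as follows. This is a minimal example under stated assumptions: the function `pseudonymise`, the `name` and `participant_id` fields, and the `P-` prefix are hypothetical choices for illustration, and a real project may need to remove or generalise further identifying fields.

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Replace the identifying field with a random ID.

    Returns (research_data, key), where key maps each identifier to its
    random ID. The key is the content of the key file.
    """
    key = {}
    used_ids = set()
    research_data = []
    for record in records:
        identifier = record[id_field]
        if identifier not in key:
            # Draw a random ID, retrying in the unlikely event of a clash.
            new_id = "P-" + secrets.token_hex(4)
            while new_id in used_ids:
                new_id = "P-" + secrets.token_hex(4)
            used_ids.add(new_id)
            key[identifier] = new_id
        # Copy every field except the identifying one.
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["participant_id"] = key[identifier]
        research_data.append(cleaned)
    return research_data, key


# Hypothetical input data.
records = [
    {"name": "Alice Example", "score": 7},
    {"name": "Bob Example", "score": 4},
    {"name": "Alice Example", "score": 9},
]

research_data, key_file = pseudonymise(records)
# Save key_file to a separate, access-restricted location,
# never alongside research_data.
```

The IDs are random rather than derived from the name (for example by hashing), so they reveal nothing about the participant on their own. Re-identification is then only possible for someone who holds the key file.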

The table below summarises the differences between these terms.

De-identification table