When collecting data from individuals, it is important to minimise the amount of personal information gathered. Wherever possible, avoid collecting directly identifiable data, such as names, addresses, or contact details, unless it is essential to address your research question. This approach helps protect participants' privacy and reduces the risk of re-identification.
If personal data must be collected, it should be de-identified as far as possible. Two terms commonly used in relation to de-identification, anonymous and pseudonymous, refer to different concepts.
Anonymous data means a participant can never be identified or re-identified from the data in a dataset, even if that dataset is merged with another. Full anonymity is difficult to achieve and requires time and careful consideration; always assess the likelihood of re-identification when evaluating whether data is truly anonymous. If you are unsure, reach out for support.
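Removing names is rarely enough on its own, because combinations of ordinary fields (age, area, occupation) can still single someone out. One common way to gauge this risk is a k-anonymity check: count how many records share each combination of these "quasi-identifier" fields and flag combinations shared by fewer than k records. The sketch below is a minimal illustration, assuming records are Python dictionaries; the field names, example values and threshold k are hypothetical choices you would set for your own data.

```python
from collections import Counter


def risky_combinations(records, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k records.

    Each flagged combination describes a small group (possibly a single
    person) that could be singled out, for example by matching it against
    another dataset. The default k=5 is a hypothetical threshold.
    """
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical records: no names, but the three fields together may
# still point to an individual.
records = [
    {"age_band": "30-39", "postcode_area": "AB1", "occupation": "nurse"},
    {"age_band": "30-39", "postcode_area": "AB1", "occupation": "nurse"},
    {"age_band": "40-49", "postcode_area": "AB1", "occupation": "teacher"},
]
fields = ["age_band", "postcode_area", "occupation"]

print(risky_combinations(records, fields, k=2))
```

Here the only teacher aged 40-49 in area AB1 is flagged as unique. Passing this check does not prove a dataset is anonymous; it only surfaces obvious small groups that need further generalisation or removal.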
Examples of anonymous data:
Pseudonymous data means a participant can still be re-identified from the dataset, but only with the help of additional information, such as a key linking codes to participants' identities, that is kept separately from the data.
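To make the distinction concrete, the sketch below replaces a direct identifier with a random participant code and returns a separate key mapping codes back to identities; that key is the "additional information" that allows re-identification and would need to be stored separately and securely. It is a minimal illustration, not a prescribed procedure: the field names (`name`, `participant_code`), the code format and the example records are all hypothetical.

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Replace the direct identifier in each record with a random code.

    Returns (pseudonymised_records, key). The key maps each code back to
    the original identifier and must be kept apart from the data.
    """
    key = {}    # code -> original identifier
    codes = {}  # original identifier -> code, so a repeat participant keeps one code
    out = []
    for record in records:
        identity = record[id_field]
        if identity not in codes:
            code = "P" + secrets.token_hex(4)
            while code in key:  # regenerate on the (unlikely) event of a collision
                code = "P" + secrets.token_hex(4)
            codes[identity] = code
            key[code] = identity
        # Copy every field except the direct identifier, then attach the code.
        new_record = {f: v for f, v in record.items() if f != id_field}
        new_record["participant_code"] = codes[identity]
        out.append(new_record)
    return out, key


records = [
    {"name": "Alice Example", "score": 12},
    {"name": "Bob Example", "score": 9},
    {"name": "Alice Example", "score": 15},
]
data, key = pseudonymise(records)
print(data)                               # codes only, no names
print(key[data[0]["participant_code"]])   # re-identification requires the key
```

Random codes are used here rather than a plain hash of the name because an unsalted hash can be reversed by hashing candidate names and comparing the results. Note that this only removes the direct identifier; other fields may still need the kind of re-identification check described for anonymous data.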
Techniques for pseudonymisation:
The table below summarises the differences between these terms.
