Data categorisations

 

Raw data

Raw data is the original, unprocessed data collected at the start of your research. It has not been cleaned, analysed, or reorganised. Depending on your project, raw data can take many forms.

Examples include:

What counts as raw data depends on your research context. If you are reusing a dataset that was cleaned by another research group, that version may be considered your raw data. Likewise, if a laboratory provides you with processed outputs, those files are your raw data for the purpose of your project.

 

Processed data

Once you have collected your raw data, you use it to create your processed data. Processed data is raw data that has been altered in some way, usually to make it suitable for analysis. Typical processing steps include cleaning the data, pseudonymising it, and performing statistical analysis. Within the research lifecycle, this data sits in the preparation stage. Processing raw data produces a new, structured dataset that is ready for analysis and can be used to answer your research question.
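As a rough illustration of what a processing step might look like, the sketch below cleans a small in-memory table and replaces names with pseudonyms. It is only one possible approach, not a prescribed method: the column names (`name`, `age`), the `SECRET_KEY` value, and the functions `pseudonymise` and `process` are all hypothetical, and the keyed hash (HMAC-SHA256) is an assumed choice of pseudonymisation technique.

```python
import hashlib
import hmac

# Hypothetical secret key (an assumption for this sketch); in practice it
# would be stored separately from the data.
SECRET_KEY = b"replace-with-a-private-key"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for an identifier using a keyed hash."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:10]


def process(raw_rows):
    """Build a new, cleaned and pseudonymised table; the raw rows stay untouched."""
    processed = []
    for row in raw_rows:
        name = row["name"].strip()
        age_text = row["age"].strip()
        if not name or not age_text.isdigit():
            continue  # drop incomplete or malformed records
        processed.append({"participant": pseudonymise(name), "age": int(age_text)})
    return processed


# Made-up raw records, purely for illustration.
raw_rows = [
    {"name": " Alice Example ", "age": "34"},
    {"name": "Bob Example", "age": ""},
    {"name": "Carol Example", "age": "41 "},
]
processed_rows = process(raw_rows)
```

The function returns a new list rather than editing `raw_rows` in place, so the raw data remains available in its original form alongside the processed version.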

 

Analysed data

Once your data has been cleaned, formatted, and organised, the next step is analysis. Analysed data refers to the output of this stage: the results generated from applying statistical, computational, or qualitative methods to your processed dataset.

This is typically the version of the data you use to support your findings and present in publications. It may include:

 

 

Other data

Finally, you have the remainder of your data assets. This category can also be considered your metadata and supporting documentation. Throughout your project, you will have amassed documentation that gives valuable context to your other data assets. Including these assets helps ensure that others (and you) can understand and interpret your data package at a later stage. Examples of documentation include README files, codebooks, software packages, interview guides, metadata files, and any other files used to document your process.
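One way to keep the four categories apart in practice is to give each its own folder. The short Python sketch below creates such a layout; the folder names, the `README.txt` stub, and the function `create_data_package` are assumptions chosen for illustration, not a required convention.

```python
from pathlib import Path
import tempfile

# Hypothetical folder names, one per data category described above.
CATEGORIES = [
    "01_raw_data",
    "02_processed_data",
    "03_analysed_data",
    "04_documentation",
]


def create_data_package(root: Path) -> list[Path]:
    """Create one sub-folder per category plus a README stub; return the folders."""
    folders = []
    for name in CATEGORIES:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    readme = root / "04_documentation" / "README.txt"
    if not readme.exists():
        readme.write_text("Describe the contents of each folder here.\n", encoding="utf-8")
    return folders


# Demonstration in a temporary directory so nothing in your project is touched.
demo_root = Path(tempfile.mkdtemp()) / "my_project"
created = create_data_package(demo_root)
```

Numbering the folders keeps them sorted in the order the data moves through the project, which can make the package easier to navigate later.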

 

 

Below are two examples of how to structure your data assets.

The first example consists of interview data that is transcribed and then analysed using a self-written R script.

 

 

The example below consists of experimental physical data.