5.3. How to discuss your data and methods

In your data and methods section, start with a table with descriptive statistics (minimum, maximum values, means, standard deviations) for all of your variables. In the example in Table 4, the variables are grouped: the dependent variables are presented first, then the independent variables. Include not only the original variables, but also the variables you derived or constructed by taking logs, censoring certain values or imputing missing data, and taking means or factor scores for scales.
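As a minimal sketch of how such a table can be built, the Python example below computes the four statistics for one original variable and one derived variable. The data, the variable names and the choice of log(1 + amount) as the transformation are assumptions made for illustration only; statistical packages provide equivalent summaries out of the box.

```python
import math
import statistics

# Hypothetical data: amounts donated in the past year, in euros (made-up values).
donations = [0, 5, 10, 25, 50, 100, 1250]

# Derived variable: log(1 + amount), so that donations of zero remain defined.
log_donations = [math.log1p(x) for x in donations]


def describe(values):
    """Return minimum, maximum, mean and sample standard deviation."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


# One row per variable: original variables and derived variables side by side.
table = {
    "donations": describe(donations),
    "log_donations": describe(log_donations),
}

for name, row in table.items():
    print(f"{name:15s} min={row['min']:8.2f} max={row['max']:8.2f} "
          f"mean={row['mean']:8.2f} sd={row['sd']:8.2f}")
```

Reporting the original and the derived variable in adjacent rows makes it easy to see what the transformation did to the range and the spread.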

Discuss the values in the table of descriptive statistics. Compare the means and standard deviations with known population values. Demonstrate that your research sample has sufficient variance on the dependent and independent variables to derive valid conclusions.

Creating a table of descriptive statistics is a good practice that helps you identify errors in the construction and handling of variables. If a variable has a very high maximum (or very low minimum) value, you may need to think about the way you treat missing values and outliers. If you forgot to recode missing value codes such as ‘99’, ‘99999’ or ‘-1’, or made a mistake when recoding them, the table will probably reveal it if you pay close attention. You should also check whether you have observations for all units in the dataset.
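The following sketch illustrates how forgotten missing value codes show up in the descriptive statistics. It assumes a hypothetical education variable measured on a scale from 1 to 7 and a hypothetical set of missing value codes. In your own data, check the codebook first, because a value such as 99 can be a legitimate observation for other variables.

```python
import statistics

# Hypothetical codes that the survey software used for "missing".
MISSING_CODES = {99, 99999, -1}


def recode_missing(values, missing_codes=MISSING_CODES):
    """Replace missing value codes with None so they are excluded from statistics."""
    return [None if v in missing_codes else v for v in values]


def summarize(values):
    """Return the number of valid observations, minimum, maximum and mean."""
    valid = [v for v in values if v is not None]
    return {
        "n": len(valid),
        "min": min(valid),
        "max": max(valid),
        "mean": statistics.mean(valid),
    }


# Hypothetical variable: level of education on a scale from 1 to 7.
education_raw = [3, 5, 99, 4, 7, -1, 6, 5]

before = summarize(education_raw)                  # codes treated as real values
after = summarize(recode_missing(education_raw))   # codes excluded

print("before recoding:", before)
print("after recoding: ", after)
```

Before recoding, the minimum of -1 and the maximum of 99 fall outside the 1 to 7 scale, which is exactly the kind of signal the table is meant to reveal. After recoding, the number of valid observations drops from 8 to 6, which you should report.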

In the example of Table 4, most of the variables look fine: about a quarter of your participants donated, and they gave on average a little less than half of their endowment. The factor score for empathic concern has an average of 0 and a standard deviation of 1, and the empathy induction manipulation was randomly administered to half of your participants.

However, you also see that the level of education is left-skewed: most values are concentrated at the high end of the scale, so the average is much higher than the midpoint. If you get such a result, check the data and your coding. The skewness may be real, and if so, describe it. The amount donated as a dependent variable was constrained in this hypothetical experiment to a maximum of 10, the complete endowment that the participants received. Outside the experiment, variables such as the amount donated in the course of a calendar year often contain outliers: very high values that rarely occur. In this particular dataset, one participant reported having donated €1,250 in the past year. This is a very high value relative to the average. Moreover, this single amount by itself increases both the average and the standard deviation.

Table 4. Descriptive Statistics Table (hypothetical example)

Leaving this observation in the dataset – assuming it is accurate – may strongly affect the results you obtain in a comparison of means and even a regression analysis. It is good practice to design a strategy to handle potential outliers before you collect your data and certainly before you start analyzing the data. A solution that reduces the influence of outliers but keeps them in the dataset is to winsorize them. This technique, named after the engineer and biostatistician Charles P. Winsor, reduces the original value to a prespecified value, for instance the value of the 99th percentile. The advantage of winsorizing over the elimination of outliers, also known as trimming, is that you do not lose observations from the dataset.
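A minimal sketch of upper-tail winsorizing is shown below. It assumes made-up donation amounts and uses the 90th percentile rather than the 99th, because the example has only ten observations. The percentile is computed with the nearest-rank definition, which is one of several common definitions; the function names are hypothetical.

```python
import math
import statistics


def nearest_rank_percentile(values, percent):
    """Return the percentile using the nearest-rank definition."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


def winsorize_upper(values, percent):
    """Cap every value above the given percentile at that percentile."""
    cap = nearest_rank_percentile(values, percent)
    return [min(v, cap) for v in values]


# Hypothetical amounts donated in the past year, in euros (made-up values).
donations = [0, 5, 5, 10, 10, 20, 25, 50, 100, 1250]

winsorized = winsorize_upper(donations, percent=90)

print("mean before:", statistics.mean(donations))
print("mean after: ", statistics.mean(winsorized))
print("observations kept:", len(winsorized))
```

In this sketch the value of 1,250 is replaced by 100, the 90th percentile, and all ten observations remain in the dataset. Trimming would instead drop the observation and leave nine.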

In addition to creating a table of descriptive statistics, another tool to detect errors and other peculiarities in the data is the graphic display of the distribution of variables. Create histograms, not only to see the skewness of variables, but also to get a sense of the range of values and to see outliers. Display the relationships between pairs of variables in scatterplots. You will immediately see deviations from normality and spot outliers and influential observations. Follow the rules you have pre-registered for handling these cases.
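As a rough sketch of what a histogram reveals, the example below counts observations per bin and prints a text histogram. The donation amounts and the bin width of 50 are assumptions for illustration; in practice you would use the plotting functions of your statistical software.

```python
def histogram_counts(values, bin_width):
    """Count observations per bin [start, start + bin_width); empty bins are omitted."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical amounts donated in the past year, in whole euros (made-up values).
donations = [0, 5, 5, 10, 10, 20, 25, 50, 100, 1250]
bin_width = 50

counts = histogram_counts(donations, bin_width)

for start, n in counts.items():
    # Bin labels assume whole-euro amounts.
    print(f"{start:5d}-{start + bin_width - 1:5d} | {'#' * n}")
```

Most observations fall in the lowest bin, and the single observation in the bin starting at 1,250 stands far apart from the rest, which marks it as a potential outlier.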