In the opening paragraph of your theory section, present a causal model that represents how you think the phenomena that you are studying are related. You can think of a causal model as a simplification of the chain of events that leads to a certain outcome. The core of any causal model is a relationship between a cause X and its effect Y (see Figure 4).
Take the example of a campfire that results from striking a match in a small pyramid of dry wood. Striking the match (X) caused the wood to catch fire (Y). This example is very different from the typical chain of events that we are interested in as social scientists. We are often interested in differences in social behavior between groups, or between individuals with specific characteristics. Typically, the relationship between such an outcome and the preceding chain of events is much weaker than the relationship between striking a match and fire.
For the reader it is convenient to see the causal model before you explain the hypotheses you will be testing, because the model provides an overview of what follows. To stick with my own advice, consider the example in Figure 5, based on Bekkers & Schuyt (2008).
In thinking about the model, it is best to work in the following order.
1. Select your dependent variable (Y).
The dependent variable is the phenomenon you are trying to understand or explain, by relating it to the conditions in which it occurs, the characteristics of people who are involved, how it changes over time, or how it varies between groups or nations. The name ‘dependent variable’ implies that it depends on something else, and that it can vary. A characteristic that does not vary is not a variable, but a constant. Birth year, for example, is fixed for each person: it differs between people, but it cannot change over time within one person, so it cannot depend on anything that happens later.
In the example in Figure 5 the dependent variable is volunteering, spending time working without pay for a nonprofit organization. I am intrigued by questions about prosocial behavior. Why do some people volunteer, while others do not? Volunteering depends on many factors, but in the model I have deliberately considered only a few. Models are simplifications of reality, and this one is no exception. We could draw a more exploratory model including clusters of variables; this strategy is discussed below (see section 3.2).
2. Pick your main independent variable (X).
The independent variable is a cause of the phenomenon that you are trying to explain. In time, the independent variable precedes the dependent variable. The change or the event that you think is a cause must have happened before the outcome occurred.
In the example, the main independent variable is ‘Protestant’. We think that some people are more likely to volunteer than others because they are Protestant rather than non-religious, Catholic, or members of another religion. It is a variable because it varies between people in the population, though it does not vary much over time within people: it does not change easily as a result of other variables. In this sense, it is an independent variable. Being Protestant is similar to ‘striking a match’ in the example of how striking a match causes fire.
3. Next, identify mediating variables (M), if any.
A mediating variable is a variable in between the cause and the effect, explaining why the cause has a certain effect. Mediating variables are also called ‘mediators’ or ‘intermediate variables’. A mediating variable depends on an independent variable, is a result of that variable, and explains the relationship of that independent variable with the dependent variable. You can think of a mediator as a condition that changes as a result of a prior event, and that leads to a later event. Without the mediating variable, the cause would not have the effect you observed.
In the causal diagram, the mediating variable is labeled ‘M’. The variable is positioned between the independent and the dependent variable. The arrow running straight from X to Y represents the ‘direct effect’: there is nothing between X and Y, so the effect is direct. The part of the relationship between X and Y that flows through M is called the ‘indirect effect’. In the diagram, you can show that the relationship between X and Y is mediated by M by dotting the arrow between X and Y. You can display the initial relationship with a plus or minus sign, and add brackets to display your hypothesis that the relationship is mediated by other variables in the model. In the example above I have not done that, because I have assumed all relationships to be positive.
In our example, the relationship between Protestant and volunteering – i.e. the higher level of volunteering among Protestants than among the non-religious and other religious groups – is mediated by two variables: being asked and altruistic values. We think that people are more likely to be asked to volunteer because they are Protestant, and that they are more likely to volunteer because they are asked. We also think that people have stronger altruistic values because they are Protestant, and that they are more likely to volunteer because they have stronger altruistic values. Being asked and altruistic values are similar to the friction that occurs as a result of striking a match. Striking the match causes friction, and the friction causes the tip of the match to catch fire.
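To make the distinction between direct and indirect effects concrete, the sketch below simulates data for a simplified version of this model with a single mediator (being asked). It assumes linear relationships and uses made-up path coefficients (the names A, B and DIRECT are illustrative), so the numbers say nothing about real Protestants or volunteers. In ordinary least squares regression, the total effect of X on Y equals the direct effect plus the indirect effect, where the indirect effect is the product of the path from X to M and the path from M to Y.

```python
import random

def slope(y, x):
    """OLS slope of y on a single predictor x (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def ols2(y, x1, x2):
    """OLS slopes of y on two predictors x1 and x2 (with intercept)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    # Solve the 2x2 normal equations for the two slopes.
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

random.seed(1)
N = 20000
A, B, DIRECT = 0.5, 0.8, 0.1  # hypothetical paths: X to M, M to Y, X to Y

protestant = [random.randint(0, 1) for _ in range(N)]         # X
asked = [A * x + random.gauss(0, 1) for x in protestant]       # M
volunteering = [DIRECT * x + B * m + random.gauss(0, 1)        # Y
                for x, m in zip(protestant, asked)]

total = slope(volunteering, protestant)            # Y on X only
direct, b_hat = ols2(volunteering, protestant, asked)  # Y on X and M
indirect = slope(asked, protestant) * b_hat        # (X to M) times (M to Y)
print(f"total {total:.3f} = direct {direct:.3f} + indirect {indirect:.3f}")
```

With a second mediator such as altruistic values, the indirect effect in a linear model would be the sum of two such products, one for each mediator.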
4. Identify moderating variables, if any.
A moderating variable is an additional independent variable (X2) that changes the influence of the main independent variable (X1). In the example of the match, you can think of oxygen as a moderating variable: when we strike a match, the presence of oxygen enables the tip of the match to catch fire. Striking a match in a room without oxygen would not result in fire. You can think of moderating variables in terms of necessary but not sufficient conditions. If there is oxygen, striking the match sets it on fire. But the mere presence of oxygen does not cause the match to catch fire. In the social sciences, moderating variables are rarely necessary conditions; mostly their influence is probabilistic: they strengthen or weaken an effect rather than switch it on or off.
In our example of the relation between religious affiliation and volunteering there are no moderating variables. If we assumed that among Protestants the relationship between altruistic values and volunteering is stronger than among the non-religious, ‘Protestant’ would be a variable moderating the relationship between altruistic values and volunteering, and an additional arrow should be included to visualize this hypothesis. The model would then look like Figure 8 below. Arrow D represents the moderating influence of Protestant affiliation on the influence of altruistic values on volunteering. The influence of altruistic values on volunteering is positive, but may be stronger for certain groups, such as Protestants.
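The hypothesis represented by arrow D can be sketched as an interaction in simulated data. The code assumes, purely for illustration, that one standard deviation in altruistic values raises volunteering by 0.3 units among non-Protestants and by 0.7 units among Protestants; it then estimates the slope separately in each group. A moderating influence shows up as a difference in slopes, not merely as a difference in average volunteering between the groups.

```python
import random

def slope(y, x):
    """OLS slope of y on a single predictor x (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(2)
N = 20000
BASE_SLOPE, EXTRA_FOR_PROTESTANTS = 0.3, 0.4  # hypothetical values

protestant = [random.randint(0, 1) for _ in range(N)]
values = [random.gauss(0, 1) for _ in range(N)]  # standardized altruistic values
volunteering = [
    # Main effect of Protestant (0.2) plus a values slope that depends on p.
    0.2 * p + (BASE_SLOPE + EXTRA_FOR_PROTESTANTS * p) * v + random.gauss(0, 1)
    for p, v in zip(protestant, values)
]

def group_slope(group):
    """Slope of volunteering on values within one group (0 or 1)."""
    idx = [i for i, p in enumerate(protestant) if p == group]
    return slope([volunteering[i] for i in idx], [values[i] for i in idx])

slope_other = group_slope(0)
slope_protestant = group_slope(1)
print(f"non-Protestants: {slope_other:.2f}, Protestants: {slope_protestant:.2f}")
```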
5. For each relationship in the model, identify the sign of the association.
You can identify the sign of the association – positive or negative – by using different colors, or adding signs. In the causal diagram examples, I’ve displayed positive relationships with black arrows, and negative relationships with red arrows. If some of your associations are stronger than others, you can display the strength of relationships using multiple signs, such as ++ or --.
6. Think about the relationships that are not in your model.
Is your model complete? Does it include all relevant influences? In most cases, you cannot include all the variables that influence your outcome. This is only a problem if leaving out an influence changes the strength or the sign of an influence that is included in your model. We discuss such cases below. Arrows that you could have drawn but did not include in the diagram are in fact hypotheses about null effects: you expect these paths to be absent.
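One way to check which null effects your diagram implies is to write the model down as a list of signed arrows and enumerate the pairs of variables that are not connected. The sketch below encodes one reading of the example in Figure 5: positive arrows through the two mediators, plus a direct arrow from Protestant to volunteering, which is an assumption here. The variable names are labels chosen for illustration.

```python
from itertools import combinations

# Variables in the example model.
VARIABLES = ["Protestant", "being asked", "altruistic values", "volunteering"]

# Arrows as (cause, effect) -> sign. The direct arrow from Protestant to
# volunteering is assumed; in a mediation model it may be drawn dotted.
ARROWS = {
    ("Protestant", "being asked"): "+",
    ("Protestant", "altruistic values"): "+",
    ("being asked", "volunteering"): "+",
    ("altruistic values", "volunteering"): "+",
    ("Protestant", "volunteering"): "+",
}

def null_paths(variables, arrows):
    """Return pairs of variables with no arrow in either direction.

    Each such pair is an implicit hypothesis that the path is absent.
    """
    return [
        (a, b)
        for a, b in combinations(variables, 2)
        if (a, b) not in arrows and (b, a) not in arrows
    ]

for (cause, effect), sign in ARROWS.items():
    print(f"{cause} -> {effect} ({sign})")
print("Hypothesized null effects:", null_paths(VARIABLES, ARROWS))
```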
7. Think about variables that are not in your model.
In the social sciences there are many factors that we have not measured. These are omitted variables. In the example of the match, the moisture level of the wood is an important omitted variable. The more humid the wood is, the less likely it is that striking the match causes the wood to catch fire and keep burning. This omitted variable is a moderating variable: it reduces the effect of striking a match. The variable itself does not depend on anything else in the model, but it will moderate the effect size you observe.
In our example of religious affiliation and volunteering, age is likely to be an omitted variable. Older people are more likely to be Protestant and attend church more frequently than younger people. If older people are also more likely to volunteer, omitting age will lead us to overestimate the relationship of religiosity with volunteering: part of the relationship of age with volunteering will be absorbed by the variables in the model.
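The direction of this bias can be checked in simulated data. The sketch below assumes, for illustration only, that age raises both the probability of being Protestant (through a logistic link) and the level of volunteering, while being Protestant has a true effect of 0.2. Regressing volunteering on Protestant alone absorbs part of the age effect; adding age to the regression brings the estimate back close to 0.2. If age were unrelated to volunteering, omitting it would not bias the estimate.

```python
import math
import random

def slope(y, x):
    """OLS slope of y on a single predictor x (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def ols2(y, x1, x2):
    """OLS slopes of y on two predictors x1 and x2 (with intercept)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    d1, d2 = [a - m1 for a in x1], [b - m2 for b in x2]
    dy = [c - my for c in y]
    s11, s22 = sum(a * a for a in d1), sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

random.seed(3)
N = 20000
TRUE_EFFECT, AGE_EFFECT = 0.2, 0.3  # hypothetical effects on volunteering

age = [random.gauss(0, 1) for _ in range(N)]  # standardized age
# Assumed: older people are more likely to be Protestant.
protestant = [1 if random.random() < 1 / (1 + math.exp(-a)) else 0 for a in age]
volunteering = [
    TRUE_EFFECT * p + AGE_EFFECT * a + random.gauss(0, 1)
    for p, a in zip(protestant, age)
]

naive = slope(volunteering, protestant)             # age omitted
adjusted, _ = ols2(volunteering, protestant, age)   # age included
print(f"age omitted: {naive:.2f}, age included: {adjusted:.2f}")
```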
Drawing the model
You should display at least three variables in your model – the dependent, the independent and at least one mediating or moderating variable. As a rule, a causal model including multiple mediating variables is preferable to a model including only one. Because the model in Figure 8 includes two mediating variables, it is better than a model with only one mediating variable. If an analysis that ignores altruistic values supports mediation of the relationship between church attendance and volunteering by solicitation, we cannot know whether that mediation is retained when altruistic values are included.
Ideally, you construct the causal model based on your reading of the literature. Which mechanisms are implied or mentioned in theories about religion and civic engagement? And which mechanisms and conditions are examined in previous research?
When you are constructing your causal model, it can be difficult to determine theoretically whether a factor that you think is important is a mediating variable or a moderating variable. You can use the following rules:
Specifying a mediating variable is one way to explain the relationship between an independent and a dependent variable. Such a proximate explanation works forward in time: if it were not for the influence of X on the mediating variable, there would be no link between the independent variable and the outcome Y. For many relationships, you can think of multiple mediators. A good forward explanation of a relationship implies that you identify at least one mediator, and preferably distinguish several mediators. Discuss their interrelationship. In most cases, the various alternatives are not mutually exclusive: they can simultaneously be true.
Explanations can also go backwards in time, seeking an answer to the question: where does X come from? In the example above, one could argue that a Protestant religious affiliation is the result of parental religiosity and religious socialization. Thinking about the more distant conditions or events that cause X will enlarge your causal model and turn your X into a mediating variable. Parental religiosity is then related to giving and volunteering through children’s religiosity. The prediction from this model would be that children of Protestant parents who have left the church and are no longer affiliated with a Protestant church are no more likely to give and volunteer than children of non-religious parents.
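This prediction can be illustrated with simulated data in which parental religiosity affects volunteering only through the child’s own affiliation (full mediation). The sketch assumes, hypothetically, that 60% of children of Protestant parents remain Protestant, that children of non-religious parents never become Protestant, and that being Protestant raises volunteering by 0.5 units. Under these assumptions, children who left the church volunteer about as much as children of non-religious parents.

```python
import random
from statistics import mean

random.seed(4)
N = 20000
STAY_RATE, CHILD_EFFECT = 0.6, 0.5  # hypothetical parameters

families = []
for _ in range(N):
    parent = random.randint(0, 1)  # 1 = Protestant parents
    # Only children of Protestant parents can remain Protestant (assumed).
    child = 1 if parent == 1 and random.random() < STAY_RATE else 0
    # No direct path from parents: volunteering depends on the child only.
    volunteering = CHILD_EFFECT * child + random.gauss(0, 1)
    families.append((parent, child, volunteering))

def group_mean(parent, child):
    """Mean volunteering for one combination of parent and child affiliation."""
    return mean(v for p, c, v in families if p == parent and c == child)

stayed = group_mean(1, 1)
left = group_mean(1, 0)
non_religious = group_mean(0, 0)
print(f"stayed: {stayed:.2f}, left: {left:.2f}, non-religious parents: {non_religious:.2f}")
```

If real data instead showed that children who left the church still volunteer more than children of non-religious parents, that would point to a path from parental religiosity to volunteering that does not run through the child’s own affiliation.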