3.1. Composing a causal model

In the opening paragraph of your theory section, present a causal model that represents how you think the phenomena that you are studying are related. You can think of a causal model as a simplification of the chain of events that leads to a certain outcome. The core of any causal model is a relationship between a cause X and its effect Y (see Figure 4).

Figure 4. Causal diagram example: a direct causal effect

Take the example of a campfire that results from striking a match in a small pyramid of dry wood. Striking the match (X) caused the wood to catch fire (Y). This example is very different from the typical chain of events that we are interested in as social scientists. We are often interested in differences in social behavior between groups, or between individuals with different characteristics. Typically, the relationship between the outcome and the preceding chain of events is not as strong as in the case where striking a match leads to fire.

For the reader it is convenient to see the causal model before you explain the hypotheses you will be testing, because the model provides an overview of what follows. To stick with my own advice, consider the example in Figure 5, based on Bekkers & Schuyt (2008).

Figure 5. Causal model. Note: all relations are hypothesized to be positive.

In thinking about the model, it is best to work in the following order.

 

1. Select your dependent variable (Y).

The dependent variable is the phenomenon you are trying to understand or explain, by relating it to the conditions in which it occurs, the characteristics of people who are involved, how it changed over time, or how it varies between groups or nations. The name ‘dependent variable’ implies that it depends on something else, and that it can vary. A characteristic that does not vary among the cases you study is not a variable, but a constant. Birth year, for example, does not change over time within a person, and it is a constant in a study of a single birth cohort.

In the example in Figure 5 the dependent variable is volunteering: spending time working without pay for a nonprofit organization. I am intrigued by questions on prosocial behavior. Why do some people volunteer, while others do not? Volunteering depends on many factors, but in the model I have deliberately considered only a few. Models are simplifications of reality, and this one is no exception. We could draw a more exploratory model including clusters of variables; this strategy is discussed below (see section 3.2).

 

2. Pick your main independent variable (X).

The independent variable is a cause of the phenomenon that you are trying to explain. In time, the independent variable precedes the dependent variable. The change or the event that you think is a cause must have happened before the outcome occurred.

In the example, the main independent variable is ‘Protestant’. We think that some people are more likely to volunteer than others because they are Protestant instead of not religious, Catholic, or belonging to another religion. It is a variable because it varies between people in the population, though it is not very variable over time within people: it does not change easily as a result of other variables. In this sense, it is an independent variable. Being Protestant is similar to ‘striking a match’ in the example of how striking a match causes fire.

 

3. Next, identify mediating variables (M), if any.

A mediating variable is a variable in between the cause and the effect, explaining why the cause has a certain effect. Mediating variables are also called ‘mediators’ or ‘intermediate variables’. A mediating variable depends on an independent variable, is a result of that variable, and explains the relationship of that independent variable with the dependent variable. You can think of a mediator as an event that changes as a result of a prior event, and leads to a future event. Without the mediating variable, the cause would not have the effect you observed.

In the causal diagram, the mediating variable is labeled ‘M’. The variable is positioned between the independent and the dependent variable. The relationship between X and Y is called the ‘direct effect’: there is nothing between X and Y, the effect is direct. The relationship between X and Y that flows through M is called an ‘indirect effect’. In the diagram, you can show that the relationship between X and Y is mediated by M by dotting the arrow between X and Y. You can display the initial relationship with a plus or minus sign, and add brackets to display your hypothesis that the relationship is mediated by other variables in the model. In the example above I have not done that, because I have assumed all relationships to be positive.

Figure 6. Causal diagram example: a mediating variable

In our example, the relationship between Protestant and volunteering – i.e. the higher level of volunteering among Protestants than among the non-religious and other religious groups – is mediated by two variables: being asked and altruistic values. We think that people are more likely to be asked to volunteer because they are Protestant, and that they are more likely to volunteer because they are asked. We also think that people have stronger altruistic values because they are Protestant, and that they are more likely to volunteer because they have stronger altruistic values. Being asked and altruistic values are similar to the friction that occurs as a result of the striking of a match. The striking of the match causes friction, and the friction causes the tip of the match to catch fire.
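To make the distinction between the direct and the indirect effect concrete, here is a minimal simulation sketch in Python. It is not the analysis of Bekkers & Schuyt (2008): the effect sizes, the sample size, the single mediator, and the use of a continuous ‘volunteering hours’ outcome are all assumptions made up for illustration, and the helper functions `slope` and `slopes2` are hypothetical names. The sketch shows that in a linear model the total relationship between X and Y splits into a direct part and an indirect part that runs through M.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def slope(x, y):
    """OLS slope of y on a single predictor x."""
    return cov(x, y) / cov(x, x)

def slopes2(x1, x2, y):
    """OLS slopes of y on two predictors x1 and x2 (intercept implied)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 5000

# X: 1 = Protestant, 0 = other (assumed share of 30%)
x = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(n)]
# M: being asked to volunteer, assumed to be raised by X
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
# Y: volunteering hours, assumed to depend mostly on M and a little on X
y = [0.4 * mi + 0.1 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

total = slope(x, y)            # X -> Y, ignoring M
a = slope(x, m)                # X -> M
direct, b = slopes2(x, m, y)   # X -> Y and M -> Y, each holding the other fixed
indirect = a * b               # X -> M -> Y

print(f"total effect:    {total:.3f}")
print(f"direct effect:   {direct:.3f}")
print(f"indirect effect: {indirect:.3f}")
```

In this linear setup the total effect equals the direct effect plus the indirect effect. If the direct effect shrinks towards zero once M is included, that is the pattern the dotted arrow between X and Y expresses.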

 

4. Identify moderating variables, if any.

A moderating variable is an additional independent variable (X2) that changes the influence of the main independent variable (X1). In the example of the match, you can think of oxygen as a moderating variable: when we strike a match, the presence of oxygen enables the tip of the match to catch fire. Striking a match in a room without oxygen would not result in fire. You can think of moderating variables in terms of necessary but not sufficient conditions. If there is oxygen, striking the match sets it on fire. But the mere presence of oxygen does not cause the match to catch fire. In the social sciences, moderating variables are rarely necessary conditions; mostly they make the effect of the main independent variable stronger or weaker.

Figure 7. Causal diagram example: a moderating variable

In our example of the relation between religious affiliation and volunteering there are no moderating variables. If we assumed that the relationship between altruistic values and volunteering is stronger among Protestants than among the non-religious, ‘Protestant’ would be a variable moderating the relationship between altruistic values and volunteering, and an additional arrow should be included to visualize this hypothesis. The model would then look like Figure 8 below. Arrow D represents the moderating influence of Protestant affiliation on the influence of altruistic values on volunteering. The influence of altruistic values on volunteering is positive, but may be stronger for certain groups, such as Protestants.
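A moderation hypothesis of this kind says that the slope of one variable differs between groups. The following Python sketch illustrates that idea with simulated data. All numbers are invented, the continuous outcome ‘hours’ is a simplification, and `group_slope` is a hypothetical helper; it is an illustration under these assumptions, not a result from the study.

```python
import random

def slope(x, y):
    """OLS slope of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2)  # fixed seed so the sketch is reproducible
n = 4000

protestant = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
values = [rng.gauss(0, 1) for _ in range(n)]  # altruistic values, standardized

# Assumed effect of altruistic values on volunteering hours:
# 0.2 among non-Protestants, 0.2 + 0.3 among Protestants.
hours = [(0.2 + 0.3 * p) * v + rng.gauss(0, 1)
         for p, v in zip(protestant, values)]

def group_slope(group):
    """Slope of hours on altruistic values within one group."""
    xs = [v for v, p in zip(values, protestant) if p == group]
    ys = [h for h, p in zip(hours, protestant) if p == group]
    return slope(xs, ys)

slope_other = group_slope(0)
slope_prot = group_slope(1)
moderation = slope_prot - slope_other  # how much the slope differs by group

print(f"slope among others:      {slope_other:.3f}")
print(f"slope among Protestants: {slope_prot:.3f}")
print(f"difference (moderation): {moderation:.3f}")
```

The difference between the two slopes is what the moderating arrow stands for. In a regression analysis the same quantity is usually estimated as the coefficient of an interaction term (Protestant × altruistic values).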

 

 

5. For each relationship in the model, identify the sign of the association.

You can identify the sign of the association – positive or negative – by using different colors, or adding signs. In the causal diagram examples, I’ve displayed positive relationships with black arrows, and negative relationships with red arrows. If some of your associations are stronger than others, you can display the strength of relationships using multiple signs, such as ++ or --.

 

6. Think about the relationships that are not in your model.

Is your model complete? Does it include all relevant influences? In most cases, you cannot include all the variables that influence your outcome. This is only a problem if the influence that you leave out will change the strength or the sign of an influence that is included in your model. We will discuss those below. Arrows that you could have drawn but did not include in the diagram are in fact hypotheses about null-effects: you expect these paths to be absent.
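One way to make the absent arrows explicit is to list every pair of variables and check which pairs have no arrow between them. The Python sketch below does this for a reduced version of the example. The list of arrows is my reading of the relationships described in the text, not a transcription of the figure, so treat it as an assumption.

```python
from itertools import combinations

variables = ["Protestant", "being asked", "altruistic values", "volunteering"]

# Arrows assumed to be drawn in the diagram, as (cause, effect) pairs.
arrows = {
    ("Protestant", "being asked"),
    ("Protestant", "altruistic values"),
    ("being asked", "volunteering"),
    ("altruistic values", "volunteering"),
    ("Protestant", "volunteering"),  # the dotted arrow
}

def connected(a, b):
    """True if an arrow is drawn between a and b in either direction."""
    return (a, b) in arrows or (b, a) in arrows

# Every pair without an arrow is an implicit hypothesis of no effect.
absent = [pair for pair in combinations(variables, 2) if not connected(*pair)]

for a, b in absent:
    print(f"No arrow between '{a}' and '{b}': expected to be unrelated")
```

For this reduced model the only pair without an arrow is being asked and altruistic values. Leaving that arrow out amounts to the hypothesis that being asked does not change altruistic values, and vice versa.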

 

7. Think about variables that are not in your model.

In the social sciences there are many factors that we have not measured. These are omitted variables. In the example of the match, the moisture level of the wood is an important omitted variable. The damper the wood is, the less likely it is that the match causes the wood to catch fire and keep burning. This omitted variable is a moderating variable: it reduces the effect of striking a match. The variable itself does not depend on anything else in the model, but will moderate the effect size you observe.

 

Figure 8. Causal model with letters

In our example of religious affiliation and volunteering, age is likely to be an omitted variable. Older people are more likely to be Protestant and attend church more frequently than younger people. If older people are also more likely to volunteer, omitting age means that the relationship of religiosity with volunteering will be overestimated: the relationship of age with volunteering will be absorbed by the variables in the model.
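The following Python sketch simulates this kind of overestimation. It assumes, purely for illustration, that older people are more often Protestant and also volunteer more hours. The effect sizes are invented and the helper functions are hypothetical. It compares the estimate for ‘Protestant’ when age is left out with the estimate when age is included.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def slope(x, y):
    """OLS slope of y on a single predictor x."""
    return cov(x, y) / cov(x, x)

def slopes2(x1, x2, y):
    """OLS slopes of y on two predictors x1 and x2 (intercept implied)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(3)  # fixed seed so the sketch is reproducible
n = 5000

age = [rng.uniform(18, 80) for _ in range(n)]
# Assumption: the probability of being Protestant rises with age (0.1 to 0.5).
protestant = [1.0 if rng.random() < 0.1 + 0.4 * (a - 18) / 62 else 0.0
              for a in age]
# Assumption: true effect of Protestant is 0.5; each year of age adds 0.03.
hours = [0.5 * p + 0.03 * (a - 18) + rng.gauss(0, 1)
         for p, a in zip(protestant, age)]

naive = slope(protestant, hours)                # age omitted
adjusted, _ = slopes2(protestant, age, hours)   # age included

print(f"estimate with age omitted:  {naive:.3f}")
print(f"estimate with age included: {adjusted:.3f}")
```

Under these assumptions the estimate with age omitted is clearly larger than the assumed true effect of 0.5, because part of the relationship of age with volunteering is attributed to being Protestant.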

 

Drawing the model

  1. The model works from left to right, not from top to bottom. The order of the variables is the order in which they are assumed to influence each other over time. Variables on the left are called independent variables because they do not depend on anything else in the model; they are treated as given. ‘Protestant’ is an independent variable because one’s religious affiliation is determined at birth and is a relatively stable (though not immutable) characteristic afterwards. The frequency of church attendance is also a relatively stable phenomenon. How often one encounters requests to volunteer is a more variable characteristic. Altruistic values are also assumed to be open to change. Solicitation and altruistic values are both assumed to be the result of being Protestant and going to church, and not the other way around.
  2. In a causal model, each variable occurs only once. If you are unsure where to place a variable because there is no obvious chronological order, ask yourself two questions: (1) Does it depend on other characteristics that I already have in my model? (2) Can it have an influence on other variables in my model? In the example, you can figure out that ‘Protestant’ should be placed left of ‘Solicitation’ by asking yourself “How many people change their faith and become a Protestant because they have received solicitations for contributions to nonprofit organizations?” The obvious answer is: not many (if any).
  3. All relationships are assumed to be positive, and signs are omitted, except for the hypothesis about being Protestant moderating the relationship between altruistic values and volunteering. No expectations are expressed about the strength of relationships.
  4. In this model, the arrow from Protestant to volunteering is dotted. The higher proportion of volunteers among Protestants is expected to be the result of the higher likelihood of being asked to volunteer, and the higher level of altruistic values among Protestants.
  5. It takes some time and practice to order the variables in your model in a neat way. Start out with a sketch pad. Just start over when your model gets cluttered, when you discover that a variable is in the wrong place, or when you draw arrows that intersect each other.
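If you prefer to check the ordering of your variables before you start drawing, the rules above can be sketched in a few lines of Python. The arrows listed here are a reduced, assumed version of the example (church attendance is left out), and the ‘column’ numbering is just one possible way to lay out a diagram from left to right.

```python
from graphlib import TopologicalSorter

# Assumed arrows as (cause, effect, sign); all are hypothesized to be positive.
edges = [
    ("Protestant", "being asked", "+"),
    ("Protestant", "altruistic values", "+"),
    ("being asked", "volunteering", "+"),
    ("altruistic values", "volunteering", "+"),
    ("Protestant", "volunteering", "+"),
]

# Map each variable to the set of its direct causes.
# Because variables are dictionary keys, each one occurs only once.
causes = {}
for cause, effect, _sign in edges:
    causes.setdefault(cause, set())
    causes.setdefault(effect, set()).add(cause)

# static_order() lists causes before effects; it raises graphlib.CycleError
# if the arrows form a loop, which would mean there is no left-to-right order.
order = list(TopologicalSorter(causes).static_order())

# Column 0 holds variables without causes in the model; every other variable
# is placed one column to the right of its right-most cause.
column = {}
for var in order:
    column[var] = 1 + max((column[c] for c in causes[var]), default=-1)

for var in sorted(column, key=column.get):
    print(f"column {column[var]}: {var}")
```

With these arrows, Protestant lands in the first column, the two mediators share the second column, and volunteering lands in the third.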

 

You should display at least three variables in your model – the dependent, the independent and at least one mediating or moderating variable. As a rule, a causal model including multiple mediating variables is preferable to a model including only one. Because the model in Figure 8 includes two mediating variables, it is better than a model with only one mediating variable. If an analysis that ignores altruistic values supports mediation of the relationship between church attendance and volunteering by solicitation, we cannot know whether that result holds when altruistic values are included.

Ideally, you construct the causal model based on your reading of the literature. Which mechanisms are implied or mentioned in theories about religion and civic engagement? And which mechanisms and conditions are examined in previous research?

 

When you are constructing your causal model, it can be difficult to determine theoretically whether a factor that you think is important is a mediating variable or a moderating variable. You can use the following rules:

  1. If the factor you are thinking about depends on other factors in your model, it is a mediating variable. It cannot be an independent variable. If you ask yourself the question “Does X influence Y?” and you find yourself answering “because…”, you are giving arguments about mediating variables.
  2. You can identify moderating variables by using the phrase ‘it depends’. If you ask yourself the question “Does X influence Y?” and you find yourself answering “It depends…”, the factors that you think the influence depends on are moderating variables.

 

Specifying a mediating variable is one way to explain the relationship between an independent and a dependent variable. Such a proximate explanation works forward in time: if it were not for the influence of X on the mediating variable, there would be no link between the independent variable and the outcome Y. For many relationships, you can think of multiple mediators. A good forward explanation of a relationship implies that you identify at least one mediator, and preferably distinguish several mediators. Discuss their interrelationship. In most cases, the various alternatives are not mutually exclusive: they can simultaneously be true.

Explanations can also go backwards in time, seeking an answer to the question: and where does X come from? In the example above one could argue that a Protestant religious affiliation is the result of parental religiosity and religious socialization. Thinking about the more distant conditions or events that cause X will enlarge your causal model, and turn your X into a mediating variable. Parental religiosity is related to giving and volunteering through children’s religiosity. The prediction from this model would be that children of Protestant parents who have left the church and are no longer affiliated with a Protestant church are not more likely to give and volunteer than children of parents who were not religious.