When you design a set of tests of your hypotheses, you can often choose between many different types of tests that vary from ‘loose’ to ‘strict’. For instance, when your hypothesis is that volunteering contributes to health, the finding from a cross-sectional design that volunteers in interviews report feeling healthier than non-volunteers is a loose test (Bekkers & Verkaik, 2015). This is because your finding can also be explained by the reverse causal direction: health contributes to volunteering. The finding from this loose test is much less convincing than the finding from a longitudinal survey that people who volunteer report fewer chronic illnesses and ultimately live longer than non-volunteers. The best evidence, however, would come from a field experiment with repeated measures showing that giving people in a treatment group additional opportunities to volunteer keeps them healthy and alive for a longer period than people in a control group who were not given such opportunities.
As a rule, the more stringent test is more informative. You may not be surprised that tests of the more stringent kind in the example above yield positive results much less often than loose tests. Generalizing this finding, I have introduced Bekkers’ Stringent Test Administration Rule (BSTAR): the more stringent the test, the better. If the result survives a more stringent test, it is more robust and less likely to be a chance finding. The rule is similar to the Stainless Steel Law of Evaluation (Rossi, 1987): “the better designed the impact assessment of a social program, the more likely is the resulting estimate of net impact to be zero”. To give just one other example: a massive study on the effects of Facebook ads (Gordon, Zettelmeyer, Bhargava & Chapsky, 2017) showed that observational methods overestimated effects threefold in half of the campaigns analyzed.
On the positive side: if your hypothesis has withstood a more stringent test, the result is less likely to be a false positive. Compare testing theories to a high jump competition in athletics. When the bar is set too low, we do not learn very much about the maximum height a jumper can clear. Only when the bar is set relatively high do we learn which jumper is the best in the competition.
When you have more observations at your disposal, and measures with higher reliability and validity, a null finding (no relationship) or a negative result (a relationship contrary to your hypothesis) is more likely to reflect the true state of affairs.
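The logic behind this point can be illustrated with a small simulation. The numbers below are purely hypothetical (I assume a modest true standardized effect of 0.3 and a conventional two-sided test at α = .05); the sketch shows that a study with many observations has far more statistical power to detect a real effect, so a null finding from such a study speaks much louder than a null finding from a small one.

```python
import random
import statistics

random.seed(42)

def power_estimate(n, effect=0.3, trials=2000, alpha_z=1.96):
    """Fraction of simulated two-group studies (n observations per group)
    that detect a true standardized mean difference `effect` with a
    simple two-sided z-test on the difference in means."""
    hits = 0
    for _ in range(trials):
        control = [random.gauss(0.0, 1.0) for _ in range(n)]
        treated = [random.gauss(effect, 1.0) for _ in range(n)]
        # standard error of the difference between two group means
        se = ((statistics.variance(control) + statistics.variance(treated)) / n) ** 0.5
        z = (statistics.mean(treated) - statistics.mean(control)) / se
        if abs(z) > alpha_z:
            hits += 1
    return hits / trials

small = power_estimate(20)    # loose test: few observations per group
large = power_estimate(350)   # stringent test: many observations per group
print(f"power with n=20 per group:  {small:.2f}")
print(f"power with n=350 per group: {large:.2f}")
```

With 20 observations per group, only a small minority of the simulated studies detect the effect, so a null result there is nearly uninformative; with 350 per group, almost all of them do, so a null result would genuinely count against the hypothesis. The same reasoning extends to measurement: unreliable measures add noise, which lowers power just as a small sample does.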
Usually, there are multiple pathways from an independent to a dependent variable (i.e., from an antecedent to a behavior). It is better to evaluate multiple pathways simultaneously rather than only one. See sections 2.4.5 and 3.1 on testing multiple mediating mechanisms.
The final check on the design of your own research is to go through the bullshit bingo card (Figure 18), displaying common problems in research design and theory construction (Occamsbeard, 2014).