The guidelines for the organization of your data and code are based on two general principles: simplicity and explanation. Make verification of your results as simple as possible, and provide clear documentation so that people who are not familiar with your data or research can execute the analyses and understand the results.
Simplify the file structure. Organize the files you provide in the simplest possible structure. Ideally, a single file of code produces all results you report. Conduct all analyses in the same software package when possible. Sometimes, however, you may need different programs and multiple files to obtain the results. If you need multiple files, provide a readme.txt file that lists which files provide which results.
Deidentify the data. In the data file, eliminate the variables that contain information that can identify specific individuals if these persons have not given explicit consent to be identified. Do not post data files that include IP addresses of participants, names, email addresses, home addresses, zip codes, telephone numbers, social security numbers, or any other information that may identify specific persons. Do not deidentify the data manually; instead, create a code file for all preprocessing of the data so that the deidentification is reproducible.
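A minimal sketch of such a preprocessing file in Stata, assuming a raw file and identifying variables with the hypothetical names shown below (none of these names come from an actual data set):

```stata
* Deidentification sketch. All file and variable names are hypothetical.
use "Data\Raw\survey_raw.dta", clear

* Remove variables that can identify specific persons
drop name email ipaddress homeaddress zipcode phone

* Inspect what remains before saving the file that will be posted
describe

save "Data\Pooled\survey_deidentified.dta", replace
```

Only the saved, deidentified file is posted. The raw file stays private, while the code documents exactly which variables were removed.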
Organize the code. In the code, create at least three sections:
1. Preliminaries. The first section includes commands that install packages that are required for the analyses but do not come with the software. Include a line that users can adapt, identifying the path where data and results are stored. Use the same path for data and code. For example:
cd "C:\Users\rbs530\surfdrive\Shared\VolHealthMega"
The first section also includes commands that specify the exact names of the data files required for the analysis. For example:
use "Data\Pooled\VolHealthMega.dta", clear
2. Data preparation. The second section includes commands that create variables and recode them. This section also assigns labels to variables and their values, so that their meaning is clear. For example:
label variable llosthlt "Lost health from t-2 to t-1"
3. Results. The third section includes the commands that produce the results reported in the paper. Add comments to identify which commands produce which results, e.g.
*This produces Table 1:
summ *
4. Appendices. An optional fourth section contains the commands that produce the results reported in the Appendices. For example:
*Appendix Table S12a:
xtreg phealth Dvolkeep Dvoljoin Dvolquit year l.phealth l2.phealth l3.phealth l4.phealth, fe
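Put together, the sections above could look like the following do-file. This is a sketch under stated assumptions: the Stata version number and the package name (estout) are placeholders, and the commands taken from the examples above only run if the data contain those variables.

```stata
*** 1. Preliminaries
* Placeholder: set this to the Stata version you used
version 16
clear all

* Install a user-written package only if it is missing (estout is a placeholder)
capture which estout
if _rc {
    ssc install estout
}

* Adapt this path to the folder that holds both data and code
cd "C:\Users\rbs530\surfdrive\Shared\VolHealthMega"
use "Data\Pooled\VolHealthMega.dta", clear

*** 2. Data preparation
label variable llosthlt "Lost health from t-2 to t-1"

*** 3. Results
* This produces Table 1:
summarize

*** 4. Appendices
* Appendix Table S12a:
* (lag operators such as l.phealth require that the panel structure
*  was declared with xtset earlier in the file)
xtreg phealth Dvolkeep Dvoljoin Dvolquit year l.phealth l2.phealth l3.phealth l4.phealth, fe
```

Keeping the path in a single line near the top means a reader has to change only one line to run the whole file.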
Explain ad hoc decisions. Document and explain your decisions. Throughout the code, add comments that explain the reasoning behind any choices that you did not pre-register. For example: "collapsing across conditions 1 and 2 because they are quantitatively similar and not significantly different".
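In a Stata do-file, such a comment can sit directly above the command it explains. The sketch below assumes a hypothetical variable named condition with the values 1, 2 and 3:

```stata
* Not pre-registered: collapsing across conditions 1 and 2 because they are
* quantitatively similar and not significantly different.
* (The variable name and its codes are hypothetical.)
recode condition (1 2 = 1 "Conditions 1 and 2") (3 = 2 "Condition 3"), generate(condition2)
label variable condition2 "Experimental condition, 1 and 2 collapsed"

* Cross-tabulate old against new to check the recode
tabulate condition condition2, missing
```

Creating a new variable instead of overwriting the original keeps the decision visible and reversible for anyone who reruns the code.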
Double check before submission. When you are done, ask your supervisor to execute the code. Does the code produce the results reported in the paper? Can your supervisor understand your decisions? If so, you are ready.
Locate your materials. Identify the URL that contains the data and code that produce the results you report. If you write an empirical journal article, add the URL to the abstract as well as to the data section. Identify the software package and version that you used to produce the results.
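If the analyses are run in Stata, one way to record the software version is to print it at the start of the output. This is a sketch; the log file name is a placeholder:

```stata
* Open a plain-text log so the version information is stored with the results
log using "results.log", replace text

* Print the Stata release, edition and license information
about

* Print the version number and the date of this run
display "Stata version: " c(stata_version)
display "Run on: " c(current_date) " " c(current_time)

* ... analysis commands go here ...

log close
```

The resulting log can be posted with the code, so that readers can see which version produced the reported results.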
Set up a repository. Create a repository, preferably on the Open Science Framework (https://osf.io/), where you post all materials that reviewers and readers need to verify and replicate your paper: the deidentified data file, the code, stimulus materials, and online appendix tables and figures. Here is a template you can use for this purpose: https://osf.io/3g7e5/. Help the reader navigate through the materials by including a brief description of each part.