Why did you become a scientist? I bet it had more to do with curiosity or a passion for discovery than with a love of self-promotion, writing grants, or serving on committees. Scientists tend to be curious, and we tend to agree with the ideals of transparent research: sharing data, publishing results regardless of outcome, and reasoning from evidence.
However, we work in a culture that rewards questionable practices, such as promoting the most polished, unexpected, and exciting results while ignoring uncertainty, contradictions, and negative results. Moreover, the enormous workload of the typical practicing academic makes the idea of adding more work to the process seem laughable.
Despite these challenges, we must confront the practices that may cause damage to the body of evidence in the published literature and find ways to improve our research process so that knowledge building is more efficient and effective.
Improving research workflow
Preregistration of analysis plans moves key analytical decisions much earlier in the research lifecycle, before the data are observed. Though well established in clinical trials, preregistration is a new concept in most of the basic sciences. I'll explain why preregistration is good for science shortly, but first I want to show how it can improve the research workflow.
Preregistration specifies all of the data collection and analysis procedures in an uneditable, time-stamped plan. It brings the study design and the analysis decisions to the same point in the research process.
This synchronization allows for better planning and makes potential problems easier to address. By bringing your attention to analysis questions while you are designing the study, you can more easily identify and fix problems with your design before it is too late. Preregistration reduces the risk of the all-too-familiar wasted data collection effort, or of trying to repair an overlooked flaw after the fact.
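To make "uneditable, time-stamped" concrete, here is a minimal Python sketch of the idea. It is not how any particular registry works: it assumes a plan can be fingerprinted with SHA-256 and that the fingerprint is stored alongside a UTC timestamp, so any later change to the plan is detectable. `AnalysisPlan`, `register`, and `matches_registration` are hypothetical names, and the example plan fields are invented for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class AnalysisPlan:
    # Hypothetical fields for a preregistered analysis plan.
    hypothesis: str
    outcome: str
    exclusion_rule: str
    covariates: tuple[str, ...]
    test: str


def fingerprint(plan: AnalysisPlan) -> str:
    """SHA-256 of a canonical JSON encoding; any edit to the plan changes it."""
    canonical = json.dumps(asdict(plan), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def register(plan: AnalysisPlan) -> dict:
    """Record the fingerprint with a UTC timestamp (a hypothetical registry entry)."""
    return {
        "sha256": fingerprint(plan),
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }


def matches_registration(plan: AnalysisPlan, record: dict) -> bool:
    """True only if the plan is byte-for-byte the one that was registered."""
    return fingerprint(plan) == record["sha256"]


plan = AnalysisPlan(
    hypothesis="Treatment raises the mean score",
    outcome="score",
    exclusion_rule="drop responses faster than 2 s",
    covariates=("age",),
    test="two-sided Welch t-test, alpha = 0.05",
)
record = register(plan)

# Adding a covariate after seeing the data produces a different fingerprint.
tweaked = replace(plan, covariates=("age", "income"))
print(matches_registration(plan, record))     # True
print(matches_registration(tweaked, record))  # False
```

The hash does not stop anyone from changing their mind; it only makes a change visible, which is the property the registration relies on.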
Preregistration not only improves overall efficiency; it also makes distinct steps in the scientific workflow clearer. Using data to generate potential discoveries and using data to test those discoveries are different processes.
They are known as exploratory (or hypothesis-generating) research and confirmatory (or hypothesis-testing) research. In the daily practice of research, it is easy to confuse which one is being done.
There is a way to keep them apart: preregistration. A preregistration defines how a hypothesis or research question will be tested, including the methodology and the analysis plan. It is written down before looking at the data, and it maximizes the diagnosticity of the statistical inferences used to test the hypothesis.
Exploratory or confirmatory research
After the confirmatory test, the data can be subjected to any exploratory analysis to identify new hypotheses that can become the focus of a new study. In this way, preregistration provides an unambiguous distinction between exploratory and confirmatory research.
To illustrate how confirmatory and exploratory approaches can be easily confused, picture a path through a garden, forking at regular intervals, as it spreads out into a wide tree. Each split in this garden of forking paths is a decision that can be made when analyzing a data set.
Do you exclude these samples because they are too extreme? Do you control for income/age/height/wealth? Do you use the mean or median of the measurements?
Each decision can be perfectly justifiable and seem insignificant in the moment. Yet after only a few of these decisions, there is a surprisingly large number of reasonable analyses. One quickly reaches the point where there are so many reasonable analyses that the traditional threshold of statistical significance, p < .05, or 1 in 20, can easily be reached by chance alone.
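The arithmetic behind this can be sketched in a few lines of Python. The decision points below are the hypothetical questions from the previous paragraph, each treated as a yes/no or two-way choice. The probability formula assumes, as a simplification, that each analysis is an independent test of a true null hypothesis, in which case the chance that at least one of k analyses reaches p < .05 is 1 − 0.95^k.

```python
import math
from itertools import product

# Hypothetical forks in the garden, taken from the questions above.
DECISIONS = {
    "exclude_extremes": [True, False],
    "control_income": [True, False],
    "control_age": [True, False],
    "control_height": [True, False],
    "control_wealth": [True, False],
    "summary": ["mean", "median"],
}


def all_paths(decisions):
    """Yield every combination of choices, i.e. every path through the garden."""
    names = list(decisions)
    for combo in product(*decisions.values()):
        yield dict(zip(names, combo))


def chance_of_false_positive(n_analyses, alpha=0.05):
    """P(at least one p < alpha) if every null is true and analyses are independent."""
    return 1 - (1 - alpha) ** n_analyses


def analyses_needed(target, alpha=0.05):
    """Smallest number of independent analyses whose false-positive chance reaches target."""
    return math.ceil(math.log(1 - target) / math.log(1 - alpha))


n = sum(1 for _ in all_paths(DECISIONS))
print(n)                                      # 64
print(round(chance_of_false_positive(n), 3))  # 0.962
print(analyses_needed(0.5))                   # 14
```

Six modest choices yield 64 analyses, and only 14 independent analyses are needed before a spurious "significant" result becomes more likely than not. Because analyses of the same data set are correlated, the real inflation is usually smaller than these independent-test figures, but it grows with every fork in the same way.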
If we don’t have strong reasons to make these decisions ahead of time, we are simply exploring the dataset for the path that tells the most interesting story. Once we find that interesting story, bolstered by the weight of statistical significance, every decision on that path becomes even more justified, and all of the reasonable, alternative paths are forgotten.
Without us realizing what we have done, the diagnosticity of our statistical inferences is gone. We have no idea if our significant result is a product of accumulated luck with random error in the data, or if it is revealing a truly unusual result worthy of interpretation.
This is why we must hold ourselves accountable to decisions made before seeing the data. Unless those decisions are recorded in a time-stamped, uneditable plan, it is nearly impossible to avoid making the choices that lead to the most interesting story.
This is what preregistration does. Without preregistration, we effectively change our hypothesis as we make those decisions along the forking path. The work that we thought was confirmatory becomes exploratory without us even realizing it.
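The loss of diagnosticity can be seen in a small simulation. This Python sketch, using only the standard library, generates studies in which the treatment has no effect on anything, then compares two habits: reporting the single path chosen in advance (outcome 0, full sample) versus reporting whichever of 15 paths (5 outcomes × 3 subgroups: everyone, first half, second half of each group) gives the smallest p-value. The sample sizes, number of outcomes, subgroup split, and known-variance z-test are all assumptions chosen to keep the example short.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def z_test_p(a, b):
    """Two-sided p-value for a difference in means, assuming known unit variance."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def one_null_study(rng, n_per_group=40, n_outcomes=5):
    """Simulate a study where the treatment truly does nothing on any outcome.

    Returns (p for the single preregistered path, smallest p over all paths).
    """
    # Hypothetical forks: which outcome to report and which subgroup to analyze.
    half = n_per_group // 2
    subsets = [slice(None), slice(0, half), slice(half, None)]
    p_values = []
    for _ in range(n_outcomes):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        for s in subsets:
            p_values.append(z_test_p(treated[s], control[s]))
    preregistered = p_values[0]  # outcome 0, full sample, fixed in advance
    return preregistered, min(p_values)


def false_positive_rates(n_studies=3000, alpha=0.05, seed=1):
    """Fraction of null studies declared 'significant' under each habit."""
    rng = random.Random(seed)
    prereg_hits = forked_hits = 0
    for _ in range(n_studies):
        p_pre, p_best = one_null_study(rng)
        prereg_hits += p_pre < alpha
        forked_hits += p_best < alpha
    return prereg_hits / n_studies, forked_hits / n_studies


prereg_rate, forked_rate = false_positive_rates()
print(f"preregistered path: {prereg_rate:.3f}")
print(f"best of 15 paths:   {forked_rate:.3f}")
```

With these settings the preregistered path should land near the nominal 5%, while picking the best path should produce a "significant" result several times as often, even though every effect is zero. The exact rates depend on the seed and the assumed forks.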
As rigorous and free from bias as possible
I am advocating a way to make sure the data we use to create our explanations are kept separate from the data we use to test those explanations. Preregistration does not put science in chains.
Scientists should be free to explore the garden and to advance knowledge. Novelty, happenstance, and unexpected findings are core elements of discovery. However, when it comes time to put our new explanations to the test, we will make progress more efficiently and effectively by being as rigorous and as free from bias as possible.
Preregistration is effective. After the United States required that all clinical trials of new treatments on human subjects be preregistered, the rate of finding a significant effect on the primary outcome variable fell from 57% to just 8% within a group of 55 cardiovascular studies. This suggests that flexibility in analytical decisions had an enormous effect on the analysis and publication of these large studies.
Preregistration is supported by a growing number of journals and research funders. Taking this step shows that you have taken every reasonable precaution to reach the most robust conclusions possible, and it adds weight to your claims.
Most scientists, when testing a hypothesis, do not specify key analytical decisions prior to looking through a data set. It’s not what we’re trained to do. We at the Center for Open Science want to change that. We will be giving 1,000 researchers $1,000 prizes for publishing the results of preregistered work. You can be one of them. Begin your preregistration by going to https://cos.io/prereg.