On the unbearable lightness of mandatory data sharing

Blog by Tommi Nyman (Department of Biology, University of Eastern Finland), winner of the Open Data Award at the BioMed Central 5th Annual Research Awards

One of the most pleasant surprises of this spring was that yours truly, with coauthors Veli Vikberg, David R. Smith, and Jean-Luc Boevé, received BioMed Central’s Open Data Award for our article “How common is ecological speciation in plant-feeding insects? A ‘Higher’ Nematinae perspective”. (The other highlight was, naturally, Finland’s phenomenal victory in the Ice Hockey World Championship final last Sunday.)

We were very happy to receive the prize, as we don’t get awards as often as we’d like! At the same time, we fully realized that a large share of the credit in our case must go to a persistent, anonymous referee of our paper, who demanded, twice, that we publish not only the sequence data we used to reconstruct the phylogenetic trees but also the background data used in our phylogeny-based ecological analyses. So, since the reviewer didn’t give up (and the editor sided with the referee), we sat down and did what we should have done voluntarily in the first place: we gathered all relevant ecological information (host-plant associations and species numbers of various sawfly taxa) into a (hopefully) coherent table and included it as an additional online file at the end of the article.

Archiving of original data has long been standard practice in phylogenetics and population genetics research, and journals typically publish articles only after all DNA sequences used in the analyses have been submitted to a public database such as GenBank. This mode of data sharing is now making its way into more ecologically oriented journals as well; for example, Evolution introduced a mandatory data archiving policy in 2011 and now requires that raw data be presented in a way that makes it possible to repeat all statistical analyses in an accepted article. The ongoing rise of open-access online journals will make data sharing easier than ever, since page space is no longer a limiting factor.

To the individual researcher, preparing and referencing background data for archiving may feel like an unwelcome addition to an already tedious publication process, but, at least in our case, we knew at heart that the reviewer was entirely right in their demands. More generally, mandatory archiving brings rigour to data collection and management and, at the same time, improves the transparency of research and publishing.

There are also other benefits to science that will become evident only in the long run. In my field of ecological phylogenetics, statistical methods are improving at an astonishing speed, so re-analyses of previously published datasets will undoubtedly become common in the future and may lead to reassessments of old results. Meta-analyses combining the results of multiple studies will also benefit greatly from access to the raw original data, rather than to just a few statistical indices extracted from earlier analyses. I also expect archived datasets to be useful in higher education, as repeating the statistical analyses used in a published article is an efficient way of learning how science is done in practice. Naturally, the end goal of such exercises should be for students to come up with solutions that are better than the ones used by the original researchers.

View the latest posts on the Research in progress blog homepage