In defence of supplemental data files: don’t throw the baby out with the bathwater

Since the Journal of Neuroscience, published by the Society for Neuroscience, decided to no longer accept supplementary files with manuscripts, publishing online supplementary material has been the subject of continuing debate in scholarly circles.

BioMed Central takes the opposite stance to the Journal of Neuroscience. Not only do we allow authors to submit supplementary files – which we at BioMed Central refer to as “additional files” – we positively encourage it, and we continue to look for ways to make the additional data supplied by authors as part of their articles more useful and accessible. Through our Open Data Award, introduced in 2010, we also seek to recognise authors who have shown leadership in sharing the underlying data associated with their research publication.

In justifying its decision to disallow supplementary material, the Journal of Neuroscience indicated that it felt such files placed an unnecessary burden on peer reviewers. We understand this concern – peer reviewers are vital but their numbers are limited. Although it is unrealistic to expect every reviewer to re-analyze all data sets and supporting information, publishing this material as additional files at least makes it available if a reviewer wishes to dig deeper. Most importantly, by sharing the underlying data, authors are increasing transparency and promoting reproducibility – one of the foundations of science.

Lab Times, which surprisingly named Journal of Neuroscience as its ‘journal of the year’ for its decision, suggested that supplementary material was being used as a space-saving device, and often included information that would be better presented as part of the article itself. This is certainly a valid criticism of many print journals, which do indeed severely limit the space authors can use to present data and to describe methods. However, because BioMed Central journals are entirely online, they have no such space constraints. If material is best presented in the body of an article, authors can do this. Additional files are intended for use in sharing underlying datasets, movies, 3D-visualizations and other such material that is not easy to present within the article.

While some research domains including astronomy, genomics and economics have established cultural norms relating to data sharing, other fields have traditionally taken a much more proprietary approach to data. The challenge, in encouraging wider data sharing, is to demonstrate how sharing data can not only benefit the research community as a whole, but also increase the visibility, impact and citation potential of scientists’ work.

Improved tools specifically designed to track data file usage and citation are an important aspect of demonstrating the value of data sharing, and BioMed Central supports projects such as DataCite to better enable this. Meanwhile, we continue to look for further ways to encourage authors to publish and share data sets. The Open Data Award is one example, and the judging panel will be looking at the open data sets shared in BioMed Central journals during 2010 as it commences the process of identifying the winner of the next award.

In September, Charles Perou and colleagues published an article in Breast Cancer Research that identified an important new group of breast cancers and included additional genetic and clinical data as an Excel file.

Editor-in-Chief Prof Lewis Chodosh, University of Pennsylvania, said: “The supplementary data are most definitely harvestable by other scientists and are highly useful. I would also say that the amount and quality of supplementary data go beyond what most authors typically provide and, in that regard, leads by example. Dr. Perou has led many collaborative efforts in this area and I believe that open access to his group’s data is a reflection of this collaborative approach.”

And of an article published around the same time in BMC Evolutionary Biology, BMC Series Biology Editor Dr Elizabeth Moylan said: “Jean-Luc Boevé and colleagues show great transparency in how they sampled their insects rather than a more common ‘trust me we did it right’ approach.

“As a result we have collection data for exemplar specimens, and taxonomic and ecological background information in Additional file 1, which is extremely informative and thorough. I think they went beyond what is normally seen as readers will not have to go through the references and work out the sampling, they have it all there in the additional file.”

All research articles published in Genome Biology have strong data sharing associated with them. However, Senior Assistant Editor Elizabeth Gaskell explained that in the sequencing and analysis of an Irish human genome, not only have the authors provided some additional data with the publication, they have also uploaded their SNP and sequence data to the NCBI public databases, and then gone beyond this by uploading them to the community cloud resource (Galaxy) for ease of access and analysis by others.

For some articles we publish, such as Data Notes, a biomedical data set or database is at the heart of the publication. The Data Note from Vickers and Cronin, which makes available data from a clinical prostate cancer study in a readily reusable form, is an excellent example.

Any scientist will recognise that this is just a small sample of articles, but more and more additional data are being published. BioMed Central’s full-text corpus continues to be available for data mining and, now, we are also happy to share on request a list of additional files published in our journals – the files are currently published under a Creative Commons attribution licence. All we ask is that you attribute the original author and, if possible, report back on how you have used the files.

Supplementary data files do not do away with the need for data archiving in specialized data repositories. Rather, they play a complementary role, especially for fields or for types of data where suitable data repositories have yet to emerge. So, to paraphrase molecular microbiologist and blogger Thomas Joseph, “Please let’s not throw the open data baby out with the supplementary material bathwater.”

Iain Hrynaszkiewicz (Journal Publisher)
Matt Cockerill (Managing Director)
