Citing and linking data to publications: more journals, more examples…more impact?

Since BioMed Central introduced additional data sharing resources for authors and editors last year, there have been a number of further developments in the field that have necessitated an update to our supporting data information.

Eight further journals, including Retrovirology, Cell & Bioscience, and Frontiers in Zoology, have introduced the ‘Availability of supporting data’ section to either encourage or require all authors to consistently link their supporting data to their publication, or to clearly indicate that supporting data are included within the article and its additional files. As articles submitted after the introduction of these policies have begun to be published, we now have a growing number of examples from a variety of biomedical domains.

In BMC Research Notes, which was amongst the first journals to introduce this article section, Schulz et al. have included their programming script within the additional files of their article, which describes a software tool for automated assessment of cardiopulmonary resuscitation skills.

Anderson and Elizur, in their study of hepatic reference genes in female Atlantic salmon, also in BMC Research Notes, have deposited all the supporting data for the adult and juvenile samples they collected in the PANGAEA repository. PANGAEA specializes in publishing geo-referenced data for earth and environmental sciences and helps to ensure permanence and citation of data by assigning digital object identifiers (DOIs) issued by DataCite.

Seeing links to PANGAEA from BMC Research Notes is particularly pertinent just after returning from the EuroMarine workshop on Scientific Data Integration in Bremen, which focused on linking scientific data to journal publications. At the workshop, session chair Dr. Michael Diepenbroek, who heads up PANGAEA’s systems development, alerted attendees, who included publishers, editors, researchers and software developers, to a new study of the impact of sharing data underlying publications.

The study – an abstract presented at the American Geophysical Union 2011 meeting – reported a 35% increase in citations to articles published in the journal Paleoceanography when supporting data were freely available. Of 1,331 articles sampled over the 18-year study period, the 171 articles with publicly available data received nearly 20% (8,056) of the aggregate citations.

Similarly, a study deposited in the arXiv pre-print repository in November 2011 and distributed on Connotea also found that citation rates in the astronomy field were higher for articles with links to supporting data.

These studies are, of course, limited to specific fields or journals – and those yet to be published in journals will likely be subject to further peer review – but providing evidence of the benefits of data sharing for individual researchers and research groups is undoubtedly important. We already know that sharing detailed microarray data is associated with increased citations to the papers reporting the results, and that there are many benefits of data sharing for society as a whole, but a common barrier to data sharing is the lack of credit and incentives for individuals. The possibility of increased research impact may provide further motivation to those producing, but not necessarily reusing, data. Another desirable development is for citations to datasets assigned DOIs or equivalent persistent identifiers to contribute to measures of researcher impact, as is established for citations to journal articles and measured by a number of common tools, such as Web of Science.

As well as increasing links between articles and data, another aim of the ‘Availability of supporting data’ section is to help address this issue – to increase academic credit for data sharing by encouraging data citation. This month we have encouraged data citation even more strongly with an update to BioMed Central’s reference style guide, found in any journal’s instructions for authors. It now explicitly mentions datasets and provides an example of a dataset citation.

“Only articles, datasets and abstracts that have been published or are in press, or are available through public e-print/preprint servers, may be cited

Dataset with persistent identifier
Zheng, L-Y; Guo, X-S; He, B; Sun, L-J; Peng, Y; Dong, S-S; Liu, T-F; Jiang, S; Ramachandran, S; Liu, C-M; Jing, H-C (2011): Genome data from sweet and grain sorghum (Sorghum bicolor). GigaScience.”

Data citation is recommended according to the standards proposed by DataCite, where persistent identifiers are displayed as linkable, permanent URLs. Finally, the ‘Availability of supporting data’ resources page has been updated with more information on citing and linking to data, in particular a link to a comprehensive guide from the Digital Curation Centre.
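
For authors who assemble reference lists programmatically, the citation pattern described above (creators, year, title, publisher, then the persistent identifier shown as a resolvable URL) can be sketched in a few lines of Python. This is only an illustration, not an official BioMed Central or DataCite tool: the class and function names are hypothetical, the example record and its DOI (10.1234/example-dataset) are placeholders rather than real data, and the sketch assumes identifiers are DOIs resolved through https://doi.org/.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Minimal fields for a dataset citation (hypothetical structure)."""
    creators: List[str]   # e.g. ["Doe, J", "Roe, R"]
    year: int
    title: str
    publisher: str        # the repository or data journal
    doi: str              # bare DOI, optionally prefixed with "doi:"


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI (or one with a 'doi:' prefix) into a linkable URL."""
    bare = doi.strip()
    if bare.lower().startswith("doi:"):
        bare = bare[4:].strip()
    return f"https://doi.org/{bare}"


def format_citation(c: DatasetCitation) -> str:
    """Format as: Creators (Year): Title. Publisher. DOI-URL"""
    creators = "; ".join(c.creators)
    return f"{creators} ({c.year}): {c.title}. {c.publisher}. {doi_to_url(c.doi)}"


# Placeholder record: none of these values refer to a real dataset.
example = DatasetCitation(
    creators=["Doe, J", "Roe, R"],
    year=2011,
    title="Example genome dataset",
    publisher="Example Repository",
    doi="doi:10.1234/example-dataset",
)
print(format_citation(example))
# Doe, J; Roe, R (2011): Example genome dataset. Example Repository. https://doi.org/10.1234/example-dataset
```

Writing the identifier as a full URL rather than a bare string means a reader can follow the citation directly to the dataset, which is the point of the linkable, permanent form recommended above.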
