‘Availability of supporting data’: crediting transparency and enhancing the literature


BioMed Central supports initiatives aimed at promoting transparency and reproducibility in research, and we strongly encourage data sharing and publication. Submission of a manuscript to any BioMed Central journal has always implied that “readily reproducible materials described in the manuscript, including all relevant raw data, will be freely available to any scientist wishing to use them for non-commercial purposes.” In some biomedical fields, such as genomics, ecology and evolutionary biology, support is established or growing for even stronger requirements for the availability of data supporting published articles. We have developed improved article features and resources to support these communities’ requirements, to better link journal articles to underlying data, and to facilitate academic credit for data sharing.

Availability of supporting data article section
The first enhancement is a new standard article section, ‘Availability of supporting data’, for relevant journals. This section will make it easier for authors to indicate where and how the data supporting the results of a research project can be found openly online.

Amongst the first journals to introduce this section are BMC Research Notes; GigaScience, the recently announced ‘big data’ journal with an integrated cloud-based repository; and Open Network Biology. Authors of original research-based articles should use the following format for the ‘Availability of supporting data’ section when data are available in an open access repository elsewhere on the web:

Availability of supporting data
The data set(s) supporting the results of this article is(are) available in the [repository name] repository, [unique persistent identifier/link for dataset(s)].

The following format is required when data are included as additional files:

Availability of supporting data
The data set(s) supporting the results of this article is(are) included within the article (and its additional file(s)).

We also recommend that the data set(s) be cited, where appropriate, in the manuscript, and included in the reference list.

In short, if a published article includes one of the statements above, then the supporting data are freely and permanently available on the web. In recent years, journals have taken different approaches to encouraging data sharing. Sharing data on request can be implied as a condition of submission or publication (the traditional approach for BioMed Central journals); authors can be required to make data available to editors and peer reviewers (as at Nature); or a statement on the availability of data can be required (as at Annals of Internal Medicine and the BMJ). Increasing transparency about the availability of data is a step in the right direction. But to increase much-needed academic credit for data sharing and publication, and to enhance the functionality of the online literature, we should move towards consistently citing data sets and linking publications to the data that support them.

Understandably, it is not always possible or appropriate to share data openly in some biomedical fields, so the ‘Availability of supporting data’ section is not required by all journals. The decision to mandate data deposition as a condition of publication is best made by the scientific community that a journal serves. The ‘Availability of supporting data’ section is a tool that editors, authors and scientific communities can use to put data deposition policies into practice when the time is right.

For example, BMC Research Notes, which serves a wide section of the life science community spanning biology and medicine, is amongst the first journals to introduce this section. From August 2011, BMC Research Notes authors submitting research articles, short reports, data notes and technical notes directly to the journal will be encouraged to include the ‘Availability of supporting data’ section. GigaScience, a journal with strict requirements for data availability, requires all authors of research, data note and technical note articles to include the ‘Availability of supporting data’ section. Digital Object Identifiers (DOIs) are assigned to deposits in the GigaScience database to ensure online permanence. Open Network Biology’s novel network model article type will also be permanently linked to data sets hosted in a repository.

Information on the ‘Availability of supporting data’ section and policy, including a list of journals that have begun encouraging or requiring its use, can be found in BioMed Central’s policy pages. Over the coming months, more BioMed Central journals that implement the new section will be added to this list.

But where can I put my data if not in additional files?
Additional files (supplementary materials) are a viable option for certain types of small-scale data files, and BioMed Central authors are encouraged to include data as additional files where possible. But large data sets, and data covered by the availability policies of some communities and institutions, need to be hosted in a repository. We are therefore keen to provide our authors with as much information as possible on where they can deposit their data, so that it can potentially be linked to their publication(s). Well-established and widely supported databases exist for certain types of data, such as nucleic acid sequences, protein sequences and atomic coordinates, and these are already included in the relevant journals’ instructions for authors. But there are many other repositories that could potentially link data to our publications. Therefore, to complement the development of this new article section, we have been collaborating with DataCite, the British Library and the Digital Curation Centre to develop and maintain a list of domain-specific and institutional repositories that accept a variety of data file types and assign unique, permanent identifiers of various kinds to deposited data. This list is available on the DataCite website and is linked from the instructions for authors of the relevant journals. Community participation in developing this resource is strongly encouraged. Please contact DataCite to suggest changes and additions to the repository list.

And what is the right format for my data?
To help maximize the potential for data reuse and increase the efficiency of science, shared data should be made available in formats widely agreed upon by the relevant scientific field, known as data standards. BMC Research Notes launched an initiative in 2010 to promote awareness and use of data standards, and subsequently partnered with the BioSharing network to pursue this shared goal. Journals that introduce the ‘Availability of supporting data’ section encourage authors to comply with available field-specific standards for the preparation and recording of data. We recommend that authors check the BioSharing website for information on best practice for data sharing in their field, paying particular attention to maintaining patient confidentiality.

  • Comment from the FGED Society (http://fged.org):

    For anyone running into roadblocks when attempting to obtain supporting functional genomics data sets from a published article, the FGED Society will assist you in working with the original authors and journal to obtain them.

    See our “Facilitating Data Deposition” service on the FGED Society website.

    Sincerely,
    Steve Chervitz, on behalf of the FGED Society — promoting data sharing best practices since 1999: ShareYourData.org