Guidelines on clinical data sharing developed by the Scientific Data team have been published in Research Integrity and Peer Review, and several published Data Descriptors at Scientific Data demonstrate the guidelines in practice. Our aim is to implement the best features of journals, data repositories and secure data request services to enable effective sharing of experimental clinical research data.
We are interested in exploring pragmatic approaches to publishing articles about, and linked to, clinical or sensitive data as we recognise some data cannot be safely shared openly or might be too burdensome to fully anonymise (de-identify). However, we believe many of these datasets would benefit from the peer review and discoverability provided by publications in scholarly journals.
The development of these guidelines began in late 2014, when Scientific Data gathered stakeholders from pharma, academia, research funding, publishing and research data management to discuss disclosure of clinical data in journals. We released a preprint in June 2015 and, following responses received during a public consultation, released a revised draft in early 2016.
Guidelines are only useful if they are put into practice, and the benefits of sharing clinical research data need to be more comprehensively demonstrated to researchers. By working with interested repositories and early-adopter researchers, we are beginning to build a collection of examples of peer-reviewed articles linked to clinical data that are available with legitimate restrictions, summarised below. In 2016, Scientific Data also published a meta-analysis of patient-level data from eight prostate cancer clinical trials, in which the data were obtained from a data request service, Project Data Sphere.
| Data Descriptor | Repository, data summary and access requirements |
|---|---|
| Brain Genomics Superstruct Project initial data release with structural, functional, and behavioral measures (published July 2015) | Neuroimaging, behaviour, cognitive, and personality data for over 1,500 human participants are available from the project-specific Laboratory of Neuroimaging (LONI) portal and through the Harvard Dataverse Network. Prospective users need to register with the repository and request data access. |
| A structural and functional magnetic resonance imaging dataset of brain tumour patients (published February 2016) | Brain scan data and related clinical information are hosted in the UKDA’s Reshare archive. Prospective users must request access via the authors of the manuscript, in a process mediated by the UKDA. |
| An open access pilot freely sharing cancer genomic data from participants in Texas (published February 2016) | Much of the data are provided in a completely open manner, following a rigorous informed consent process, via the NCBI SRA sequence repository. The full dataset, with more detailed tumour and patient information, is provided by the Texas Cancer Research Biobank website, which mediates access via a data use agreement. |
| The mPower study, Parkinson disease mobile data collected using ResearchKit (published March 2016) | Data are hosted in the Synapse repository. Prospective users need to provide an intended use statement when applying for data access. Synapse includes some novel features that support sharing of sensitive datasets, such as a quiz embedded in the application process that certifies users on proper use of the system. |
| Reference genotype and exome data from an Australian Aboriginal population for health-based research (published April 2016) | The data are hosted in the European Genome-phenome Archive (EGA), and are available only for use in research with clear relevance to Aboriginal health. Access to the data is controlled by an institutional Data Access Committee, with access being provided to approved applicants by EGA. |
How peer-reviewed journals manage access to, and policies on, clinical data sharing is becoming a more pressing issue, particularly since the International Committee of Medical Journal Editors (ICMJE) announced a draft policy for consultation that would require authors to share anonymised individual patient data (IPD) within 6 months of publication.
Our experience at Scientific Data suggests this is possible, but journals, repositories, researchers and their sponsors need to work together to ensure data sharing happens in a way that is valuable to all stakeholders. We hope these guidelines and case studies will help enable this.