“We have way more data than we can possibly analyse”, said Arthritis Research & Therapy Editorial Board member Prof Edward Wakeland, in concluding his keynote address to the 9th International Congress on Systemic Lupus Erythematosus in Vancouver.
The Wakeland Laboratory are generating data that should help elucidate the genetic basis for human susceptibility to lupus. They aim to sequence the genomes of 600 patients with systemic lupus erythematosus (SLE) – a disease thought to affect more than 250,000 US citizens – in the next 16 months. They have sequenced 107 patients so far and are on track to achieve their goal, facilitated by modern ‘deep sequencing’ technology. Control data for the project are being provided by the 1000 Genomes Project.
But generating terabytes of sequence data is just the beginning. Prof Wakeland explained that there are 89 genomic segments of potential interest, more than one lab is able to analyse. On the second day of the congress he offered the data to the community, inviting researchers to collaborate with his group.
The web-based interface through which the Wakeland lab will make its data available to collaborators is not yet complete, but the proposal seems to be another example of genomic researchers being ahead of the curve in scientific data sharing.
Indeed, enabling collaboration on future research, and driving it, is one of the many benefits of openly sharing research data, and the potential for collaboration with large genomic datasets is vast. And where researchers must publish or perish, offering co-authorship on articles to collaborators might seem logical. However, by contributing data alone, researchers will not meet the authorship criteria of the International Committee of Medical Journal Editors (ICMJE), which are endorsed by many journals including BioMed Central’s. Moreover, researchers might not want to endorse the findings of every article resulting from their data.
So is it time for a rethink? Authorship criteria offer protection as well as a means of giving academic credit. Rather than calling for those criteria to change, ambitious projects such as the Wakeland group’s add to the urgent need for data sharing itself to be recognised by academic institutions and the broader scientific community, particularly as plans for sharing data are increasingly required by research funding agencies such as the NIH and Wellcome Trust.
Prof Wakeland explained that his group have had a policy for sharing data from the outset. “All of the sequencing data was generated in my laboratory, using samples predominantly obtained from the OMRF. The size of the data set is mammoth already and we are unable to analyze all of the gene segments that are available,” he said.