Publishing better science through better data

Last Wednesday, I attended a conference called Publishing Better Science Through Better Data at the Wellcome Collection, organized by Scientific Data. Genome Biology has always been at the forefront of making data open, and we were insisting on data being openly available before most other journals were, so it was interesting to see how the state of the field is progressing. The meeting was well attended, and approximately half of the attendees came from the biological sciences.

The day started with talks from Florian Markowetz from Cambridge University on why we should work reproducibly, and Jenny Molloy, also from Cambridge, on why we should work openly. Both speakers made a point which was reiterated by others throughout the day: apart from it being better for everyone if you do this, it’s also better for you. Being open and reproducible may seem like a large investment of time, but after an initial outlay of effort, it can save you time in the long run. It makes it easier for you to go back to your earlier results, know what you did and spot mistakes, thus potentially insulating you from embarrassing retractions. If it’s simple for others to use your data, it’s going to be simpler for you too. It also helps with getting published, because it’s easier to convince reviewers that you are right. So as well as the altruistic reason that it benefits everyone, there are also lots of selfish reasons for working this way, and appealing to those seems the best way of persuading other people. For further information about Markowetz’s talk, you can read the comment article he published with us last year.

There then followed a series of lightning talks from researchers about what they are doing with open data, and the tools they are using to help them. These covered fields including ecology, genomics, energy policy and materials science. The talk that perhaps generated the most interest was from Guy Rouleau, describing the efforts of the Montreal Neurological Institute at McGill University to become the first large department (as far as he is aware) to be completely open. This entails asking all patients on admission to sign consent forms for data release, maintaining a biobank of samples available to researchers on request, and not pursuing intellectual property rights on any discoveries made. The faculty voted unanimously for this move, and its effectiveness will be monitored as it goes along.

After the lightning talks, the keynotes resumed with Kevin Ashley of Edinburgh University telling us about how to manage data. At Genome Biology, we’re used to thinking about data sharing from a genomics point of view, but Ashley’s talk gave some interesting examples from other fields. For instance, we usually think of reasons why data may not be made public in terms of patient confidentiality, but there are other good reasons for keeping data private. The example given was the locations of specimens of rare plants, where it would not be advisable to make the precise coordinates public. Ashley pointed out, however, that even in situations where the data itself cannot be made public, it is important that the existence of the data is openly known. Ashley also mentioned that astronomers have been using open data for hundreds of years, incorporating observations from earlier astronomers into their work. Even now, star charts from the 8th century can be used to provide information on how the Earth’s orbit has changed over time. Even when the conclusions based on a dataset have been superseded, or shown to be false (early astronomical observations were largely performed in support of astrology), the data itself is still useful.

Andrew Hufton, the next speaker, took this concept further and suggested that maybe we should separate the publication of the data from the publication of the analysis. Hufton is managing editor of Scientific Data, and was talking about the publication of data descriptors, which allow researchers to publish descriptions of their datasets and help those researchers gain recognition for producing the data by providing a citable object associated with it.

The day finished with a panel discussion chaired by Dorothy Bishop of Oxford University, featuring Jon Fistein from the Medical Research Council, Alison Mitchell from Vitae, Emma Ganley from PLOS Biology and Steve Lewandowsky from the University of Bristol, who answered questions from the audience and gave their views on the benefits and potential problems of transparency in research. The pitfalls discussed included everyone being overwhelmed by the data, and the possibility of nefarious pressure groups cherry-picking data in a misleading way, although it was generally agreed that the latter could be countered and was not sufficient reason not to be open.

As well as live tweeting from the meeting (#scidata16), there was live cartooning. Royston Robertson from Ludic Creatives produced this fantastic picture. He really captured all the speakers’ main points, so if you want more information on the talks, that’s a great place to start.

Last year in Genome Biology, Mick Watson asked when ‘open science’ will stop being the unusual case, become standard practice and simply be ‘science’. The overall opinion from this highly stimulating meeting was that we may be on the cusp of that change right now.

Andrew Cosgrove

Andrew obtained his PhD in molecular biology from the University of Dundee in 2005. He joined Genome Biology in 2009 after a postdoctoral research position at the University of Sheffield investigating chromosome positioning during meiosis in yeast.