Working at a data center as big as the BGI, and launching a journal focused on maximizing data use, we have followed the many calls over the years for improved incentives and methods to aid and reward scientists for spending the considerable time and effort needed to share their data. The Datacite consortium has been key in producing the infrastructure to finally allow this, and to that end we have been working with Datacite and the British Library to assign their DOIs to supplementary data associated with the journal, as well as to datasets produced by the BGI.

This week we attended Data and the Scholarly Record: The Changing Landscape, the Datacite 2011 summer meeting in Berkeley [program here]. A lot has happened recently, and the meeting was a great opportunity for technical updates on their new metadata resources, as well as for hearing a variety of perspectives on where the project is headed. It was great to meet so many different stakeholders: Datacite members, data librarians, bibliometricians, repository managers, publishers, funders, and scientists and data producers from fields including astronomy (NASA), neuroscience, ecology and environmental sciences.

Having such a savvy bunch of individuals meant there was good coverage on Twitter, and there are also several summaries already posted (this from @_inundata, two from @cboettig and a Storify summary from @mrgunn). It was great meeting these and other tweeps. We do not want to cover too much of the same ground, and slides are supposed to be uploaded shortly, but there was a lot of great stuff presented and it is worth drawing attention to the highlights. John Wilbanks gave an excellent keynote, not so much focused on his work with Creative Commons and data licensing, but reminding everyone about the rationale and need for data citation, as well as providing advice on potential pitfalls to avoid. Slides are here, and audio will be uploaded shortly. Heather Piwowar similarly highlighted the obstacles and issues that need to be overcome in her talk, Tracking Data Reuse: Motivations, Methods and Obstacles.

There was a nice juxtaposition of Jason Priem talking about altmetrics in the same session as the representative from Thomson-Reuters. Both alternative and current metrics systems have common ground here, with data citation complementing the current suite of altmetrics, as well as being a potentially lucrative new product for Thomson-Reuters. Jason’s slides are here, and his talk should be complimented for coining the term “citwations”, as well as for its novel use of Chekhov quotes. Jason discussed the “decoupled journal”, and one of the recurring themes of the meeting was the pressing need for data journals. This makes the launch of GigaScience, and our recent experiments publishing datasets such as the E. coli genome, perfectly timed, and our talk in the Pecha Kucha Rapid Fire updates session seemed to be well received (slides here).

One particular issue from the meeting, raised by Sarah Callahan in the data producers session, was that whilst we now finally have these tools and infrastructure available, people need to use them. We now have a carrot, but people may still need to be taught to appreciate it, and that is now an important issue to address, whether by making clear guidelines on how to cite data, encouraging authors to include data citations on their CVs (currently done by ORNL DAAC), collecting this information in ORCID, or making sure funders ask for it as well. We at GigaScience will do our bit to promote and help develop best practices here, and we hope others can help in this world of new carrots.

