Social coding and scholarly communication – open for collaboration


Innovation in how social online tools and their features develop is frequently defined and driven by the network’s users. A collaboration between BioMed Central, some of our authors and editors, and the team behind a powerful social software development platform aims to stimulate innovation in scholarly communication.

The ‘social coding’ website, GitHub, was founded in 2008 and its primary aim is to enable users to publicly or privately share source code, and manage software development projects. But it seems that life scientists have had other ideas for quite some time.

Bioinformaticians – one of BioMed Central’s earliest and largest author groups – by definition must create and share software for life science projects. Many BioMed Central journals urge authors to publicly share their code and our authors tell us that GitHub is where they go to share that code.

BioMed Central and open access were born out of the web, and the web continues to offer ways to increase efficiency in science. Through technological collaborations, such as with LabArchives, we’re interested in finding ways of better joining up the process of doing science with communicating that science.

Therefore, we’ve got together with a group of BioMed Central authors and editors, and the GitHub:Training team, to collaboratively document, define and share good practices in how they use GitHub for science. Until now the GitHub:Training team have had little direct experience of scientists’ uses of their services, and scientists have had little opportunity to share their optimal use cases directly with the GitHub team.

Scientists regularly use GitHub to publicly share their code and papers, author documents collaboratively, and version control their work, often pre-publication. Version control – as Karthik Ram and C. Titus Brown discuss in their blog accompanying this announcement and Karthik details in his article in Source Code for Biology and Medicine – is a powerful example.

Creative uses of GitHub have included the phenomenal crowd-sourced genetic sequencing effort during the 2011 E. coli outbreak, and more recently the open-source fight against the devastating Ash Dieback disease in European Ash trees. By working together, we hope to leverage this creativity to promote best practice and, ultimately, more efficient and reproducible science.

Journals offer important services including coordination of peer review (validation, quality control), creation of authoritative versions of record, and dissemination and promotion of community standards and good practices. These are exactly the services we and the editors of Source Code for Biology and Medicine are excited to be offering this initiative.

Members of the collaborative group will blog here and elsewhere periodically about the project’s outputs, some of which will undoubtedly be defined as we move forward. Please contact BioMed Central to help define more uses of GitHub for science.

  • http://twitter.com/caseybergman Casey Bergman

    This sounds like a great initiative, since it is clear that use of github by life scientists is on the rise, as I document here: http://caseybergman.wordpress.com/2012/07/15/where-do-bioinformaticians-host-their-code/

    I for one would like to see BMC go beyond documenting use cases and address the issue of the preservation of code associated with publications in BMC papers. As Karthik and Titus mention in their blog post, github is not an archive and code/manuscripts/data hosted on github can easily be deleted, as I discuss here: http://caseybergman.wordpress.com/2012/11/08/on-the-preservation-of-published-bioinformatics-code-on-github/

    What I would like to see from the BMC + github partnership is the implementation of a mechanism(s) to ensure published repositories are archived, as BMC has done for many years with URLs using webcite (which I now see is apparently in need of additional funds to continue operation). One option would be for BMC to create a github organization and to fork all published repositories, as has been done for the journal Computers & Geosciences (http://www.iamg.org/documents/oldftp/VOL38/instructions-github.pdf) and as I am currently doing for all journals as a stop-gap measure until this issue is sorted out. Another option would be for BMC to instantiate an enterprise version of github and host (forked) published repositories, as the University of Ghent is doing (https://github.ugent.be/ComputationalBiology/essaMEM). A third option would be for BMC to provide github with the URLs of all published repositories, and for github to make these “non-deletable” (github could also do this automatically via PMC/Pubmed search). All of these solutions have pros and cons, relating to whether a version of record or a living repository is desired.

    Regardless of the solution, I welcome BMC taking steps to consider how github is being used by scientists and hope that this initiative paves the way to sorting out what I see as a relatively important issue regarding the widespread use of github in the published scientific literature.