Beyond the genome and into the cloud



Next-generation sequencing technologies have forever changed the genetics and medical genomics landscape. Advances have been made that were unthinkable just a few years ago, but the flip side of these feats is that more data are generated than most labs are able to analyze and work with. As the aim of many large-scale genomics projects is to disseminate findings and encourage meta-analysis, the conundrum is how to make these data sets accessible. Cloud computing, best described as computation-as-a-service, provides a solution: it allows users to rent hardware and storage for however long they need it, scaling up if and when required.

Following the inaugural Genome Biology conference, ‘Beyond the Genome’, at Harvard Medical School (11th-13th October 2010), Genome Biology held a workshop on ‘Cloud computing in genomics and bioinformatics’. This workshop was chaired by Folker Meyer (Argonne National Laboratory) and Vivien Bonazzi (Genome Informatics and Computational Biology program, NHGRI). The advantages of using commercial clouds were discussed by Narayan Desai (Argonne National Laboratory) and Chris Dagdigian (Bioteam). Desai stated that in the future, biology would be “analysis-limited” and that the increased computing power provided by cloud resources was a way to solve this issue, especially as these resources are already cheap and will soon be “too cheap to ignore”. Dagdigian concurred, highlighting that cloud computing enables large-scale analyses from your laptop “without contacting your IT department”. Bob Grossman (University of Illinois at Chicago) spoke on behalf of the Open Cloud Consortium, describing the Bionimbus community cloud.

Michael Schatz (CSHL), James Taylor (Emory University) and Lincoln Stein (Ontario Institute for Cancer Research) described real-world examples of using the cloud for bio-computing. Schatz described assembling genomes in the cloud, and Taylor gave participants a virtual tour of the Galaxy platform. Stein talked through some examples of using Bionimbus for modENCODE data analysis.

Three parallel, hands-on sessions, run by Schatz, Desai, Florian Fricke (University of Maryland School of Medicine), Sam Angiuoli (University of Maryland School of Medicine), Nigel Cook (Knight lab, University of Colorado) and Titus Brown (Michigan State University), guided conference participants through cloud-based analyses, many of them for the first time. Prepaid cloud credits were provided by the sponsor, Amazon.

Three major points for future consideration emerged: cost, security and data transfer. Cloud-based analyses do incur a cost. However, because the costs of additional infrastructure, cooling and IT support are often overlooked, local analyses may not actually be as cost-efficient as cloud computing. The continually falling cost of cloud computing will make it an increasingly viable and attractive alternative. One significant hurdle that may slow progress is data transfer: large amounts of data cannot easily be transferred over the internet, and shipping hard drives is impractical. Privacy issues are also unresolved: how can personal genomics data remain confidential? These are some of the challenges that now face the community.

BioMed Central has published a special series of articles related to cloud computing for the biological sciences. You can also read a meeting report from the Beyond the Genome conference in Genome Biology.
