Bioinformaticians breaking down barriers in Berlin


Open Science flourishes at BOSC and ISMB
It’s been a busy couple of weeks for GigaScience, with our 1st birthday, publication of a special anniversary print issue sponsored by Aspera, and publication of the (unusually reviewed) Assemblathon2 paper. These all spanned and were coordinated with the ISMB meeting in Berlin, the yearly gathering of the computational biology community. GigaScience was present through the pre-conference SIGs (particularly BOSC and AFP), as well as the main conference, co-organizing and speaking at a special “What bioinformaticians need to know about digital publishing beyond the pdf” workshop.

We covered BOSC (the Bioinformatics Open Source Conference) last year (see here), but one new string to its open source bow was the addition of session topics on Open Science and Reproducible Research. SIG chair Nomi Harris announced at the meeting opening that BOSC were even thinking of swapping the “open source” in their name for “open science”, and this was a good introduction to the opening keynote from Cameron Neylon on “Network ready research” (see slides, link bundles and video). This being an “intelligent systems for molecular biology” meeting, networks were not just used in the systems biology sense but also covered social and research networks. The talk did a great job of touching on the cultural issues holding back open science, such as the need to hack incentive systems. As ISMB audiences are not afraid of equations, the Neylon Equation (as later plugged by Carole Goble in her related ISMB keynote), in which the probability that your work can help someone = (interest / friction) × the number of people you can reach, was a great analogy to demonstrate the benefits of openness as a “lubricant” that minimizes the friction of collaboration.

Despite giving a talk billed as being on sequence analysis, Sean Eddy’s keynote also followed similar themes of collaboration and the philosophy and process of science (slides and video). Covering publishing history since the 17th century (‘“open data” has been the scientific community standard since 1665’), the need for scientists to think more like engineers (incremental advances ARE important), and the “insane” level of commitment required to build a world-class tool, it was a fascinating insight into his career, and particularly inspiring for the underappreciated tool developers in the audience. As with last year, there were a number of talks on Galaxy from familiar faces from the previous fortnight’s Galaxy Community Conference in Oslo, such as Enis Afgan, John Chilton and Clare Sloggett (see the GCC2013 write-up), as well as a nice talk on Research Objects from Stian Soiland-Reyes that set things up nicely for related work in our workshop.

‘Offene Wissenschaft’ in Berlin
The wisdom-of-the-crowds theme was not limited to BOSC, and in the AFP (Automatic Function Prediction) SIG there were a number of talks on Critical Assessment challenges, with Anna Tramontano giving a 20-year overview of CASP (Critical Assessment of protein Structure Prediction), and announcements on CAFA2, a new Critical Assessment of Function Annotations challenge. Following a similar approach to our Open Ash Dieback paper, there was even an announcement by Mark Wass of crowdAFP: a crowdsourced effort to functionally annotate the CHO (Chinese Hamster Ovary) genome, the most important cell line used in the production of therapeutic proteins and antibodies such as Herceptin. We are currently hosting the CHO genome in our GigaDB database, and having participated in the E. coli genome crowdsourcing (see GigaBlog), this is an area we are particularly interested in. Anyone interested in participating should take part in this online survey. We have announced a series of papers coming from the AFP meeting, so watch this space for more news.

In the main ISMB meeting, the “community” theme continued, with an interesting workshop on bioinformatics networks, again focusing on networks of people rather than cellular or molecular ones. It was fascinating to get an insight into how Galaxy do this, with Jeremy Goecks explaining that 40% of their budget is spent on outreach, which helps to explain how they are growing at a rate of over 1,000 new users a month. We will cover our workshop on digital publishing and its related tech track talk from Marco Roos in more detail in a later posting, but this set things up nicely for Carole Goble’s keynote on “results may vary: reproducibility, open science, and all that jazz” (slides here). Carole being a great board member, GigaScience got a shoutout as a member of the “republic of science” roll call, and the talk highlighted our ongoing reproducibility case study using research objects, nanopublications and ISA-TAB. The talk was so rich with quotable quotes that it was difficult to keep up, although a favorite had to be her mention of the CRAPL software license, which releases code openly under the one provision that users will not laugh at it.

ISMB has a reputation for being an early and enthusiastic embracer of social media (this 2009 Nature News piece on the conference contains the classic Jonathan Eisen admission that “‘twittering’ about a presentation he is listening to helps him to focus.”). If you did not attend, there are useful storifies from Peter Cock and others archiving the tweets, blog summaries from Brad Chapman, and videos and slides available from the BOSC2013 website.