Source Code for Biology and Medicine is an ideal platform for the dissemination of code developed for the biomedical fields. One field in which the development of source code is especially critical at the moment is personal genomics. The accessibility of direct-to-consumer (DTC) genetic testing services is currently not matched by our ability to interpret the results. Sequencing technologies have advanced so much that sequencing a personal genome has become almost trivial. What is not trivial, though, is its interpretation, and more specifically, determining the phenotypic associations that any given marker can produce in an individual. This is a serious problem because there are not enough freely available samples to give statistical inference sufficient power. The regulatory and bureaucratic system currently in place makes it difficult for scientists and hobbyists alike to access personal genome data. While individuals have every right to keep their genetic information private, prevailing attitudes have led to severe restrictions on data access. As a result, the development of open source tools for the analysis of personal genome data has been severely hampered. Hardly any funding has been dedicated to the development of free tools, and the few free tools currently available, such as SNPedia or openSNP, are maintained through the heroic efforts of dedicated colleagues who give away their work.
Against this backdrop, a year ago my family and I decided to publish the roughly one million SNP markers that a DTC company had genotyped for us. As soon as these data were made accessible, we received messages from scientists and hobbyists around the world who used them to compare against their own data or to develop new tools based on this family model. We gained so many insights from this experience that we were even able to find annotations for a couple of OMIM genes not reported by the DTC company (data not shown). What made our data attractive was that they came from a whole family of five blood-related individuals and that we released them under a public domain license. Family data offer advantages that are unavailable when individuals are studied in isolation. For example, we were able to estimate genotype-calling error rates and detect Mendelian inheritance errors. So far, we have also published two papers using our family genotypes as a model.
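To illustrate why family data make Mendelian inheritance errors detectable, here is a minimal sketch of a parent-child trio consistency check. It assumes autosomal, biallelic SNP calls written as two-letter unordered genotypes (e.g. `AG`) and a `--` marker for no-calls, as in many DTC raw-data exports; the function names are hypothetical and this is not the method used in the papers mentioned above. A child's genotype is consistent only if it can be formed from one allele of the mother and one allele of the father.

```python
from itertools import product
from typing import Iterable, Tuple

NO_CALL = "--"  # assumed no-call marker (hypothetical; check your data format)


def is_mendelian_consistent(child: str, mother: str, father: str) -> bool:
    """True if the child's genotype can be built from one maternal and one paternal allele."""
    target = sorted(child)  # genotypes are unordered, so compare sorted alleles
    return any(sorted(m + f) == target for m, f in product(mother, father))


def mendelian_error_rate(trios: Iterable[Tuple[str, str, str]]) -> float:
    """Fraction of informative SNPs at which a (child, mother, father) trio is inconsistent."""
    checked = errors = 0
    for child, mother, father in trios:
        genotypes = (child, mother, father)
        # Skip no-calls and anything that is not a two-allele call
        # (e.g. single-letter hemizygous calls on X/Y in males).
        if NO_CALL in genotypes or any(len(g) != 2 for g in genotypes):
            continue
        checked += 1
        if not is_mendelian_consistent(child, mother, father):
            errors += 1
    return errors / checked if checked else 0.0


trios = [
    ("AG", "AA", "GG"),  # consistent: A from mother, G from father
    ("AA", "AG", "GG"),  # error: the father carries no A allele
    ("CT", "CC", "TT"),  # consistent
    ("--", "AA", "AA"),  # skipped: child not called
]
print(mendelian_error_rate(trios))
```

Note that this check only catches miscalls that break inheritance rules: a wrong call that still happens to be compatible with both parents goes unnoticed, so the observed Mendelian error rate is a lower bound on the true genotype-calling error rate.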
Now that we have done a lot of work with genotypes, we think the time is ripe to embark on sequencing our whole genomes. Because the cost of sequencing is beyond our means (20,000 USD for five whole genomes, according to a quote from BGI), we have launched a crowdfunding campaign to raise the funds; hence the name Crowdfunding Genome Project. A Google search (as of 27 June 2012) found no other crowdfunding initiative applied to the genomics field.
We will do as much sequencing as the money raised allows. We pledge to make all our genomic data freely available and reusable in the public domain under a Creative Commons CC0 waiver, and to report our experiences in a variety of publications. Indeed, we intend to use BioMed Central’s Source Code for Biology and Medicine as one of our preferred journals for disseminating our results. In the week since we launched this fundraising campaign, donors have pledged 1,598 USD.
Well-known projects such as the Personal Genome Project (PGP) share values similar to ours. We differ, however, in that we are concerned not only with sharing data, as the PGP does, but also with building an open community of personal genome developers and hobbyists. The PGP focuses on single individuals, whereas we provide data for a whole family. We believe that whole-genome analysis is now approaching a tipping point in accessibility to the general public, beyond which its adoption will become widespread, at least in clinical settings. Before private interests end up restricting ordinary consumers’ access to, and ability to interpret, their own genomes, we want to stimulate the development of a thriving open source community that sustains an alternative to private efforts. Using our family model, we will be able to provide an initial benchmark of how much can be discovered with free software tools and no laboratory equipment, something that is in principle within the reach of any ordinary (informed) citizen.
Manuel Corpas (The Genome Analysis Centre)