In most genome sequencing applications, there are at most two variants at a given locus, present in equal proportions (i.e., the two alleles in a diploid individual). In these situations, the high error rates of next-generation sequencing technology are not a large problem: any given error is likely to be present at a much lower frequency than reads from the true variants, so errors are easy to filter out. In some situations, however, true variants are themselves present at low frequency in the sample. Examples include sequencing a tumor, which is a collection of cells whose genomes may all have diverged from that of the founder cell; sequencing mitochondrial genomes, as each cell contains many mitochondria, not all of which will have the same genome; and sequencing pooled genomes from a population of individuals in order to detect rare variants. In each of these cases, true sequence variants may be present at a frequency similar to that of sequencing errors.

In an article just published in Genome Biology, Mingkun Li and Mark Stoneking present a method that statistically analyzes detected variants in order to differentiate true variants from sequencing errors. As well as being a very useful method, this publication is of note because the work it describes won the poster prize at last year’s Beyond the Genome conference, organized by Genome Biology and Genome Medicine in Washington DC. We hope that this year’s conference, in Boston in September, will result in similarly high-quality submissions.
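To make the frequency argument above concrete, here is a minimal Python sketch. It is not the method of Li and Stoneking; it is a plain binomial model added for illustration, and the function name `binomial_tail`, the read depth of 1,000 and the 1% per-base error rate are all assumptions rather than figures from the article. It asks how likely it is that sequencing error alone would produce at least the observed number of variant reads at a site.

```python
import math


def binomial_tail(depth, variant_reads, error_rate):
    """Return P(X >= variant_reads) for X ~ Binomial(depth, error_rate).

    Hypothetical helper for illustration; requires 0 < error_rate < 1.
    Terms are computed in log space so large depths do not overflow.
    """
    log_p = math.log(error_rate)
    log_q = math.log(1.0 - error_rate)
    total = 0.0
    for k in range(variant_reads, depth + 1):
        log_pmf = (
            math.lgamma(depth + 1)
            - math.lgamma(k + 1)
            - math.lgamma(depth - k + 1)
            + k * log_p
            + (depth - k) * log_q
        )
        total += math.exp(log_pmf)
    # Guard against rounding pushing the sum slightly above 1.
    return min(total, 1.0)


# Illustrative values only (assumptions, not taken from the article).
DEPTH = 1000
ERROR_RATE = 0.01

cases = [
    ("heterozygous allele, about 50% of reads", 500),
    ("low-frequency variant, about 1.5% of reads", 15),
]
for label, reads in cases:
    p = binomial_tail(DEPTH, reads, ERROR_RATE)
    print(f"{label}: {reads}/{DEPTH} reads, P(error alone) = {p:.3g}")
```

Under these assumed numbers, the chance that error alone accounts for the heterozygous allele is effectively zero, whereas for the low-frequency variant it is far from negligible. A simple frequency cutoff therefore cannot separate the two in the second case, which is why a more careful statistical treatment is needed.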
Andrew obtained his PhD in molecular biology from the University of Dundee in 2005. He joined Genome Biology in 2009 after a postdoctoral research position at the University of Sheffield, where he investigated chromosome positioning during meiosis in yeast.