GigaScience – a repository for large datasets


The recent explosion of genomics technology has revolutionized biology, but it is only really of use if people are able to analyze and use the resulting sequences. Storage of such vast quantities of data is problematic, as the ongoing uncertainty over the future of NCBI’s arm of the Sequence Read Archive (SRA) shows. The BGI, in conjunction with BioMed Central, recently launched GigaScience, a journal aimed specifically at projects generating large amounts of data; it can accommodate such large datasets alongside the articles describing them. GigaScience also anticipates becoming a repository for stand-alone datasets such as those resulting from genome sequencing projects. One such dataset has just been released: it contains the assembled and annotated genome sequences of three strains of sorghum, a plant of huge economic importance in the developing world as a source of food, fodder, fuel and fiber. The article describing these data has been published in Genome Biology; the raw reads are available from the SRA, and the assembled sequences from GigaScience. This is the first time that a genome dataset has been cited by DOI in an article’s reference list, making it a first step towards researchers receiving citation credit for the data they generate.

Andrew Cosgrove

Andrew obtained his PhD in molecular biology from the University of Dundee in 2005. He joined Genome Biology in 2009 after a postdoctoral research position at the University of Sheffield, where he investigated chromosome positioning during meiosis in yeast.
