The last 15 years have seen an explosion in genomic
research and sequenced genomes. With the build-up to sequencing larger chordate
genomes, it became very clear that manually annotating the billions of base pairs of
sequence produced was not practical and that automated annotation systems were
required. Several large organisations have helped address this issue, but the
Ensembl project, a joint venture between
the European Bioinformatics Institute and the Wellcome Trust Sanger Institute,
has in particular provided high-quality integrated annotation on vertebrate
genomes within a consistent and open-source infrastructure. This year marks the
10th anniversary of the Ensembl project’s launch, and BioMed
Central is today publishing a thematic series of articles describing the
construction, content and current use of Ensembl’s resources.
The first six articles published today in BMC Bioinformatics and BMC
Genomics, co-ordinated by Paul Flicek at Ensembl and the European
Bioinformatics Institute, reveal in detail how many of the comparative
genomics, variation and regulatory data resources have been constructed. The
first article describes
the comprehensive web-based functions available for tabulating and visualizing
genome variants. A second
related article discusses the database and software library supporting
the integration of variation data into the existing Ensembl resources.
To keep up with the ever-increasing
number of genomes reported (51 in the last release),
Ensembl has had to use automated workflow systems. Jessica Severin and colleagues present an artificial intelligence pipeline,
‘eHIVE’, based on a self-organizing workflow system akin to the behavior of
honey bees, to provide updates to Ensembl’s comparative genomics resources. Benoît Ballester and colleagues also demonstrate how the
Ensembl microarray annotation protocol handles the release of the latest
commercial arrays. To keep pace with both the
increasing demands of users and the terabytes of data now available from the
website, Anne Parker and colleagues show how they use
caching and optimization techniques alongside Web 2.0 technologies to improve
the performance of the Ensembl website.
A final article by Giulietta Spudich and Xosé
Fernández-Suárez uses several examples to offer a
practical guide for using Ensembl to learn about genomic annotations in regions
of interest.
While most of this detailed “behind the
scenes” information will not significantly alter the way users access genomic data, it
guides molecular biologists to the full range of tools available to them. It
will also be of great value to researchers building other bioinformatics
applications, and demonstrates how Ensembl is constantly adapting and updating
its tools to prepare for its next decade.
Scott Edmunds
Senior Scientific Editor, BMC Series journals
Published 11th May 2010