On its 10th anniversary, Ensembl publishes a thematic series with the BMC series


The last 15 years have seen an explosion in genomic
research and sequenced genomes. With the build-up to sequencing larger chordate
genomes, it became very clear that manually annotating the billions of base pairs of
sequence produced was not practical and that automated annotation systems were
required. Several large organisations have helped address this issue, but the
Ensembl project, a joint venture between
the European Bioinformatics Institute and the Wellcome Trust Sanger Institute,
has in particular provided high-quality integrated annotation of vertebrate
genomes within a consistent, open-source infrastructure. This year marks the
10th anniversary of the Ensembl project’s launch, and BioMed
Central is today publishing a thematic series of articles describing the
construction, content and current use of Ensembl’s resources.

The first six articles, published today in BMC Bioinformatics and BMC Genomics and
co-ordinated by Paul Flicek at Ensembl and the European
Bioinformatics Institute, reveal in detail how many of the comparative
genomics, variation and regulatory data resources have been constructed. The
first article describes
the comprehensive web-based functions available for tabulating and visualizing
genome variants. A second, related article discusses the database and software library supporting
the integration of variation data into the existing Ensembl resources.

To keep up with the ever-increasing number of genomes reported (51 in the last release),
Ensembl has had to use automated workflow systems. Jessica Severin and colleagues present ‘eHIVE’, an artificial intelligence pipeline
based on a self-organizing workflow system akin to the behavior of
honey bees, which provides updates to Ensembl’s comparative genomics resources. Benoît Ballester and colleagues also demonstrate how the
Ensembl microarray annotation protocol handles the release of the latest
commercial arrays. To keep pace with both the
increasing demands of users and the terabytes of data now available from the
website, Anne Parker and colleagues show how they use
caching and optimization techniques alongside Web 2.0 technologies to improve
the performance of the Ensembl website.

A final article by Giulietta Spudich and Xosé
Fernández-Suárez uses several examples to offer a
practical guide for using Ensembl to learn about genomic annotations in regions
of interest.

While most of this detailed “behind the
scenes” information will not significantly alter the way users access genomic data, it
points molecular biologists to the full range of tools available to them. It
will also be of great value to researchers building other bioinformatics
applications, and it demonstrates how Ensembl is constantly adapting and updating
its tools as it prepares for its next decade.

Scott Edmunds

Senior Scientific Editor, BMC Series
journals

Executive Editor of GigaScience, and data nerd working at the BGI and based in Hong Kong. Open Knowledge Open Science Ambassador and Executive Committee member for Open Data Hong Kong.
