In our genomes, there is a whole host of genes hiding in plain sight. These genes are not included in major genome annotation efforts and are widely ignored in the literature, even though in some cases they have been conserved for as long as 550 million years.
So how have these genes remained hidden? There is a short answer to this. Literally so: the genes are short.
Scientists and computer algorithms that hunt for genes expect their prey to take the form of long sequences of hundreds of nucleotides, and quite simply ignore or discard candidates that do not meet this criterion.
But they are perhaps unwise to do so, suggest a number of recent reports, including an article in BMC Biology that examines the significance of these short genes in the eukaryotic parasite Trypanosoma brucei.
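To see how a length filter hides short genes, here is a minimal sketch of a naive open reading frame scan. It is not the method used in the BMC Biology study or by any particular annotation pipeline. The function name `find_orfs`, the toy sequence, and the 100-codon cutoff are all illustrative assumptions, and the scan is simplified to the forward strand with ATG start codons only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons):
    """Return (start, end, n_codons) for forward-strand ATG..stop ORFs.

    n_codons counts the start codon but not the stop codon. ORFs shorter
    than min_codons are silently dropped, which is the filtering step
    that hides short genes. Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None  # close the ORF and keep scanning
    return orfs

# Toy sequence (made up): a 21-codon ORF followed by a 151-codon ORF.
short_orf = "ATG" + "GCT" * 20 + "TAA"
long_orf = "ATG" + "GCT" * 150 + "TGA"
toy_genome = short_orf + "CCC" + long_orf

strict = find_orfs(toy_genome, min_codons=100)   # assumed conventional cutoff
relaxed = find_orfs(toy_genome, min_codons=10)   # permissive cutoff

print(len(strict), "ORF(s) with the strict cutoff")
print(len(relaxed), "ORF(s) with the relaxed cutoff")
```

With the strict cutoff only the long ORF is reported. The short one is never flagged as a candidate, even though it has a perfectly valid start and stop codon.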
Tailor-made tools for parasitism
The adaptive pressures on a parasite are very specific to parasitism: transmission, host colonization and immune system evasion. Parasite genomes largely reflect this, with each stage of the life cycle finely tuned to maximize the parasite's propagation in its vector and host of choice.
As a result, genes with novel functions, and hence minimal homology to known genes, arise in large numbers during parasite evolution. Frustrated genome scientists peer at their parasite data scratching their heads, unable to assign a function to as many as 60% of the genes.
While it might be argued that our inability to assign functions to genes is a general limitation of the genomics era (read Michael Adams arguing just this in BMC Biology), the rampant innovation in parasite evolution makes for particularly uncharted genomes – a point emphasized by parasitologist Malcolm McConville in his recent 'Open questions' article.
In their BMC Biology study, Christian Tschudi and colleagues explore the question of whether short genes, otherwise known as small open reading frames or smORFs, and the small proteins that they encode might make up part of T. brucei's uncharted and highly parasite-specific genome.
The authors identify 993 smORFs, the vast majority of which do not appear to be shared with any other species for which sequence data are available, suggesting that these smORFs may indeed be novel armaments in T. brucei's parasitic arsenal.
But what do smORFs do?
Detailed functional characterizations of smORFs are few and far between. Two highly conserved smORFs were recently reported to regulate calcium transport in Drosophila, and other smORFs have been shown to modulate transcription factor activity in the regulation of development.
Tschudi and colleagues find mass spectrometry evidence for proteins encoded by 42 of their smORFs, and use knockdown studies to show that seven of these are essential during T. brucei's vector life-cycle stage.
These smORF products are characterized in various other ways, but further work will be required to pin down exactly what they are doing, and the mechanisms by which they operate.
Malcolm McConville observes in his 'Open questions' article that a previously underappreciated component of adaptation in pathogens may relate to metabolism – both optimization of the parasite's metabolism to its host environment and manipulation of the host's metabolism.
But parasitism strategies are nothing if not diverse, extending even to the stimulation of an apparent increase in the vector's biting frequency. So, returning to our theme of the limitations of genomics, it would be unwise to make any prediction of T. brucei smORF function with much confidence.
Therefore, while Tschudi and colleagues' work makes the T. brucei genome a little less uncharted, the secrets it holds remain for the moment no less mysterious.