Too fast to catch?: a perspective on the state of transposable elements in research

We would like to draw your attention to our new BMC Genomics collection, titled "Transposable elements". More information can be found in this blog post written by Guest Editor Gökhan Karakülah and his PhD student Doğa Eskier, who discuss the importance of transposable elements (TEs) and shine a spotlight on our current and growing knowledge of TEs.

When Barbara McClintock dubbed them “controlling elements”, the term she used was perhaps far more prescient and descriptive than the term more commonly used today, transposable elements (TEs). In many organisms, in clades as diverse as plants, animals (mammalian and otherwise), and bacteria, TEs are a major component of the genome and play a major part in its organization and regulation. In humans, TEs make up roughly 45% of the genome, with a large majority of that consisting of retrotransposons, while in mice, TEs make up 37% of the genome. Far from mere evolutionary junk, TEs affect the genome and the transcriptome in a variety of ways. Even in mammalian genomes, where many TEs are dormant, multiple classes of TEs, such as LINEs and Alu elements, still show active transposition [1, 2]. This can be a part of the normal genetic variety of the species, but is more frequently linked with adverse outcomes, such as developmental disorders, genetic diseases, and tumor formation. To prevent such outcomes, TE activity and expression are tightly regulated in many genomes, often through TE-specific regulatory mechanisms. For example, the suppression of TEs during development often remains intact, despite near-global demethylation and the formation of pervasive euchromatin regions [3].

In more recent years, the initial spike in research pertaining to TEs has been followed by a steady decline in the number of publications since 2013 (based on the number of publications listed in PubMed). In comparison, even when we do not account for the large amount of research related to the COVID-19 pandemic, the total number of research articles, as well as research in other fields, has seen a general steady increase. While TEs continue to be a source of novel and vital information on many types of biological activity, the research has simply failed to keep pace with the growing understanding of their importance in the genome.

One of the potential reasons for this decline is the same property of TEs that makes them so important to the genome: because TEs are repeat elements that, unlike tandem repeats, are scattered across the genome, studying them in depth requires increasingly specialized tools and methods to yield new insights [4, 5]. For example, individual TE instances can affect their local genomic regions, such as by contributing regulatory motifs (e.g. transcription factor binding sites), so analyzing the activity of all instances of a TE as a single meta-gene might lead to insufficient or misleading results. In addition, as mentioned previously, many TEs actively transpose in both germline and somatic cells, often creating large-scale structural variants that are difficult to capture properly without examining individual cells or organisms. Using a reference genome for an organism without taking into account the possibility of such transposition might lead to missed opportunities for TE research. Overcoming such challenges might require extensive modifications to or adaptations of accepted methodologies, which is no trivial task.
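The meta-gene concern can be made concrete with a toy sketch. The locus names and read counts below are invented purely for illustration and do not come from any dataset or tool; the point is only that summing counts per TE family can cancel out opposite changes at individual loci.

```python
from collections import defaultdict

# Hypothetical read counts per TE locus: (family, locus_id) -> (control, treated).
# All names and numbers are made up for illustration only.
locus_counts = {
    ("L1", "L1_chr1_a"): (100, 20),
    ("L1", "L1_chr5_b"): (20, 100),
    ("Alu", "Alu_chr2_a"): (50, 50),
}

def family_totals(counts):
    """Collapse locus-level counts into one 'meta-gene' total per family."""
    totals = defaultdict(lambda: [0, 0])
    for (family, _locus), (control, treated) in counts.items():
        totals[family][0] += control
        totals[family][1] += treated
    return {family: tuple(pair) for family, pair in totals.items()}

def changed_loci(counts, min_fold=2.0):
    """Return loci whose counts differ by at least min_fold in either direction."""
    hits = []
    for (_family, locus), (control, treated) in counts.items():
        low, high = sorted((control, treated))
        if low > 0 and high / low >= min_fold:
            hits.append(locus)
    return hits

meta = family_totals(locus_counts)
loci = changed_loci(locus_counts)
print(meta)  # family-level view: the L1 totals are identical in both conditions
print(loci)  # locus-level view: both L1 loci change, in opposite directions
```

In this toy case the family-level totals for L1 are the same in both conditions, while each individual L1 locus changes five-fold. Real analyses would of course need proper normalization and statistics, which this sketch leaves out.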

Despite the decrease in published TE research, recent years have also seen an increasing number of computational tools designed to create bespoke solutions for problems faced by TE researchers [5]. In bulk RNA sequencing, the analysis of TEs has seen significant improvements, thanks to tools such as SalmonTE, by Hyun-Hwan Jeong et al., and SQuIRE, by Wan R Yang et al. These tools improve locus-specific quantification, yielding more precise data. TE research using single-cell sequencing, where the resulting data are more complicated, also continues to see improvements. Previously, Jiangping He and colleagues made available the scTE pipeline, which allowed for the quantification of TEs at the metagene level in single-cell RNA-seq data. More recently, Rocío Rodríguez-Quiroz and Braulio Valdebenito-Maturana have published soloTE, which can quantify TE expression in single-cell data at a locus-specific level, leading to further refinement of such analysis. For analysis of retrotransposons that code for peptides, and thus contain specific domain sequences, Mikhail Biryukov and Kirill Ustyantsev have created DARTS, for domain-associated retrotransposon search in genome assemblies. Meanwhile, there is still room to get more out of the tools already developed. In a benchmark analysis involving multiple algorithms and pipelines, including the previously mentioned SQuIRE and SalmonTE, Natalia Savytska et al. have integrated transcription start site profile data to more accurately identify and quantify instance-specific TE expression [6]. Thanks to such innovations, there is hope that our understanding of TEs will soon see an acceleration.

In conclusion, TE research remains a vital yet underappreciated field of study. Lowering the barriers to conducting and understanding TE research is an important goal, requiring the contributions of many researchers in multiple biological disciplines. This is why we would like to draw your attention to our new BMC Genomics collection, titled “Transposable elements”, to shine a spotlight on our current and growing knowledge of TEs.
