The relevance of sequence conservation in molecular biology cannot be disputed. Ever since the pioneering work of Linus Pauling and Emile Zuckerkandl in the early 60’s, it has been the driving force behind numerous discoveries. By the mid-60’s, Margaret Dayhoff was already leveraging computers to draw links between sequence similarity and evolutionary relationships while her Protein Atlas introduced the concept of gene families.
In the more than 50 years since that time, scores of amino acid and nucleic acid sequences with various degrees of conservation among numerous organisms and viruses have been linked to fundamental pathways. Discoveries over the last 20 years have revealed striking instances of conservation in organisms separated by hundreds of millions of years of evolution.
Characteristic examples include the let-7 family of the regulatory non-coding molecules known as microRNAs (miRNAs), which is conserved from worms to humans, and the tumor suppressor gene TP53, which is known to have 40 copies in elephants, a species resistant to the development of cancer. Perhaps not surprisingly, when such conserved sequences are deleted or altered, disease sets in.
For a long time, the deviations from this scientific paradigm were infrequent. In fact, the counter-examples that appeared in the literature were so rare that biologists, geneticists, and clinicians used this observation as an argument in support of the conservation rule. Perhaps the best known early example of an important molecule that was not conserved is XIST. Such had been the hold of sequence conservation in researchers’ thinking that conservation was soon assumed to be not only sufficient but necessary as well.
As shotgun sequencing became the norm in the late-90’s, more genomic sequences became available, thereby facilitating comparisons. Perhaps the best-known example of a large-scale study of sequences across genomes was the one carried out by the Haussler and Mattick groups that led to the discovery of the ultraconserved elements (UCEs). UCEs are stretches of a few hundred nucleotides each and are highly conserved from zebrafish to humans.
The first reported large-scale study of sequences within a genome was carried out in 2006 by the Rigoutsos group, which investigated the distribution of short sequence motifs across the human genome. Their study led to the identification of short sequences that are present in messenger RNAs and have many more copies in the so-called junk DNA. They called these motifs “pyknons,” from the Greek word for “dense.” A subsequent comparative study of pyknons derived from the human and mouse genomes showed that pyknons captured functional conservation in the absence of sequence conservation or synteny.
At around the same time, the Calin group was using expression arrays to investigate miRNAs, UCEs and a handful of pyknons in the context of colon cancer. They found that probes targeting several UCEs and pyknons indicated differential expression between normal tissue and colon cancer. A chance encounter and a brief conversation that the two of us had on the campus of Ohio State University on April 13, 2007 launched what became a decade-long collaboration between our laboratories on the biological and potential clinical significance of pyknons.
What we have shown in our article, published in Genome Biology, are multiple examples of pyknon motifs that are located in long non-coding transcripts and exhibit altered expression in solid and liquid cancers. For one transcript in particular, N-BLR, we also showed that it acts as a sponge for the evolutionarily-conserved miR-200 family of microRNAs, which is known to shape the migration and invasion capacities of multiple types of cells.
Our findings have several implications. First, it is important to note that N-BLR’s sequence is present in the human genome and the genomes of several other primates, but not in the genomes of rodents. An immediate consequence of this is that N-BLR’s biological roles cannot be captured by mouse models of colon cancer.
The pyknon motif in N-BLR has numerous copies in the human genome, at least some of which are likely transcribed. This immediately “links” N-BLR to the corresponding loci, and partners it to any transcripts that arise from those loci (‘sponging’ partners). The pyknon also links N-BLR to loci that contain the reverse complement of the motif (‘target’ partners).
All these links mean that introducing N-BLR in a mouse cell (through transfection or engineering) is not likely to facilitate the study of its role in the human context: N-BLR’s extensive network of sponging and target partners will be absent.
It is important to note that evidence of analogous coupling of transcripts was previously reported for miRNAs, protein-coding genes, and their pseudogenes. The pyknon framework ties together long non-coding RNAs and mRNAs in a far larger number of combinations.
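The two kinds of links described above for N-BLR's pyknon (sense copies giving 'sponging' partners, reverse-complement copies giving 'target' partners) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the motif and the locus sequences are made-up toy strings rather than real pyknons or genomic loci, the function names are hypothetical, and exact string matching on a single strand is only a simplification, not a description of the pipeline used in our work.

```python
# Toy illustration only: the sequences and names below are invented, not real data.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_all(text: str, motif: str) -> list[int]:
    """Return 0-based start positions of every exact occurrence, overlaps included."""
    positions = []
    start = text.find(motif)
    while start != -1:
        positions.append(start)
        start = text.find(motif, start + 1)
    return positions


def partner_loci(motif: str, loci: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Group loci by how they contain the motif.

    'sponging': loci carrying the motif itself (sense copies).
    'target':   loci carrying the motif's reverse complement.
    """
    motif = motif.upper()
    rc = reverse_complement(motif)
    links = {"sponging": {}, "target": {}}
    for name, seq in loci.items():
        seq = seq.upper()
        sense_hits = find_all(seq, motif)
        antisense_hits = find_all(seq, rc)
        if sense_hits:
            links["sponging"][name] = sense_hits
        if antisense_hits:
            links["target"][name] = antisense_hits
    return links


# Hypothetical motif and loci, chosen so that each category has one member.
toy_motif = "GATTACAGG"
toy_loci = {
    "locus_a": "TTGATTACAGGCC",   # contains the motif itself
    "locus_b": "AACCTGTAATCGG",   # contains the reverse complement
    "locus_c": "ACGTACGTACGT",    # contains neither
}

links = partner_loci(toy_motif, toy_loci)
print(links)
```

In this toy setting, a genome that lacks the motif altogether returns two empty groups, which mirrors the argument that N-BLR's partners would be missing in a mouse cell.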
Second, the above observations are not unique to N-BLR. Many of the human pyknons are either human- or primate-specific. Recall that, by definition, each such motif is present within at least one messenger RNA and at multiple genomic loci. This suggests a potentially wide network of non-coding transcripts that are coupled with one another and with protein-coding genes through a single, exactly‑conserved pyknon motif.
Given the high number of distinct human pyknons (209,432) whose instances can be transcribed in tissue-dependent combinations, it follows that elaborate and organism-specific networks of coupled transcripts may be at work. These networks will be absent from mouse models. It is known that human cancers are never perfectly reproduced in other animal models, except primates, and the pyknon layer could be at least a part of the explanation.
Third, our expression array experiments that probed genomic instances of multiple pyknons show that pyknon-containing transcripts have abundances that parallel those of miRNAs. Additionally, the abundance patterns of pyknon-containing transcripts appear to be tissue-specific and to change between health and disease. Because we designed the array probes to target cancer-associated genomic regions (CAGRs), it follows that any genomic disruptions of these regions will be reflected in a disruption of the corresponding transcripts.
Fourth, the tissue-specific behavior of pyknon-containing transcripts raises the possibility that they could be leveraged for the identification of novel biomarkers. These alternative biomarkers might enable earlier diagnosis and an increased number of therapeutic choices. However, we stress that these observations are extrapolated from what is currently a limited number of experiments with these molecules. Extensive studies will be required to evaluate the possibility that these transcripts can help us design more reliable and reproducible markers.
In closing, we wish to mention one more intriguing possibility. It is conceivable that pyknon-containing transcripts circulate between cells and represent a new type of signal that occurs only in humans. In turn, this would imply that only human cells (including the immune cells) have the sensors needed to identify these (putative, for the time being) signals. Were this possibility to gain experimental support, the corresponding transcripts would become prime candidates for therapeutic exploitation.
Considering the evident complexity of what N-BLR represents, it is safe to say that it will take a lot of work by many laboratories to solve these new puzzles. We believe that we are at the beginning of a long journey into uncharted pyknonland. By sharing our insights, we attempted to convey our excitement to the journal’s readers, in the hope that a fair number will decide to embark on research efforts aimed at deciphering the roles of these pyknon-containing transcripts.
George Calin and Isidore Rigoutsos
Resource: the human and mouse pyknons and their genome-wide instances are available at https://cm.jefferson.edu.