Genome sequencing has the potential to improve patient care by enabling physicians to incorporate each patient’s genetic information into disease risk prediction, diagnoses, and medical treatment plans.
Moving towards an era of precision medicine will require an accurate account of each patient’s genetic code. In our study, published in Genome Medicine, we investigated how much confidence we can have in DNA sequencing results for the known medically relevant parts of the genome.
Our investigation builds on a gold-standard genome sequence developed by the US National Institute of Standards and Technology (NIST). The NIST team combined the results from five different sequencing technologies to identify ‘high confidence’ regions of the genome. These high confidence regions span 77% of this individual’s genome.
The medical genome
We studied the overlap of these high confidence regions with 56 genes that the American College of Medical Genetics and Genomics (ACMG) considers to be medically actionable. These genes are associated with treatable medical conditions, so ACMG guidelines recommend that providers report information about these genes, when available, to patients.
For instance, BRCA2, a gene from the ACMG list, is associated with hereditary breast and ovarian cancer. If a mutation in BRCA2 is found in the sequence of a patient’s genome, ACMG guidelines state that the provider should inform the patient of this mutation so that she can consider a risk-reducing mastectomy or other preventive measures.
These BRCA2 guidelines and others show the clear benefit of clinical genome sequencing. However, a false-positive finding (e.g. an identified mutation that is not actually in the patient’s genome) could result in unnecessary interventions. Therefore, it is particularly important to obtain accurate sequence information for these ‘actionable’ genes.
What we found
In our study, we found that only 82% of protein-coding regions in the ACMG genes are located within high confidence regions. Today’s sequencing methods did not reach a ‘high confidence’ consensus on the remaining 18%.
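For readers who want to see what such an overlap figure means in practice, here is a minimal sketch of how the fraction of coding bases inside high confidence regions could be computed for a single chromosome. It is not the pipeline used in the study: the function names `merge` and `covered_fraction`, the half-open coordinate convention, and the example coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: intervals are half-open [start, end) on one chromosome,
# the convention used by BED files.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(coding: List[Interval], confident: List[Interval]) -> float:
    """Fraction of coding bases that fall inside high confidence intervals."""
    coding = merge(coding)
    confident = merge(confident)
    total = sum(end - start for start, end in coding)
    if total == 0:
        return 0.0
    covered = 0
    i = j = 0
    # Walk both sorted lists once, adding up the overlapping bases.
    while i < len(coding) and j < len(confident):
        lo = max(coding[i][0], confident[j][0])
        hi = min(coding[i][1], confident[j][1])
        if lo < hi:
            covered += hi - lo
        # Advance whichever interval ends first.
        if coding[i][1] < confident[j][1]:
            i += 1
        else:
            j += 1
    return covered / total


# Hypothetical coordinates, chosen only to illustrate the calculation.
exons = [(100, 200), (300, 400)]
high_confidence = [(50, 180), (350, 500)]
print(covered_fraction(exons, high_confidence))  # 130 of 200 bases -> 0.65
```

Applying a calculation like this gene by gene, and then counting the genes whose fraction falls below a chosen threshold, is one way to arrive at per-gene summaries of the kind reported here.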
Of note, this list of 56 genes represents only a portion of the entire medical genome. Thus, we also examined a larger set of 3,300 genes that have known associations with human disease. For 593 of these genes, less than half of their protein-coding regions fall into the high confidence regions.
Upon closer investigation, we found that the high confidence regions tend to contain more unique and non-repetitive sequences – the genomic equivalent of ‘low hanging fruit’ – than the low confidence regions.
A path forward
Given what we found, it is crucial to identify and understand the ‘challenging’ regions of the genome. Indeed, doing so will help us focus efforts in developing novel sequencing technologies and analysis methods.
Moving forward, community platforms for regulatory science (e.g., precision.fda.gov) that enable sharing of software and genomic datasets will be key to evaluating and improving the accuracy of genome sequencing.
Our analysis relied on a standard derived from five sequencing technologies. In practice, there are dozens of strategies available to sequence genomes. Each approach makes trade-offs between the cost of sequencing, time to results, and type and frequency of errors.
This means that different approaches may produce different results, and these differences may have important clinical implications. To move toward precise genomic medicine, we must be able to reliably sequence and decipher the difficult regions of the genome.