SRST2: a new tool for genomic epidemiology

In this Q&A, Michael Inouye and Kathryn Holt, authors of a Software article recently published in Genome Medicine, tell us about the development of SRST2. SRST2 is a read-mapping-based computational tool that provides fast and accurate detection of genes, alleles and multi-locus sequence types directly from short whole-genome sequencing reads.
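Multi-locus sequence typing (MLST) assigns a sequence type (ST) from the combination of allele numbers called at a fixed set of housekeeping loci. Once a tool such as SRST2 has called an allele at each locus, the final step is essentially a table lookup. The sketch below shows only that lookup step, assuming a hypothetical three-locus scheme with invented profiles (real schemes typically use seven loci). The names LOCI, PROFILES and assign_st are illustrative, and SRST2's own handling of novel or uncertain alleles is not modeled.

```python
from typing import Dict, Optional, Tuple

# Hypothetical locus order for a toy three-locus scheme (illustration only).
LOCI = ("locusA", "locusB", "locusC")

# Toy profile table mapping allele-number tuples to sequence types.
# These values are invented and do not come from any real MLST database.
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 1, 4): 5,
}


def assign_st(allele_calls: Dict[str, int]) -> Optional[int]:
    """Return the ST for a complete set of allele calls, or None if none applies."""
    if any(locus not in allele_calls for locus in LOCI):
        return None  # a locus without an allele call means no ST can be assigned
    profile = tuple(allele_calls[locus] for locus in LOCI)
    return PROFILES.get(profile)  # None signals an allele combination not in the table


print(assign_st({"locusA": 1, "locusB": 2, "locusC": 1}))  # 2
print(assign_st({"locusA": 2, "locusB": 2, "locusC": 2}))  # None
```

Keying the table on a tuple in a fixed locus order keeps the lookup a single dictionary access; returning None rather than raising makes incomplete or novel profiles easy to flag for follow-up.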


Why did SRST2 need to be developed?

“Genomic surveillance is being adopted by diagnostic and public health labs all over the world, as a replacement for or supplement to routine typing and outbreak investigation. This is recognized as a critical part of tackling the global threat of antibiotic resistance, as outlined in recent reports from the White House and the World Health Organization.”

“Although sequencing costs are coming down, unless you are a bioinformatics expert, it is difficult to extract the important information from genomic data. It is particularly difficult to do this in a reliable, reproducible way, which is really essential for diagnostic and reference labs because this information gets used to make decisions about individual patients, as well as infection control and public health.”

“The type of information we are talking about is (i) detecting specific strains of bacteria, including ones that are notorious for causing hospital outbreaks (such as methicillin-resistant Staphylococcus aureus, vancomycin-resistant Enterococcus and carbapenemase-producing Klebsiella), and (ii) detecting antibiotic resistance genes, which is important both for predicting treatment failures and for tracking the spread of drug resistance. The same approach can be used to identify virulence genes and plasmids.”

How did you go about the process of developing the software?

“The SRST2 project started with a summer intern, Harriet Dashnow, who was given the job of figuring out how to extend our earlier program SRST (which was designed to extract multi-locus sequence typing information from Illumina reads) to detect and type resistance genes. It became clear, however, that to work reliably, we would need to overhaul the scoring system to something that was faster and more robust.”

“Mike Inouye designed a new scoring approach from the ground up, which Harriet and Bernie Pope reimplemented in optimized Python code. We called this new tool SRST2. We then did a whole lot of testing on public data and asked colleagues at a local reference lab, the Microbiological Diagnostic Unit, to test it for themselves on some Listeria monocytogenes isolates they had recently sequenced on their Illumina MiSeq.”


What challenges did you face during the process?

“In this case, it was a matter of analytically exploring hundreds of bacterial genomes from a variety of species to identify the most suitable alignment algorithm (in this case, Bowtie2), and then designing a scoring system that accommodates the behavior of the aligner as well as the gene and allelic content we observed in our sampling of genomes.”
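To give a feel for what a scoring system sitting on top of an aligner has to decide, the sketch below ranks candidate alleles for one locus using two summary numbers per allele: the percentage of the allele covered by aligned reads and the divergence (mismatches per covered position). This is not SRST2's actual scoring model, which is described in the Genome Medicine article; it is a simplified illustration. The AlleleHit class, the best_allele function and the 90% coverage / 10% divergence thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlleleHit:
    """Hypothetical per-allele summary derived from read alignments (e.g. Bowtie2 output)."""
    name: str
    length: int      # allele reference length in bp
    covered: int     # positions with at least one aligned read
    mismatches: int  # covered positions where the read consensus differs from the reference

    @property
    def coverage_pct(self) -> float:
        return 100.0 * self.covered / self.length

    @property
    def divergence_pct(self) -> float:
        # Divergence is measured over covered positions only; no coverage counts as fully divergent.
        return 100.0 * self.mismatches / self.covered if self.covered else 100.0


def best_allele(hits: List[AlleleHit], min_coverage: float = 90.0,
                max_divergence: float = 10.0) -> Optional[AlleleHit]:
    """Illustrative rule: filter by coverage and divergence, then prefer fewest mismatches."""
    passing = [h for h in hits
               if h.coverage_pct >= min_coverage and h.divergence_pct <= max_divergence]
    if not passing:
        return None  # gene or allele not detected under these thresholds
    # Fewest mismatches wins; ties are broken by higher coverage.
    return min(passing, key=lambda h: (h.mismatches, -h.coverage_pct))


hits = [
    AlleleHit("allele_1", 1000, 1000, 3),
    AlleleHit("allele_2", 1000, 1000, 0),
    AlleleHit("allele_3", 1000, 500, 0),  # only half covered, so it is filtered out
]
print(best_allele(hits).name)  # allele_2
```

Separating the presence filter (coverage, divergence) from the ranking step mirrors the two questions a typing tool must answer: is this gene there at all, and if so, which allele is it?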

“Once this was done, it was a matter of sitting down and focusing for a week or so on getting the source code written and preliminarily tested. A further testing phase followed to assess SRST2’s ability to meet wider challenges, including real-life and potentially time-critical scenarios in hospitals and public health settings. Each stage of this process posed its own challenge, and meeting them all gave us the tool we have just published.”

Was there anything that surprised you about the results?

“We were a bit surprised by how well SRST2 worked initially, both in terms of speed and accuracy. It really made us confident that SRST2 could have a significant impact in real life settings where scenarios like genomic typing and prediction of virulence and antimicrobial resistance are critical in terms of getting accurate answers as quickly as possible.”

What impact do you think this software will have on microbiology diagnostics?

“SRST2 can provide fast answers to a lot of clinically important questions, direct from raw sequence reads and in a single command. It is also very robust, in that it is fairly insensitive to drops in data quality or quantity. I think for those reasons it has the potential to be a very important tool in the arsenal of all labs engaged in bacterial genomics.”

“It works great as a first screen: once you have your SRST2 results and know what you are dealing with, you can decide whether further analyses are required, such as whole genome phylogenies for transmission tracking or putting the effort into building high quality assemblies.”

This study was published as part of a joint special issue with Genome Biology guest edited by George Weinstock and Sharon Peacock on the Genomics of infectious diseases.

View the latest posts on the On Medicine homepage