The impending age of big data has been inescapable in recent discourse, both scientific and otherwise. The prevailing metaphors cast big data as a tsunami or an avalanche, suggesting natural disaster poised to dash hapless researchers against the rocks. They are, of course, no such thing, and offer many opportunities provided that one is prepared. Some of these opportunities were on show at the Royal Society discussion meeting on “Next-generation molecular and evolutionary epidemiology of infectious disease”.
One focus, inevitably, was next-generation sequencing. Paul Kellam spoke about its importance both in tracking the spread of the three waves of the 2009 H1N1 pandemic at a population level and in following the rapid spread of polymorphisms through the virus population within a single patient. Bill Hanage discussed the use of whole-genome data to investigate phylogenetic relationships within highly recombinogenic bacteria like Streptococcus pneumoniae, for which traditional genetic methods simply can’t cut it.
Some notes of restraint were sounded too. Dan Haydon told a cautionary tale about the current inability of deep sequencing to separate foot-and-mouth virus variation within an individual from technical noise, and suggested that in evolutionary studies of this virus, variation between individuals is the highest resolution at which we can currently make reliable inferences (the resulting graphs using “1 cow” as a basic unit of time amused, although we don’t imagine it shall become an SI unit any time soon). Contrary to the “sequence everything” school of thought, Sharon Peacock of the Health Protection Agency made the case for sparing use of whole-genome sequencing in clinical microbiology, where phenotypic tests still offer good effectiveness for their cost and in the majority of cases sequencing is an unnecessary expense.
Epidemiologists are perhaps uniquely interested in co-mapping phylogenetic and spatial data, given the power of explicitly tracking the recent history of disease spread, and there were a number of talks on the potential for spatial phylodynamics in modelling the spread of diseases including rabies and influenza. Perhaps the most striking example of the efficacy of this approach was given by Sharon Peacock, who showed how whole-genome sequencing of MRSA allowed spatial mapping of its spread at the resolution of wards within a single hospital.
However, collecting spatial data is no straightforward task, and the future of surveillance was a popular subject. Simon Hay discussed a project to update global risk maps for disease, which are “often diabolical”, through careful curation via survey of the existing data and literature – but this is very costly in time and resources, and the future of this kind of curation might be driven by automated data-mining of resources such as PubMed and GenBank. Larry Brilliant spoke about Google.org’s Flu Trends, which tracks outbreaks through users’ flu-related search terms with surprising success – often reporting peaks in flu activity a week or two ahead of the CDC’s GP-reported data – and went on to give an overview and endorsement of the current trend for web-based crowd-sourcing of reports through sites like HealthMap and ProMED.
Readers will, of course, be wondering about the privacy issues related to these new kinds of data-collection methods, and this was on attendees’ minds too. Nowhere is the discord between the need for patient privacy and the public health benefits of data release more apparent than in epidemiology, where the geographic location of a patient – a key piece of information – goes a considerable way to revealing their identity. One unsavoury possibility is the future prospect of using a combination of genetic and epidemiological data to personally identify a key patient; say, “patient zero” for a particular pandemic, or an infection-multiplying “superspreader” for HIV – although it is important to emphasise that neither of these is likely at present. The discussion of these issues was only one aspect of a lively panel discussion to close the meeting, which also took in issues of data quality and accessibility, and how encouraging data citability might be one way to solve them. (Those interested in data citation might like to read the recent blog post and associated BMC Research Notes article on the current gold standard.)
For those whose interest in evolutionary epidemiology has been piqued, suggested further reading in BMC Biology comes from Trevor Bedford and colleagues’ recent research modelling the evolutionary reasons for strikingly low standing diversity in the H3N2 flu virus; and Nobel laureate Peter Doherty and colleague Paul Thomas’s comment on why knowing which mutations to look for in natural H5N1 flu reservoirs is more important than the perceived dangers which led to the redaction – now reversed – of the description of particularly virulent laboratory strains.