Organising COVID-19 data will help us respond more swiftly to this and future pandemics

Many countries are now easing coronavirus restrictions despite a great amount of uncertainty ahead. Returning to normality is a complex challenge that requires politicians to make exceedingly hard and risky decisions to minimize socio-economic costs without jeopardizing lives. Governments must weigh short-term impacts against long-term consequences in a range of contexts without a one-size-fits-all solution. We highlight three policy challenges that call for multidisciplinary approaches and a new, coordinated, transparent data infrastructure to avoid repeating recent failures in COVID-19 research and to mirror successes in other fields.

A first challenge is the spatial coordination of decisions concerning social distancing and other interventions. As governments strive to treat citizens fairly, many countries are adopting nation-wide or sometimes state-wide measures. However, infection patterns cluster geographically and often transcend political and administrative boundaries. One consequence is that communities with few documented COVID-19 cases, often in rural, sparsely populated and economically weaker areas, are being forced to follow policies designed for infection hotspots in wealthier cities that are better able to recover economically. Current research points to the importance of clusters and superspreading events in driving the pandemic, but the level of asymptomatic spreading remains unclear. The lack of data collection in Swedish schools therefore constituted a missed opportunity to assess the effect of open schools on contagion, an assessment that would be critical for the reopening plans of many countries and would help avoid new infection waves.

A second challenge is time lag – positive tests often reflect infections that took place more than ten days earlier, and it will be weeks before the outcomes of today’s decisions are fully known. Epidemic models must also deal with people changing their behaviour in response to published predictions. Learning from countries that are at later stages of the pandemic is difficult because of confounding factors. Responsive measures designed around the current number of confirmed cases, occupied hospital beds and/or deaths will lag behind the epidemic and fail. Trial-and-error approaches invariably lead to over- or underestimating the level of precaution needed, at the cost of money or lives. Critical data should therefore be gathered daily and with associated metadata, such as waiting times for test results and the public advice issued by governments.

A third and major challenge is the complexity of the information required for designing effective policies. Although maximum lockdown may save the most lives in the short term, it can have the opposite effect in the long term, for instance through increases in domestic violence, alcohol consumption, poverty, criminality, and hunger. Such adverse effects may vary across regions depending on population density, socio-economic factors, and access to healthcare. Countries should therefore provide key socio-economic parameters alongside health data.

Scientists and other experts have thrown themselves at these challenges with a suite of approaches. They have produced more than 130,000 preprints and journal articles over just a few months. COVID-19 research has reached a volume that other fields take decades to reach, warranting an infrastructure similar to that offered by mature data-intensive fields. Researchers have repeatedly solved the same problem of searching, collecting and organizing the data, wasting valuable time and resources.

A major step towards solving these challenges is the urgent compilation and standardisation of COVID-related data across the world. Besides enabling the provision of open and readily accessible data to the research community for efficient modelling, a new global data infrastructure would help identify data gaps. Crucially, data must be transparent and traceable to their sources. Lack of traceability caused the first research scandal in the COVID-19 era, with The Lancet and The New England Journal of Medicine (NEJM) retracting two papers that shaped policy decisions and affected millions of lives.

The past decades of biodiversity and genomic research demonstrate that open-access, well-organized databases benefit the entire scientific community worldwide and help push the boundaries of our knowledge. Successful examples include the National Center for Biotechnology Information, the Global Biodiversity Information Facility, and Nextstrain, an open-source project for real-time tracking of pathogen evolution that now also helps trace introductions of COVID-19. Data challenges such as Kaggle competitions, which regularly attract thousands of teams and individuals, could foster data gathering and methodological development, besides helping researchers navigate and extract insights from the rising flood of COVID-19 papers.

The COVID-19 crisis may become the biggest global challenge we have attempted to tackle with AI and other data-driven methods. Unlike several 20th-century crises that led to technological breakthroughs, advancements now will depend on the quality and quantity of data gathered during the pandemic. As nations around the world test a wide range of policies, we urge the scientific community and policymakers to agree on essential and standardised data to collect and make available, while protecting patient privacy. Organizing the COVID-19 data is an investment that will help us navigate not only out of this pandemic but also through future pandemics that may emerge.
