Big data for clinical trials: a perfect blend

‘Big data’ in clinical trials could transform real-world evidence in medicine and healthcare. A new review published today in Trials reflects on the promises, barriers, and implications of this approach. Here, co-authors Lars Hemkens and Kimberly Mc Cord discuss some of the issues and what’s next for clarifying uncertainties.

Most people are familiar with routinely collected data (RCD): any data collected during routine practice (such as via electronic medical charts, population health registries or insurance data), often included under the umbrella of “big data”. Many of our movements generate electronic traces, for example with every physician encounter, every purchased over-the-counter drug, or every worn Fitbit. Big hopes and dreams of new treatments, better care and improved health follow along. This wealth of data has the potential to predict our deepest wants and needs, and what treatment works best for us. But does the prediction work?

Old problems

Much of the medical research done with these big data sets is observational rather than experimental. It is based on statistical models with many assumptions, and for some crucial assumptions it is impossible to say whether they hold true (they are often described as “untestable”).

Unfortunately, medical history is full of examples where patients were harmed because decisions were based on observational research. So, when it comes to testing treatments and assessing whether patients fare better with one treatment than with another, we need a randomized experiment. That’s not new.

Old solutions

But there is hope. We can rely fully on the tools we have right now: randomized trials. They don’t require complex modelling, knowing anything about treatment mechanisms or understanding the risk profile of patients. And who says we can’t use RCD to find participants for randomized trials, assess their outcomes, and find out what works best?

The tech industry, including Microsoft, Google and Amazon, uses randomization in the form of “A/B tests”, in thousands of experiments with millions of participants per year. Those with the largest datasets and incredible computational power use good old randomization.

Evolution

The problems of non-randomization can hardly be avoided. But most of the problems of randomized trials can be. They are man-made and not the fault of the technique itself. And routine data can be key to solving many of these issues.

Recently, RCD has emerged as a tool for improving randomized clinical trials (RCTs). Some even said this would be the “next disruptive technology” in clinical research. We don’t know yet, but it is a big step in the evolution of clinical research.

The perfect blend: randomized real-world evidence

Large strides have been made to include routine data in clinical studies, because to really decide if a treatment works we need many individuals from diverse backgrounds, ideally in real-world settings.

We can simply design simpler, more pragmatic trials that are larger and better reflect real-world care. Being larger usually means being costlier. But we can use the routine data collected in daily care to measure the outcomes, avoiding cumbersome and costly follow-up visits and avoiding an artificial situation.

Why spend so many resources calling patients or their physicians to ask about clinical events, hospitalizations, accidents, or even mortality? Why not just query the health insurance database directly? What really limits trials is how expensive it is to collect the data. Using routine data bypasses most of the costs and even provides further research potential that is not otherwise feasible with actively collected data, for example for economic analyses or large-scale studies (we recently did a nationwide RCT in Switzerland with more than 10 million patient contacts entirely based on RCD). Using RCD for RCTs may provide the best real-world evidence.
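To make the idea concrete, here is a minimal sketch in Python of what "randomize first, then read the outcomes from routine data" could look like. Everything in it is an assumption made for illustration: the function names `randomize` and `event_rates`, the fixed seed, and the `claims` rows standing in for an insurance database are hypothetical and do not describe any real trial or data source.

```python
import random


def randomize(patient_ids, seed=2024):
    """Assign each patient to 'treatment' or 'control' by simple 1:1 randomization.

    The seed is fixed only so that this illustration is reproducible.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(["treatment", "control"]) for pid in patient_ids}


def event_rates(allocation, claims):
    """Return, per arm, the share of patients with at least one hospitalization.

    `claims` is a hypothetical list of (patient_id, event_type) rows that
    stands in for routinely collected insurance data.
    """
    hospitalized = {pid for pid, event in claims if event == "hospitalization"}
    counts = {"treatment": [0, 0], "control": [0, 0]}  # [events, patients]
    for pid, arm in allocation.items():
        counts[arm][1] += 1
        if pid in hospitalized:
            counts[arm][0] += 1
    # An arm with no patients has no defined rate, so report None for it.
    return {arm: (ev / n if n else None) for arm, (ev, n) in counts.items()}


if __name__ == "__main__":
    allocation = randomize(range(1, 9))
    # Made-up rows for illustration only, not real data.
    claims = [(1, "hospitalization"), (2, "gp_visit"), (5, "hospitalization")]
    print(event_rates(allocation, claims))
```

The point of the sketch is that no follow-up visit or phone call appears anywhere: once allocation is recorded, the outcome is a lookup in data that already exist. A real trial would of course also need record linkage, data validation and a proper statistical analysis.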

Promises, barriers, and implications

Yet, just because these data exist, it doesn’t mean that we are able to get them how we want them and when we want them. There are still many limitations in using routinely collected data, such as their availability, format and level of detail, their accuracy and validity, and patient privacy concerns; the list continues. Of note, all these issues affect both randomized and observational research.

They are thoroughly explored in the article published today in Trials, “Routinely collected data for randomized trials: promises, barriers, and implications”.

The RCD 4 RCTs initiative

We recently started the “RCD 4 RCTs” initiative. It aims to clarify uncertainties in the application of RCD in clinical trials by providing trial planning guidance and by establishing an RCD trial repository, thereby supporting fellow researchers and all stakeholders and building an academic network of people interested in improving clinical trials for better decision making in health care.

We believe that in the not-so-distant future it will be commonplace to use automatically provided health data for clinical trial research, which in turn will give us general and personalized answers on how to improve our wellbeing.

In the meantime, thank you for reading our blog post.

 
