The increasing use of the internet, social media, wearable and mobile devices, and e-health services, together with claims and billing activities, hospital and pharmacy records, and product and disease registries, has led to the rapid generation of many types of digital data related to population health. However, the volume and complexity of these real-world data call for more appropriate, sophisticated, robust, and innovative analytic techniques to make the best use of them. In the medical field, real-world data (RWD) can be defined as data relating to health outcomes or the delivery of healthcare that are routinely collected in real-world settings.
RWD differ in several ways from data collected in randomized trials under controlled settings. First, RWD are observational rather than gathered under controlled conditions. Second, many types of RWD are unstructured (e.g., text, imaging, networks) and at times inconsistent because of variations in data entry across providers and health systems. Thus, RWD can be messy, incomplete, heterogeneous, and subject to different types of measurement errors and biases. In some cases, the collected data are also an unrepresentative sample of the underlying population, a problem that may go unnoticed or be impossible to verify for lack of information. Third, RWD may be generated at high frequency (e.g., at the millisecond level in ECGs and from wearables), resulting in voluminous and dynamic data. The messiness of RWD is well recognized; how to improve data quality and properly use RWD to generate unbiased real-world evidence (RWE) is a work in progress.
The increasing volume of RWD and the fast development of artificial intelligence (AI) and machine learning (ML) data analytic techniques, together with the rising costs and recognized limitations of traditional clinical trials, have spurred great interest in using RWD to enhance the efficiency of research and bridge the gap between clinical research and daily practice. During the COVID-19 pandemic, RWD were consistently used to generate RWE on the effectiveness of COVID-19 vaccination, to model localized COVID-19 control strategies, and to study behavioral and mental health changes in relation to the lockdown of public life, among other applications.
A wide range of methodologies can be used to make appropriate and effective use of RWD derived from pragmatic trials, which are designed, in principle, to test the effectiveness of an intervention in the real-world clinical setting. Pragmatic trials measure various types of outcomes, mostly patient-centered, rather than the measurable symptoms or markers typical of classical explanatory trials. Because of the characteristics of RWD, new guidelines and methodologies from explanatory trials have been developed to generate unbiased RWE for decision making and causal inference, especially for per-protocol analysis, which is arguably more relevant for decision-making purposes.
Target trial emulation applies the design and analysis principles of randomized trials to the analysis of observational data. It can be an important tool, especially when a comparative evaluation is not yet available or feasible in randomized trials. Because there is no randomization and baseline differences may go unrecognized, controlling for selection bias and confounding is key to the validity of this approach, and the control group needs to be comparable with the treated group.
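As an illustration of why confounding control matters when treatment is not randomized, the following is a minimal sketch on simulated data, not a method taken from the studies discussed here. It assumes a single binary baseline confounder and uses inverse probability weighting, one common adjustment technique; the function names (`simulate`, `naive_difference`, `ipw_ate`) and all numeric settings are hypothetical choices made for the example.

```python
import numpy as np

def simulate(n=20000, seed=0):
    """Simulate observational data with one binary confounder (assumed setup)."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n)        # baseline confounder, e.g. high-risk status
    p_treat = np.where(x == 1, 0.8, 0.2)    # high-risk patients are treated more often
    a = rng.binomial(1, p_treat)            # treatment indicator (not randomized)
    # True treatment effect is 1.0; the confounder raises the outcome by 2.0.
    y = 1.0 * a + 2.0 * x + rng.normal(0.0, 1.0, size=n)
    return x, a, y

def naive_difference(a, y):
    """Unadjusted difference in mean outcome between treated and untreated."""
    return y[a == 1].mean() - y[a == 0].mean()

def ipw_ate(x, a, y):
    """Inverse-probability-weighted estimate of the average treatment effect.

    The propensity score is estimated as the treated fraction within each
    confounder stratum. This requires every stratum to contain both treated
    and untreated individuals (positivity).
    """
    ps = np.empty(len(y), dtype=float)
    for level in np.unique(x):
        mask = x == level
        ps[mask] = a[mask].mean()
    w = np.where(a == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    mean_treated = np.sum(w * a * y) / np.sum(w * a)
    mean_control = np.sum(w * (1 - a) * y) / np.sum(w * (1 - a))
    return mean_treated - mean_control

if __name__ == "__main__":
    x, a, y = simulate()
    print("naive difference:", round(naive_difference(a, y), 2))
    print("IPW estimate:    ", round(ipw_ate(x, a, y), 2))
```

In this simulated setting the unadjusted comparison overstates the effect because treated individuals are more often high-risk, whereas the weighted comparison should land close to the true value of 1.0. With real RWD, the confounders are rarely this simple or fully measured, so the sketch shows the principle rather than a complete analysis.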
In terms of data analytic approaches, ML techniques are increasingly popular and are powerful tools for predictive modelling. Moreover, new and more powerful ML techniques are being developed rapidly. Open-source code (e.g., on GitHub) and software libraries (e.g., TensorFlow, PyTorch, Keras) are widely available to facilitate the implementation of these techniques. Statistical modelling and inferential approaches are also necessary for making sense of RWD, identifying causal relationships, testing and validating hypotheses, and generating regulatory-grade RWE to inform policymakers and regulators in decision making. The motivation for, and the design and analysis principles of, pragmatic trials and target trial emulation are to support causal inference, using innovative methods beyond traditional statistical methods to adjust for potential confounders and improve the capabilities of RWD for causal inference. A well-known framework in that direction is targeted learning, which has been successfully applied to causal inference for dynamic treatment rules using electronic health record (EHR) data and to the efficacy of COVID-19 treatments, among others.
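To make the idea of adjusting for confounders concrete, the standard identification formula for an average treatment effect is given here as a generic illustration. The notation is an assumption introduced for this example: $A$ is a binary treatment, $X$ a set of measured baseline confounders, $Y$ the outcome, and $Y^{a}$ the outcome that would be observed under treatment level $a$. Under consistency, conditional exchangeability given $X$, and positivity, and for discrete $X$:

```latex
\begin{align}
\mathrm{ATE} &= E\left[Y^{1}\right] - E\left[Y^{0}\right], \\
E\left[Y^{a}\right] &= \sum_{x} E\left[Y \mid A = a,\, X = x\right] P(X = x),
\qquad a \in \{0, 1\}.
\end{align}
```

Estimators differ mainly in how they model the components of this expression. Frameworks such as targeted learning are typically described as combining flexible ML estimates of these components with a step that targets the causal quantity of interest.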
In conclusion, compared with controlled trials, RWD have the potential to generate valid and unbiased RWE with savings in both cost and time, and to enhance the efficiency of medical and health-related research and decision-making. At the same time, RWD have limitations. Procedures that improve data quality and overcome these limitations, so as to make the best use of RWD, have been and will continue to be developed.