From Real World Data (RWD) towards Real World Evidence (RWE)

Per the definition by the US FDA, real-world data (RWD) in the medical and healthcare field “are the data relating to patient health status and/or the delivery of health care routinely collected from a variety of sources”[1]. The wide use of the internet, social media, wearable and mobile devices, claims and billing activities, electronic health records (EHRs), product and disease registries, e-health services, and other technology-driven services, together with increased data storage capacity, has led to the rapid generation and availability of digital RWD.

The added value of Real World Data

Randomized controlled clinical trials (RCTs) are the gold standard for evaluating the safety and efficacy of pharmaceutical drugs and medical treatments, but in many cases their cost, duration, limited generalizability, and ethical or technical feasibility constraints have led some to look to real-world studies as alternatives. On the other hand, evidence from real-world data may be much less convincing because of the lack of randomization and the presence of confounding, time-related bias and measurement bias.

There is increasing interest in the use of RWD to support the continuum of evidence generation for innovative medicines. For instance, RWD are expected to (1) enable the generation of additional post-launch evidence about longer-term clinical benefits and harms, (2) inform dynamic price-setting in relation to the value of medicines and treatments, (3) help optimize appropriate drug use in daily practice, and (4) inform on levels of adherence to therapy outside the context of coordinated trials.

RWD in the TRUMPET project

There are many types of RWD, coming from different sources. The TRUMPET project will leverage RWD gathered in routine cancer care through 3 use cases addressing the dimensions above: (1) optimize radiotherapy dose-volume histograms according to late side effects in head and neck cancer, (2) refine eligibility criteria for the more costly stereotactic radiotherapy treatment in metastatic patients, and (3) identify target subgroups for immunotherapy in non-small cell lung cancer patients.

The TRUMPET project will federate RWD, especially electronic health records (EHRs) collected as part of routine care across 3 healthcare centers, in order to reduce bias linked to center specificities, geographical area or socio-economic factors. The project processes data in compliance with the GDPR.

Real World Data to Real World Evidence: Challenges

However, RWD also introduce additional challenges for generating RWE. It should be recognized that RWD are subject to measurement bias arising from variable data quality, incomplete data, heterogeneity and skewness.

When designing an RWD study, it is important to identify missing confounding variables in order to quantify omitted-variable bias and the placebo effect. For longitudinal observational studies, one should also consider the role of time at the design stage. If we consider a specific treatment or intervention at baseline and a series of follow-up visits collecting the variables under analysis, the timeline in RWD is likely to show higher variability in the follow-up milestones than in an RCT. When combining RWD from different providers, additional challenges arise, such as difficulties in standardizing data collection for outcome measures and the absence of clear standards for data format and storage.
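
The effect of an omitted confounder can be illustrated with a small simulation. The sketch below is purely illustrative and uses synthetic data, not TRUMPET data: the binary confounder, the treatment probabilities (0.8 and 0.2), the effect sizes and the function names are all assumptions chosen for the example. It compares a naive difference in mean outcomes between treated and untreated patients with an estimate adjusted by stratifying on the confounder.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=1.0, confounder_effect=2.0, seed=0):
    """Generate synthetic (confounder, treatment, outcome) rows.

    Hypothetical setup: patients with the confounder (c = 1) are more
    likely to be treated AND have a higher outcome regardless of treatment.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0
        p_treat = 0.8 if c else 0.2          # treatment depends on confounder
        t = 1 if rng.random() < p_treat else 0
        y = true_effect * t + confounder_effect * c + rng.gauss(0.0, 1.0)
        rows.append((c, t, y))
    return rows


def naive_effect(rows):
    """Difference in mean outcome, ignoring the confounder."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def stratified_effect(rows):
    """Average of within-stratum differences, weighted by stratum size."""
    total = len(rows)
    estimate = 0.0
    for level in (0, 1):
        stratum = [(t, y) for c, t, y in rows if c == level]
        treated = [y for t, y in stratum if t == 1]
        control = [y for t, y in stratum if t == 0]
        estimate += (len(stratum) / total) * (mean(treated) - mean(control))
    return estimate


if __name__ == "__main__":
    rows = simulate()
    print("naive estimate:     ", round(naive_effect(rows), 2))
    print("stratified estimate:", round(stratified_effect(rows), 2))
```

Under these assumed settings the naive estimate is expected to be close to 2.2 rather than the true effect of 1.0, because 80% of treated patients carry the confounder against 20% of untreated ones. Stratification only works here because the confounder is recorded; when it is missing from the RWD, the bias cannot be removed this way and has to be bounded by other means.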

Legal and organizational barriers also impede the analysis of RWD: the lack of agreement between the parties involved regarding what data are needed, at which point in time, and for which purpose; the differences in structure, setup and content between databases, which create significant challenges for pan-European use of RWD; and the lack of access to, and availability of, data due to rules and restrictions on data sharing.

RWE in the TRUMPET project

Machine Learning (ML) techniques are increasingly popular and are powerful tools for predictive modeling. One reason for their popularity is that modern ML techniques can handle voluminous, messy, multi-modal data without strong assumptions about the data distribution. It should be noted that ML techniques are currently used mostly for prediction, classification and biomarker selection rather than for generating regulatory-level RWE. This may change soon, as regulatory agencies are actively evaluating ML and Artificial Intelligence (AI) for generating RWE and engaging stakeholders on the topic.

In TRUMPET, the 3 use cases above will be tackled using federated learning methods to build prediction and/or classification models based on, for example, regression, neural networks and k-means algorithms.
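
To give an idea of what federated learning can look like in practice, here is a minimal sketch of federated averaging applied to a one-variable linear regression across 3 simulated centers. It is not the TRUMPET implementation: the choice of federated averaging, the synthetic data, the learning rate and all function names are assumptions made for illustration. In this sketch each center keeps its patient-level rows and only sends model parameters to the coordinator.

```python
import random


def make_center_data(n, slope, intercept, x_shift, seed):
    """Synthetic data for one hypothetical center (x range differs by center)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0) + x_shift
        y = slope * x + intercept + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data


def local_update(weights, data, lr=0.1, epochs=5):
    """A few gradient-descent steps on the local mean squared error."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_averaging(centers, rounds=200):
    """Coordinator loop: broadcast weights, collect local updates, average.

    Only the (w, b) pairs and the sample counts leave each center.
    """
    weights = (0.0, 0.0)
    total = sum(len(d) for d in centers)
    for _ in range(rounds):
        updates = [(local_update(weights, d), len(d)) for d in centers]
        # Average the local models, weighted by each center's sample count.
        w = sum(u[0] * n for u, n in updates) / total
        b = sum(u[1] * n for u, n in updates) / total
        weights = (w, b)
    return weights


if __name__ == "__main__":
    # Three hypothetical centers sharing the same underlying relationship
    # (slope 2.0, intercept 1.0) but covering different ranges of x.
    centers = [
        make_center_data(200, 2.0, 1.0, x_shift=0.0, seed=1),
        make_center_data(200, 2.0, 1.0, x_shift=0.5, seed=2),
        make_center_data(200, 2.0, 1.0, x_shift=1.0, seed=3),
    ]
    print(federated_averaging(centers))
```

Weighting each update by the center's sample count keeps larger centers from being under-represented in the shared model. A real deployment would add secure communication and privacy protections, which this sketch leaves out.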


Patrick Duflot from CHU Liege