“Real world data” (RWD), “real world evidence” (RWE) and “observational studies” are terms that have been widely used in recent years. They are popular topics at events such as congresses, thematic days and workshops, and even within national and international authorities. The Food and Drug Administration (FDA) defines RWE as “The clinical evidence regarding the usage, and potential benefits or risks, of a medical product derived from analysis of real world data”, and the British Academy of Medical Sciences defines it as “The evidence generated from clinically relevant data collected outside of the context of conventional randomized controlled trials”.

The emergence of RWE is linked to the progress made over the last decades in information technology for collecting and storing data. Today, a lot of clinically relevant data is routinely collected for administrative as well as medical or personal reasons: hospital stays, reimbursements, follow-up of subjects by physicians, connected devices that track sports activity, cardiac parameters or quality of sleep, etc. These data are collected continuously and are clearly underexploited. Collection is generally prospective and can be automatic (directly linked to IT tools) or manual (requiring staff such as clinical research assistants to capture and monitor data).


To demonstrate a causal relationship between a treatment or exposure and a clinical endpoint, the gold standard is a randomized, blinded, controlled clinical trial. Classically, the study design includes one arm with subjects taking the product of interest and a second arm with subjects receiving a reference treatment or a placebo (NB: adaptive designs can be slightly more complex). As illustrated in figure 1, this experimental design controls several biases:

– The control group makes it possible to follow the natural evolution of subjects and to measure a reference effect (placebo effect, vehicle effect, or the current efficacy of the reference treatment).

– Randomization of subjects before they start taking the product ensures that the two groups are comparable in all characteristics except the product consumed during the study.

– Double blinding, at the beginning and throughout the study, maintains comparability between the groups: the two groups are followed identically and evaluated in the same manner (no selection, follow-up, classification or evaluation bias).


Figure 1: schematic representation of a randomized, double-blind, controlled clinical trial and the biases controlled by this experimental design.

So, a clinical trial is an idealized world where the only thing that differs between the two populations is the exposure of interest. Moreover, in clinical trials, everything is done to avoid dropout during the study (attrition bias), the follow-up of subjects is predetermined and standardized, and data quality is very high, especially thanks to the monitoring done by Clinical Research Associates (CRA). Throughout the follow-up, recommendations can also be given to subjects to keep biases under control (no excess in food, physical activity, tobacco, alcohol, …), whereas in real life subjects are not constrained. A lot of effort goes into minimizing variability: for example, all biological data are measured in a centralized laboratory. In addition, it is important to note that subjects participating in a randomized clinical study are selected very precisely to obtain a more homogeneous population. This reduces the number of subjects required to show an effect and therefore increases the chances of demonstrating one. Very young or very old patients, those with special diets or high physical activity, those consuming a lot of alcohol or tobacco, and those with comorbidities are classically excluded from trials. In addition, the participating centers, and therefore the volunteers, tend to be located in urban areas. Given all these specificities, subjects’ characteristics may differ between clinical trials and real life. Finally, the involvement of subjects is clearly not the same, as trial participants are volunteers and can be paid for their participation. This can improve compliance and decrease the dropout rate.
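As a rough illustration of why a more homogeneous population reduces the number of subjects needed, the sketch below uses the usual normal-approximation formula for comparing two means, n per arm = 2 × (z(1 − α/2) + z(power))² × σ² / δ². The function name and all numerical values are hypothetical and chosen only for illustration; they do not come from any particular trial.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Approximate number of subjects per arm for a two-sided comparison of two means.

    delta: difference between arms that the trial should detect (hypothetical value)
    sd:    standard deviation of the endpoint, assumed equal in both arms
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for the two-sided type I error
    z_power = NormalDist().inv_cdf(power)          # quantile for the targeted power
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


if __name__ == "__main__":
    # Same targeted difference, two levels of variability (illustrative values only).
    print("homogeneous population  :", subjects_per_arm(delta=0.5, sd=1.0))
    print("heterogeneous population:", subjects_per_arm(delta=0.5, sd=1.5))
```

With these illustrative inputs, the more homogeneous population needs 63 subjects per arm against 142 for the more variable one. The trade-off is the one described above: the conclusions then apply to a narrower population than the one met in real life.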


Observational data are primarily used in phase 4 clinical studies for safety monitoring. However, many other goals can be reached with observational data. They can complement randomized clinical studies, especially for studying long-term endpoints. For example, a clinical trial may study the fasting glycaemia level of pre-diabetic subjects while the long-term event of interest is the onset of the disease (with the need for treatment). In other contexts, clinical trials cannot be performed and only real-life data are available: for ethical reasons, in rare diseases (too few patients) or when randomization is not feasible (mode of delivery, or impact of nutrition on health status or cardiovascular risk), etc. In these cases, observational data can bring valuable information.

Figure 2, from Spitzer et al. 2018, summarizes recommendations for using observational data: a control arm can be defined using existing databases, and pilot studies can be carried out to identify at-risk patients or to estimate the parameters needed to calculate the number of subjects required in a clinical trial. Indeed, observational data can help to design a new clinical trial more precisely.




Figure 2: from Spitzer 2018, Expert Opinion on Drug Safety. Initial title: Areas where real-world evidence can be used. (1) Examples: rehospitalization rates, complications rates; (2) Example: effect of health policies in outcomes; (3) Examples: observational or retrospective analysis.






Causal inference from observational data is a challenging topic that raises both design and statistical issues, because the data and the way they are collected are more susceptible to bias. A lot of energy should be dedicated to planning the analyses, with the same rigor as in an experimental design. The most common bias in real world data is immortal time bias. It arises because the time zero (corresponding to the time of randomisation in a clinical trial) can be difficult to determine. Directly linked to this, the criterion for including a subject is very important. This criterion is similar to the ambivalence clause in clinical trials and is usually called the positivity assumption in observational studies (a subject should be considered only if he or she could have been in either of the two groups studied: control or studied product). Because there is no randomisation, confounding factors have to be taken into account. A confounding factor is, by definition, a factor associated with both the exposure and the endpoint but not on the causal pathway between them. To handle them, many statistical methods are available, estimating either a conditional effect (multivariable models) or a marginal effect (g-computation, propensity score methods with inverse probability of treatment weighting). Moreover, the follow-up of patients is not standardized as in clinical studies, and the timing of follow-up as well as missing values can be informative (the delay between visits and/or the number of visits can depend on the patient’s health status). All of these points have to be taken into account when choosing the statistical analyses.
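To make the idea of inverse probability of treatment weighting more concrete, here is a minimal sketch on simulated data with a single binary factor that influences both treatment and outcome. Everything in it is an assumption made for illustration: the function and variable names, the simulated probabilities and the true treatment effect of 1.0. The propensity score is simply estimated as the observed proportion of treated subjects in each stratum; in a real analysis it would typically come from a model (for example a logistic regression) on many covariates.

```python
import random


def simulate(n=20000, true_effect=1.0, seed=0):
    """Simulate (severe, treated, outcome) rows where 'severe' drives both treatment and outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = int(rng.random() < 0.4)        # hypothetical binary factor
        p_treat = 0.7 if severe else 0.2        # severe subjects are treated more often
        treated = int(rng.random() < p_treat)
        outcome = 2.0 * severe + true_effect * treated + rng.gauss(0, 1)
        rows.append((severe, treated, outcome))
    return rows


def naive_difference(rows):
    """Crude difference in mean outcome between treated and untreated subjects."""
    treated = [y for _, a, y in rows if a == 1]
    control = [y for _, a, y in rows if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def iptw_difference(rows):
    """Marginal effect estimated with inverse probability of treatment weighting."""
    # Propensity score: probability of being treated within each stratum of 'severe'.
    ps = {}
    for level in (0, 1):
        stratum = [a for x, a, _ in rows if x == level]
        ps[level] = sum(stratum) / len(stratum)  # must lie strictly between 0 and 1 (positivity)

    sum_t = weight_t = sum_c = weight_c = 0.0
    for x, a, y in rows:
        if a == 1:
            w = 1.0 / ps[x]                      # weight of a treated subject
            sum_t += w * y
            weight_t += w
        else:
            w = 1.0 / (1.0 - ps[x])              # weight of an untreated subject
            sum_c += w * y
            weight_c += w
    # Weighted means in each group, normalized by the sum of the weights.
    return sum_t / weight_t - sum_c / weight_c


if __name__ == "__main__":
    data = simulate()
    print("naive difference:", round(naive_difference(data), 2))
    print("IPTW difference :", round(iptw_difference(data), 2))
```

In this simulation the crude comparison mixes the effect of the treatment with the effect of severity, whereas the weighted comparison should land close to the simulated effect of 1.0. The sketch deliberately ignores time zero, informative follow-up and missing values, which also have to be addressed in a real analysis.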




Sources of data


As previously mentioned, a lot of observational data are routinely collected. They can be stored in:


– cohorts (open or closed, i.e. with continuous inclusion of new patients or with a predetermined inclusion period only),


– registries (such as a cause-of-death registry like Cepidc, or the regional cancer registries in France),


– the national health data system, which includes all data for which any reimbursement (even partial) is made,


– other sources (web or mobile applications, medical devices, …).


Currently, data can be made available on request, after review by the scientific advisory board of each database, which can take a lot of time. A French project, the Health Data Hub, aims to facilitate access to these data by offering a unique, centralized and secure way to access them. A very nice and promising presentation of this structure was given at the French congress EPICLIN.






Observational studies and clinical trials are complementary approaches. Clinical trials focus on the theoretical efficacy of a product or exposure, while observational studies focus on its effectiveness, in real life and for broader populations.


Observational analyses can extend the conclusions of a clinical trial on the efficacy of a product or an exposure to another, less stringent context. However, design and methodological choices are of primary importance in real-world studies, so that all potential biases are taken into account as well as possible. In particular, data dredging must be prevented. To that end, the same steps as in experimental designs must be followed: drafting the protocol and validating the statistical analysis plan before receiving any data.




To go further:


Franklin JM, Pawar A, Martin D, Glynn RJ, Levenson M, Temple R, et al. Nonrandomized Real-World Evidence to Support Regulatory Decision Making: Process for a Randomized Trial Replication Project. Clin Pharmacol Ther. Apr 2020;107(4):817‑26. https://pubmed.ncbi.nlm.nih.gov/31541454


Maissenhaelter BE, Woolmore AL, Schlag PM. Real-world evidence research based on big data: Motivation-challenges-success factors. Der Onkologe: Organ Der Deutschen Krebsgesellschaft eV. 2018;24 (Suppl 2):91‑8. https://pubmed.ncbi.nlm.nih.gov/30464373/


Spitzer E, Cannon CP, Serruys PW. Should real-world evidence be incorporated into regulatory approvals? Expert Opin Drug Saf. 2018;17(12):1155‑9. https://pubmed.ncbi.nlm.nih.gov/30412009/








– Marie-Cécile Fournier, Biostatistician, Biofortis Mérieux Nutrisciences –



Photo credit: Gerd Altmann from Pixabay
