Introduction
Modern epidemiology routinely employs three broad types of modelling frameworks—association, prediction, and causal modelling—each serving distinct scientific aims. Association models seek to quantify relationships between variables without claiming directionality. Standard regression models, for example, estimate whether exposure and outcome “move together,” but they do not determine whether the exposure causes the outcome. These models remain useful for surveillance, risk-factor exploration, or hypothesis generation, but they are not suitable for causal inference.
Prediction modelling, in contrast, focuses on forecasting outcomes with maximal accuracy. Methods such as machine learning, regularised regression, or ensemble algorithms prioritise predictive performance rather than explainability. As a result, prediction-based models may include variables that are not causally relevant and may even adjust for colliders or mediators, choices that would be problematic in causal analysis but perfectly acceptable for prediction.
Causal modelling stands apart because its explicit objective is to estimate causal effects under well-defined assumptions. Causal questions are ubiquitous in Real-World Evidence (RWE): What is the effect of a treatment, policy, or exposure on an outcome? Answering such questions requires understanding mechanisms that generate the data, not merely fitting statistical equations. This is where the conceptual distinction becomes critical: association and prediction models can be statistically “correct” yet causally misleading.
Despite this distinction, much of applied RWE continues to use regression-based associations or prediction-oriented adjustments while assuming they yield causal estimates. Without a structured framework to identify confounders and block non-causal pathways, analytical choices may inadvertently introduce bias. Directed Acyclic Graphs (DAGs) offer an essential solution by enabling analysts to map relationships between exposures, outcomes, and covariates. In so doing, DAGs clarify which variables should and should not be adjusted for to obtain unbiased causal estimates.
Causal Inference
Causal inference is the scientific process of estimating the effect of an exposure or intervention on an outcome while accounting for the complex structure of real-world data. It differs fundamentally from conventional multivariable regression by explicitly mapping how variables relate to one another and by defining the assumptions under which causal effects can be estimated. Causal modelling therefore provides a framework that directly aligns with the objectives of RWE studies, where treatment effects must be estimated from observational data.
To implement causal inference in practice, analysts rely on a range of methodological tools:
Key Tools in Causal Modelling
- Directed Acyclic Graphs (DAGs) – graphical representations that map causal assumptions and identify minimally sufficient adjustment sets for confounding control.
- Multiple Regression – traditional adjustment for confounders; useful but prone to misspecification if the underlying causal structure is not defined.
- Propensity Score Methods – including matching and weighting to balance covariates between exposure groups.
- Inverse Probability of Treatment Weighting (IPTW) – uses propensity scores to construct a pseudo-population in which treatment is independent of measured confounders (a short sketch follows this list).
- Doubly Robust Estimators – combine outcome regression and propensity weighting; unbiased if either model is correctly specified.
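As a rough illustration of how propensity scores and IPTW fit together in the simplest case, the sketch below simulates a single confounder, estimates propensity scores with a plain logistic regression, and computes a weighted contrast. The variable names (L for the confounder, A for the exposure, Y for the outcome) and the data-generating values are hypothetical, chosen only to make the confounding visible.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process: L confounds the A -> Y relationship.
L = rng.normal(size=n)                                   # confounder
A = rng.binomial(1, 1 / (1 + np.exp(-1.5 * L)))          # exposure depends on L
Y = 1.0 * A + 2.0 * L + rng.normal(size=n)               # true effect of A on Y is 1.0

# Naive comparison of group means is confounded by L.
naive = Y[A == 1].mean() - Y[A == 0].mean()

# Propensity score: P(A = 1 | L), estimated by logistic regression.
ps = LogisticRegression().fit(L.reshape(-1, 1), A).predict_proba(L.reshape(-1, 1))[:, 1]

# IPTW: weight each subject by the inverse probability of the treatment received,
# creating a pseudo-population in which A is independent of L.
w = A / ps + (1 - A) / (1 - ps)
iptw = (np.sum(w * A * Y) / np.sum(w * A)) - (np.sum(w * (1 - A) * Y) / np.sum(w * (1 - A)))

print(f"naive difference: {naive:.2f}, IPTW estimate: {iptw:.2f} (truth: 1.0)")
```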
Figure: a DAG illustrating causal and non-causal (confounding) pathways between exposure and outcome.
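The same structure can also be written down and checked programmatically. A minimal sketch, assuming a generic confounding triangle with exposure A, outcome Y, and confounder L (the node names are placeholders, not variables from any particular study), uses networkx to verify that conditioning on L blocks the backdoor path:

```python
import networkx as nx

# Assumed structure: exposure A, outcome Y, confounder L.
# Causal path:   A -> Y
# Backdoor path: A <- L -> Y
dag = nx.DiGraph([("L", "A"), ("L", "Y"), ("A", "Y")])

# Backdoor check: remove the edges leaving A, then ask whether A and Y
# are d-separated given the candidate adjustment set {L}.
backdoor_graph = dag.copy()
backdoor_graph.remove_edges_from(list(dag.out_edges("A")))

try:                      # networkx >= 3.3
    blocked = nx.is_d_separator(backdoor_graph, {"A"}, {"Y"}, {"L"})
except AttributeError:    # older networkx releases
    blocked = nx.d_separated(backdoor_graph, {"A"}, {"Y"}, {"L"})

print(blocked)  # True: {L} closes the only non-causal path between A and Y
```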

The power of DAGs lies in their ability to identify minimally sufficient adjustment sets—the smallest set of variables that closes all non-causal paths between exposure and outcome. This prevents overadjustment (e.g., adjusting for mediators or colliders), reduces variance, and avoids biases that may arise from naïve reliance on conventional regression.
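To make the overadjustment point concrete, the short check below (again with placeholder node names) adds a collider C, a common effect of exposure and outcome, and shows that conditioning on it opens a non-causal path that was otherwise closed:

```python
import networkx as nx

# A -> Y with an added collider C (A -> C <- Y); there is no confounding here.
dag = nx.DiGraph([("A", "Y"), ("A", "C"), ("Y", "C")])

def d_sep(g, x, y, z):
    """d-separation check that works across networkx versions."""
    try:
        return nx.is_d_separator(g, x, y, z)   # networkx >= 3.3
    except AttributeError:
        return nx.d_separated(g, x, y, z)      # older releases

# With the direct edge A -> Y removed, no non-causal path remains open...
g = dag.copy()
g.remove_edge("A", "Y")
print(d_sep(g, {"A"}, {"Y"}, set()))   # True: the path A -> C <- Y is blocked at the collider
print(d_sep(g, {"A"}, {"Y"}, {"C"}))   # False: conditioning on C opens that path
```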
Ideally, DAGs should be developed before data collection to ensure only necessary variables are captured and potential confounders are not missed. When retrospective data are used, as in many RWE cohort studies, DAGs remain invaluable because they justify covariate selection and clarify remaining limitations.
Causal Inference Modelling in Practice
Regulatory agencies and HTA bodies, including the EMA, the FDA, the MHRA, and NICE, have increasingly acknowledged the importance of causal inference tools in RWE. Current guidelines encourage methods such as multivariable regression, propensity score weighting, IPTW, and doubly robust estimators for confounding adjustment. These recommendations mark a positive shift toward better causal analyses in real-world settings.
However, a notable gap persists: none of these regulatory documents explicitly require the use of DAGs as a foundational step in study design or analysis. This omission is significant because, without an explicit graphical causal framework, analysts may select variables based on statistical criteria, prior habits, or convenience rather than formal causal reasoning. As a result, confounders may be included or excluded inappropriately.
The absence of DAGs raises critical methodological questions:
- Were the included covariates a minimally sufficient adjustment set, or were unnecessary variables added?
- Were all backdoor (non-causal) paths between exposure and outcome appropriately blocked?
- Did analysts inadvertently control for colliders or adjust for mediators, introducing bias?
- Were assumptions about confounding, measurement error, or selection bias made explicit?
Although regulatory bodies emphasise “robust causal methods,” the lack of guidance on DAGs and causal-structure justification may lead to inconsistent analytical standards across studies. Consequently, two RWE studies using identical data could produce conflicting results simply because their covariate selection strategies were not grounded in explicit causal reasoning.
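A toy simulation (all variable names and effect sizes hypothetical) illustrates the point: on the same synthetic data, an analysis that adjusts for the confounder recovers the simulated effect, while covariate sets that condition on a downstream collider arrive at materially different estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical data: L confounds A -> Y, and C is a collider caused by both A and Y.
L = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-L)))
Y = 0.5 * A + 1.0 * L + rng.normal(size=n)       # true effect of A on Y is 0.5
C = A + Y + rng.normal(size=n)                   # collider: common effect of A and Y

def ols_coef_for_A(*covariates):
    """Coefficient on A from an OLS fit of Y on A plus the given covariates."""
    X = np.column_stack([np.ones(n), A, *covariates])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta[1]

print(f"adjusting for L (confounder):  {ols_coef_for_A(L):.2f}")     # close to 0.5
print(f"adjusting for C (collider):    {ols_coef_for_A(C):.2f}")     # biased
print(f"adjusting for L and C:         {ols_coef_for_A(L, C):.2f}")  # still biased
```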
As RWE continues to inform regulatory decisions, health technology assessments, and clinical guidelines, the need for transparent DAG-based variable selection becomes increasingly important. DAGs provide clarity on why certain variables are included—and why others, even if available, should not be adjusted for.
Recommended Steps in Causal Inference Modelling
Step 1: Develop a DAG
The foundational step in any RWE study should be the construction of a Directed Acyclic Graph. This should involve subject matter experts such as clinicians, epidemiologists, and statisticians to ensure the graph reflects both domain knowledge and analytical feasibility. When performed before data collection, DAGs ensure that only necessary variables are captured, optimising both resources and study efficiency. In retrospective studies, DAG development remains equally important, although some confounders may be unavailable, increasing the risk that the minimally sufficient set cannot be fully achieved.
Step 2: Test the DAG—testable implications
Although a DAG encodes assumptions that cannot be verified directly, many of those assumptions carry testable implications, such as conditional independencies that should hold in the observed data. Analysts can evaluate these implications using statistical tests. While testing cannot prove a DAG is “correct,” it can highlight inconsistencies, helping refine assumptions.
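The dagitty package cited in the references automates such checks in R. A rough Python analogue is sketched below, under an assumed DAG in which a pre-exposure variable X influences the outcome only through the confounder L; a coefficient on X close to zero is consistent with (though it never proves) the implied conditional independence.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 10_000

# Assumed DAG (hypothetical): X -> L, L -> A, L -> Y, A -> Y.
# One testable implication: X should be independent of Y given {L, A}.
X = rng.normal(size=n)
L = 0.8 * X + rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-L)))
Y = 0.5 * A + 1.0 * L + rng.normal(size=n)

# Regression-based check: in a fit of Y on X, L and A, the coefficient on X
# should be near zero if the implied conditional independence holds.
design = sm.add_constant(np.column_stack([X, L, A]))
fit = sm.OLS(Y, design).fit()
print(f"coefficient on X: {fit.params[1]:.3f} (p = {fit.pvalues[1]:.2f})")
```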
Step 3: Apply appropriate causal methods
Once the DAG has identified the minimally sufficient adjustment set, analysts can deploy suitable causal tools such as:
- Regression adjustment
- Propensity score matching
- Inverse Probability of Treatment Weighting (IPTW)
- Doubly robust estimators
Crucially, these tools should only be applied after the adjustment set has been validated through DAG reasoning.
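For instance, a compact sketch of a doubly robust (augmented IPW) estimator under the same kind of hypothetical confounded set-up is shown below; the plain logistic and linear models are illustrative choices, not prescriptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(3)
n = 20_000

# Hypothetical confounded data: true effect of A on Y is 1.0.
L = rng.normal(size=(n, 1))
A = rng.binomial(1, 1 / (1 + np.exp(-1.5 * L[:, 0])))
Y = 1.0 * A + 2.0 * L[:, 0] + rng.normal(size=n)

# Propensity model e(L) = P(A = 1 | L).
e = LogisticRegression().fit(L, A).predict_proba(L)[:, 1]

# Outcome models m1(L) and m0(L), fitted within each exposure arm.
m1 = LinearRegression().fit(L[A == 1], Y[A == 1]).predict(L)
m0 = LinearRegression().fit(L[A == 0], Y[A == 0]).predict(L)

# AIPW estimate of the average treatment effect: combines the outcome
# models with inverse-probability-weighted residual corrections.
aipw = np.mean(
    m1 - m0
    + A * (Y - m1) / e
    - (1 - A) * (Y - m0) / (1 - e)
)
print(f"AIPW estimate: {aipw:.2f} (truth: 1.0)")
```

Because the estimator combines both models, a misspecified outcome model can be rescued by a well-specified propensity model, and vice versa, which is the "doubly robust" property noted in the list of key tools.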
Transparency and Reporting
Transparency is a cornerstone of high-quality RWE. DAGs serve as visual and conceptual tools that make analytical decisions explicit and reproducible. By illustrating the causal assumptions, DAGs allow analysts to:
- justify variable selection,
- demonstrate which paths are being blocked or left open, and
- clarify the theoretical basis of the causal model.
For reviewers, clear DAG reporting enhances understanding of the analytical logic. It helps assess whether confounders were appropriately chosen, whether the structure matches clinical expectations, and whether potential biases were acknowledged. Without DAGs, reviewers must infer causal assumptions from modelling choices alone, which can be ambiguous or misleading.
Explicit DAG-based reporting supports reproducibility, facilitates peer review, and strengthens the credibility of RWE in regulatory and clinical decision-making.
Concluding Remarks
Causal inference in RWE demands more than statistical sophistication—it requires a clear articulation of the causal structure underlying the data. DAGs provide the foundation for principled variable selection, enabling analysts to construct minimally sufficient adjustment sets that avoid unnecessary bias. Despite the availability of advanced causal tools such as IPTW, propensity score methods, and doubly robust estimators, these techniques are only as reliable as the assumptions that guide them.
Regulatory bodies increasingly recognise the importance of causal methods, yet explicit guidance on the use of DAGs remains limited. Bridging this gap would enhance methodological consistency and improve the reliability of RWE used in high-stakes decision-making.
The steps outlined—constructing DAGs, testing their implications, and applying appropriate causal methods—form a robust workflow for modern observational studies. As the role of RWE continues to expand, adopting DAG-centred causal workflows will be essential to generate credible, transparent, and interpretable evidence.
References
- Pearl, J. (2009). Causality: Models, Reasoning, and Inference.
- Hernán, M. A., & Robins, J. M. (2020). Causal Inference: What If.
- Greenland, S., Pearl, J., & Robins, J. (1999). “Causal diagrams for epidemiologic research.” Epidemiology.
- Ankan, A., Wortel, I. M. N., & Textor, J. (2021). “Testing graphical causal models using the R package dagitty.” Current Protocols.
- FDA. Real-World Evidence Framework.
- EMA. Guideline on registry-based studies.
- NICE. Real-World Evidence Guidelines.