Constructing causal life course models: Comparative study of data-driven and theory-driven approaches

Research output: Contribution to journal, Journal article, Research, peer-review



Life course epidemiology relies on specifying complex (causal) models that describe how variables interplay over time. Traditionally, such models have been constructed by perusing existing theory and previous studies. By comparing data-driven and theory-driven models, we investigate whether data-driven causal discovery algorithms can help this process. We focus on a longitudinal dataset following a cohort of Danish men. The theory-driven models are constructed by two subject-field experts. The data-driven models are constructed using the temporal Peter-Clark (TPC) algorithm, which utilizes the temporal information embedded in life course data. We find that the data-driven models recover some, but not all, of the causal relationships included in the theory-driven expert models. The data-driven method is especially good at identifying direct causal relationships that the experts have high confidence in. Moreover, in a post hoc assessment, we found that most of the direct causal relationships proposed by the data-driven model, but not included in the theory-driven model, were plausible. Thus, the data-driven model may propose additional meaningful causal hypotheses that are new or have been overlooked by the experts. In conclusion, data-driven methods can aid causal model construction in life course epidemiology, and combining data-driven and theory-driven methods can lead to even stronger models.
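As a rough illustration of two ideas in the abstract, the sketch below shows how temporal ordering can orient adjacencies between variables measured in different life periods, and how a data-driven edge set can be compared with an expert edge set. It is not the authors' implementation. It omits the conditional independence tests that a PC-type algorithm uses to find adjacencies, and every variable name, period assignment, and edge in it is hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def orient_by_time(skeleton: Set[FrozenSet[str]], period: Dict[str, int]):
    """Orient cross-period adjacencies from the earlier to the later variable.

    Adjacencies within the same period cannot be oriented by time alone,
    so they are returned separately as undirected pairs.
    """
    directed: Set[Edge] = set()
    undirected: Set[FrozenSet[str]] = set()
    for pair in skeleton:
        a, b = sorted(pair)  # assumes each adjacency joins two distinct variables
        if period[a] < period[b]:
            directed.add((a, b))
        elif period[b] < period[a]:
            directed.add((b, a))
        else:
            undirected.add(frozenset((a, b)))
    return directed, undirected


def compare_edges(data_driven: Set[Edge], expert: Set[Edge]) -> Dict[str, Set[Edge]]:
    """Split directed edges into recovered, missed, and newly proposed."""
    return {
        "recovered": data_driven & expert,  # in both models
        "missed": expert - data_driven,     # expert edges not found from data
        "new": data_driven - expert,        # candidates for post hoc review
    }


# Hypothetical example: four variables in three life periods.
period = {
    "birth_weight": 0,
    "childhood_ses": 0,
    "education": 1,
    "midlife_health": 2,
}
# Hypothetical adjacencies, as if already found by independence testing.
skeleton = {
    frozenset(("birth_weight", "education")),
    frozenset(("education", "midlife_health")),
    frozenset(("birth_weight", "childhood_ses")),
}
# Hypothetical expert (theory-driven) edges.
expert = {
    ("birth_weight", "education"),
    ("childhood_ses", "midlife_health"),
}

directed, undirected = orient_by_time(skeleton, period)
result = compare_edges(directed, expert)
```

In this toy example, the "new" edges play the role of the data-driven proposals that the study assessed post hoc for plausibility.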

Original language: English
Journal: American Journal of Epidemiology
Issue number: 11
Pages (from-to): 1917–1927
Number of pages: 11
Publication status: Published - 2023

Bibliographical note

© The Author(s) 2023. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail:


ID: 358674293