27 Jun 2012 10:00am

Missing data and risk prediction models


Disease risk prediction tools, often based on statistical regression models, are used for a variety of research and clinical purposes. They support decisions about the clinical treatment of patients, aid communication among patients, carers and treating health professionals, and enable fair comparisons of performance between health care providers.

Development of risk prediction tools usually relies on imperfect datasets, such as those provided by clinical disease registries or cohort studies. In this talk, motivated by examples in end stage renal disease, cardiac surgery and pneumonia, we focus on the problem of missing risk factor data when developing a new risk prediction tool or validating an existing tool in a new setting. Using simulation, we compare multiple imputation with the default approach of most statistical software packages: complete case analysis, which discards every record with a missing value. A second simulation study compares multiple imputation, complete case analysis and "missing as normal" (assuming that an unrecorded risk factor takes its normal value) when the probability that a value is missing depends on the unobserved value itself.
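As a rough illustration of the three approaches named above, and not the simulation design used in the talk, the sketch below simulates one binary risk factor x and a binary outcome y, deletes x either completely at random or with a probability that depends on x itself, and estimates the log odds ratio of y for x by complete case analysis, "missing as normal" (missing x set to 0) and multiple imputation pooled with Rubin's rules. All function names, parameter values, the 0.5 cell correction and the choice to impute x from its distribution given the outcome are assumptions made for this example only.

```python
import math
import random
import statistics


def simulate(n, log_or, p_x=0.3, base_risk=0.1, p_miss_if_0=0.3, p_miss_if_1=0.3, seed=1):
    """Simulate (x, y) pairs: binary risk factor x, binary outcome y.

    x is replaced by None with probability p_miss_if_1 or p_miss_if_0 depending on
    its true value: equal probabilities give missingness completely at random,
    unequal ones make missingness depend on the unobserved value.
    """
    rng = random.Random(seed)
    base_logit = math.log(base_risk / (1 - base_risk))
    data = []
    for _ in range(n):
        x = 1 if rng.random() < p_x else 0
        p_y = 1 / (1 + math.exp(-(base_logit + log_or * x)))
        y = 1 if rng.random() < p_y else 0
        p_miss = p_miss_if_1 if x == 1 else p_miss_if_0
        data.append((None if rng.random() < p_miss else x, y))
    return data


def log_or_2x2(pairs):
    """Log odds ratio of y for x=1 vs x=0 and its Woolf variance.

    Equals the slope of a logistic regression of y on a single binary x.
    0.5 is added to every cell (Haldane correction) to avoid division by zero.
    """
    counts = {(x, y): 0.5 for x in (0, 1) for y in (0, 1)}
    for x, y in pairs:
        counts[(x, y)] += 1
    a, b = counts[(1, 1)], counts[(1, 0)]
    c, d = counts[(0, 1)], counts[(0, 0)]
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def complete_case(data):
    # Drop every record whose risk factor is missing.
    return log_or_2x2([(x, y) for x, y in data if x is not None])


def missing_as_normal(data):
    # Treat an unrecorded risk factor as taking its normal value (here 0, absent).
    return log_or_2x2([(0 if x is None else x, y) for x, y in data])


def multiple_imputation(data, m=20, seed=2):
    """Impute x from P(x=1 | y) (assumes missing at random given y), pool with Rubin's rules."""
    rng = random.Random(seed)
    # obs[y] = [count of x=0, count of x=1] among records where x was observed
    obs = {0: [0, 0], 1: [0, 0]}
    for x, y in data:
        if x is not None:
            obs[y][x] += 1
    estimates, variances = [], []
    for _ in range(m):
        # Draw P(x=1 | y) from its Beta posterior (uniform prior) so the imputations
        # also carry uncertainty about the imputation model's parameters.
        p = {y: rng.betavariate(obs[y][1] + 1, obs[y][0] + 1) for y in (0, 1)}
        completed = [(x if x is not None else int(rng.random() < p[y]), y) for x, y in data]
        est, var = log_or_2x2(completed)
        estimates.append(est)
        variances.append(var)
    q_bar = statistics.fmean(estimates)   # pooled estimate
    w = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    return q_bar, w + (1 + 1 / m) * b     # Rubin's total variance


if __name__ == "__main__":
    true_log_or = math.log(2.0)
    for label, p1 in [("MCAR", 0.3), ("MNAR", 0.6)]:
        data = simulate(20000, true_log_or, p_miss_if_0=0.3, p_miss_if_1=p1)
        for name, method in [("complete case", complete_case),
                             ("missing as normal", missing_as_normal),
                             ("multiple imputation", multiple_imputation)]:
            est, var = method(data)
            print(f"{label:5s} {name:20s} log OR {est:.3f} (SE {math.sqrt(var):.3f}); "
                  f"truth {true_log_or:.3f}")
```

Running the script prints one line per method and missingness mechanism. Which approach is least biased depends on how the missing values arise, which is the question the simulation studies described above set out to examine; a real application would impute from a model that includes the other risk factors as well as the outcome.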

Results from these simulation studies will be presented, together with the recommendations they lead to for dealing with missing risk factor data when developing and validating risk prediction tools.