ViCBiostat Summer School 2015
DAY 1. Causal inference: concepts and methods
The past two decades have seen the emergence of a coherent theory of causal inference, much of which is now being translated into practice. Causal concepts are increasingly used in the design of studies and the analysis of data from research projects in the health and social sciences. This workshop provides a broad overview of the main areas of causal inference and how they are being used in current practice. It will begin with causal diagrams as a mechanism for translating scientific research questions and background knowledge into appropriate data collection and modelling strategies. These diagrams form the basis for estimating the effects of point treatments or exposures using propensity scoring and inverse probability weighting methods, and the effects of repeated exposures in the presence of time-dependent confounding using marginal structural models and G-computation. The data manipulation and statistical analysis required for each of these methods will be demonstrated with examples.
Prerequisite knowledge for this workshop is a sound understanding of epidemiological and statistical concepts, including multivariable regression models. No prior experience with causal concepts or methods is assumed, and all data examples will be presented as demonstrations rather than hands-on computing exercises.
DAYS 2/3. Analysis of longitudinal and correlated data
This two-day workshop will provide a practically oriented introduction to a range of modern statistical methods commonly used for analysing longitudinal and correlated data from epidemiological or clinical studies (e.g. cluster-randomised trials, longitudinal cohort studies). These methods include generalised estimating equations (GEEs) and generalised linear mixed-effects models. Participants will learn how to implement these methods through a series of practical computing exercises with examples in Stata and R.
Longitudinal and correlated data arise in many settings in health and medical research. Common examples include repeated measurements on individuals over time in clinical trials and cohort studies, and cluster-randomised trials in which participants are grouped within natural units such as schools or medical practices. Appropriate analysis of such data must account for the correlation that naturally arises between measurements within individuals (in longitudinal settings) or within clusters of individuals (in cluster-based surveys or trials). In this course the concept of hierarchical data structures is developed, and appropriate statistical methods based on generalised estimating equations and linear mixed models are described and explored. We begin with models for continuous outcomes, based on normal distributions, and progress to categorical outcomes. Throughout, emphasis will be placed on interpreting results in terms of the underlying clinical or public health research question.
DAY 4. Multiple imputation for missing data
Multiple imputation is an increasingly popular technique for handling incomplete data. This one-day course provides an introduction to multiple imputation and the practical issues faced by researchers wishing to apply this approach. In particular, the course focuses on understanding when multiple imputation is likely to produce substantial gains over simpler alternatives such as complete-case analysis, and on the decisions involved in developing an imputation model. The implementation of multiple imputation, and its potential benefits and limitations, are illustrated with two case studies. Practical computing exercises show how to perform analyses using multiple imputation in Stata and R.
DAY 5. Prediction modelling
This workshop gives an introduction to topical issues in the use of prediction models in health. The use of both regression models and machine learning approaches for developing prediction models for individuals will be discussed and illustrated with practical computing exercises in Stata and R. Methods for model validation and for evaluating an updated model will be described, including a discussion of the controversial net reclassification index. Unfortunately, A/Prof Manoj Gambhir is unavailable, so there will be no introduction to prediction modelling at the population level as previously advertised.
It is assumed that participants will have a sound working knowledge of Stata or R and be familiar with multivariable regression methods, in particular logistic regression.