Estimating predicted probabilities from logistic regression: different methods correspond to different target populations

CJ Muller, RF MacLehose - International journal of epidemiology, 2014 - academic.oup.com
Abstract
Background: We review three common methods to estimate predicted probabilities following confounder-adjusted logistic regression: marginal standardization (predicted probabilities summed to a weighted average reflecting the confounder distribution in the target population); prediction at the modes (conditional predicted probabilities calculated by setting each confounder to its modal value); and prediction at the means (predicted probabilities calculated by setting each confounder to its mean value). That each method corresponds to a different target population is underappreciated in practice. Specifically, prediction at the means is often incorrectly interpreted as estimating average probabilities for the overall study population, and furthermore yields nonsensical estimates in the presence of dichotomous confounders. Default commands in popular statistical software packages often lead to inadvertent misapplication of prediction at the means.
Methods: Using an applied example, we demonstrate discrepancies in predicted probabilities across these methods, discuss implications for interpretation and provide syntax for SAS and Stata.
Results: Marginal standardization allows inference to the total population from which data are drawn. Prediction at the modes or means allows inference only to the relevant stratum of observations. With dichotomous confounders, prediction at the means corresponds to a stratum that does not include any real-life observations.
Conclusions: Marginal standardization is the appropriate method when making inference to the overall population. Other methods should be used with caution, and prediction at the means should not be used with binary confounders. Stata, but not SAS, incorporates simple methods for marginal standardization.
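The distinction among the three methods can be illustrated with a minimal Python sketch. This is not the SAS or Stata syntax the article provides. The coefficients (B0, B_EXPOSURE, B_SMOKER), the binary confounder "smoker" and the ten-person dataset are all hypothetical values invented for illustration; in practice the coefficients would come from a fitted confounder-adjusted logistic regression.

```python
import math
from statistics import mean, mode


def expit(x):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical fitted coefficients (assumed for illustration, not from the article):
# logit P(Y=1) = B0 + B_EXPOSURE * exposure + B_SMOKER * smoker
B0, B_EXPOSURE, B_SMOKER = -2.0, 1.0, 1.5

# Hypothetical binary confounder observed in a study of 10 people (40% smokers).
smoker = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]


def predict(exposure, smoker_value):
    """Predicted probability for one exposure level and one confounder value."""
    return expit(B0 + B_EXPOSURE * exposure + B_SMOKER * smoker_value)


def marginal_standardization(exposure, confounder):
    """Set everyone to the same exposure, keep each person's own confounder
    value, then average the predicted probabilities over the whole sample."""
    return mean(predict(exposure, z) for z in confounder)


def prediction_at_mode(exposure, confounder):
    """Conditional prediction for the stratum at the modal confounder value."""
    return predict(exposure, mode(confounder))


def prediction_at_mean(exposure, confounder):
    """Conditional prediction at the mean confounder value. For a binary
    confounder the mean (here 0.4) is a value no real observation takes."""
    return predict(exposure, mean(confounder))


if __name__ == "__main__":
    for label, method in [
        ("marginal standardization", marginal_standardization),
        ("prediction at the mode", prediction_at_mode),
        ("prediction at the mean", prediction_at_mean),
    ]:
        print(f"{label}: {method(1, smoker):.3f}")
```

In this sketch, marginal standardization averages over the observed confounder distribution and so refers to the whole sample, whereas the other two functions each evaluate the model at a single confounder value. Because the inverse logit is nonlinear, the average of the predictions generally differs from the prediction at the average, which is why the three numbers need not agree.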
Oxford University Press