An efficient resampling method for calibrating single and gene-based rare variant association analysis in case–control studies

S Lee, C Fuchsberger, S Kim, L Scott - Biostatistics, 2016 - academic.oup.com
Abstract
For aggregation tests of genes or regions, the set of included variants often has a small total minor allele count (MAC), and this is particularly true when the most deleterious sets of variants are considered. When MAC is low, commonly used asymptotic tests are poorly calibrated for binary phenotypes: they can give conservative or anti-conservative results and can lose power. Empirical p-values obtained via resampling methods are computationally costly for highly significant p-values, and the results can be conservative because of the discrete nature of resampling tests. Based on the observation that only individuals carrying minor alleles contribute to the score statistics, we develop an efficient resampling method for single- and multiple-variant score-based tests that can adjust for covariates. Our method can improve computational efficiency 1000-fold over conventional resampling for low-MAC variant sets. We reduce the conservativeness of results by using mid-p-values. Using the estimated minimum achievable p-value for each test, we calibrate QQ plots and provide an effective number of tests. In an analysis of a case–control study with deep exome sequencing, we demonstrate that our methods are well calibrated and also substantially reduce computation time compared with conventional resampling methods.
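The key observation, that only minor-allele carriers contribute to the score statistic, can be illustrated with a minimal single-variant sketch. It assumes a score statistic S = Σ g_i (y_i − μ_i), where μ_i are null case probabilities (for example, from a covariate-only logistic model, taken here as given), a two-sided statistic T = S², and null phenotypes drawn independently as Bernoulli(μ_i). This resampling scheme is an illustrative assumption, not necessarily the authors' exact procedure, and all function names are hypothetical. The sketch shows three ideas from the abstract: restricting computation to carriers, computing a mid-p-value, and finding the minimum achievable p-value. The effective-number-of-tests step is not shown.

```python
import itertools
import random


def carrier_score(g, mu, y):
    """Keep only carriers (g_i != 0); non-carriers add 0 to S = sum g_i (y_i - mu_i)."""
    idx = [i for i, gi in enumerate(g) if gi != 0]
    gc = [g[i] for i in idx]
    mc = [mu[i] for i in idx]
    s_obs = sum(g[i] * (y[i] - mu[i]) for i in idx)
    return gc, mc, s_obs


def exact_null_distribution(gc, mc):
    """Enumerate all 2^k phenotype outcomes of the k carriers.

    Assumption (illustrative): under the null, y_i ~ Bernoulli(mu_i) independently.
    Returns a list of (T, probability) pairs with T = S^2 (two-sided test).
    """
    dist = []
    for ys in itertools.product((0, 1), repeat=len(gc)):
        prob, s = 1.0, 0.0
        for gi, mi, yi in zip(gc, mc, ys):
            prob *= mi if yi == 1 else 1.0 - mi
            s += gi * (yi - mi)
        dist.append((s * s, prob))
    return dist


def mid_p_value(dist, t_obs, tol=1e-12):
    """mid-p = P(T > t_obs) + 0.5 * P(T == t_obs), which offsets discreteness."""
    greater = sum(p for t, p in dist if t > t_obs + tol)
    equal = sum(p for t, p in dist if abs(t - t_obs) <= tol)
    return greater + 0.5 * equal


def min_achievable_p(dist, tol=1e-12):
    """Smallest ordinary p-value the test can produce: P(T >= max T)."""
    t_max = max(t for t, _ in dist)
    return sum(p for t, p in dist if t >= t_max - tol)


def resampled_mid_p(gc, mc, t_obs, n_resamples=10000, seed=0, tol=1e-12):
    """Monte Carlo mid-p that redraws phenotypes for carriers only.

    Each draw costs O(k) for k carriers instead of O(n) for all n samples.
    """
    rng = random.Random(seed)
    greater = equal = 0
    for _ in range(n_resamples):
        s = sum(gi * ((1 if rng.random() < mi else 0) - mi) for gi, mi in zip(gc, mc))
        t = s * s
        if t > t_obs + tol:
            greater += 1
        elif abs(t - t_obs) <= tol:
            equal += 1
    return (greater + 0.5 * equal) / n_resamples


if __name__ == "__main__":
    # Toy data: 8 individuals, 2 carriers (indices 2 and 4), both cases.
    g = [0, 0, 1, 0, 1, 0, 0, 0]
    mu = [0.5] * 8
    y = [0, 1, 1, 0, 1, 0, 1, 0]
    gc, mc, s_obs = carrier_score(g, mu, y)
    dist = exact_null_distribution(gc, mc)
    print(mid_p_value(dist, s_obs ** 2))   # 0.25
    print(min_achievable_p(dist))          # 0.5
    print(resampled_mid_p(gc, mc, s_obs ** 2))
```

In this toy example the ordinary exact p-value, P(T ≥ t_obs), is 0.5, while the mid-p-value is 0.25. This illustrates how mid-p-values reduce the conservativeness caused by discreteness. The minimum achievable p-value of 0.5 shows that a variant with only two carriers can never reach significance on its own. This is the kind of information the abstract describes using to calibrate QQ plots. Exact enumeration is only practical when the number of carriers k is small. For larger k, the carrier-only Monte Carlo version keeps the per-draw cost proportional to k rather than to the full sample size.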
Oxford University Press