Improved integrative framework combining association data with gene expression features to prioritize Crohn's disease genes

K Ning, K Gettler, W Zhang, SM Ng, BM Bowen, J Hyams, MC Stephens, S Kugathasan
Human Molecular Genetics, 2015 (academic.oup.com)
Abstract
Genome-wide association studies in Crohn's disease (CD) have identified 140 genome-wide significant loci. However, identification of genes driving association signals remains challenging. Furthermore, genome-wide significance thresholds limit false positives at the expense of decreased sensitivity. In this study, we explored gene features contributing to CD pathogenicity, including gene-based association data from CD and autoimmune (AI) diseases, as well as gene expression features (eQTLs, epigenetic markers of expression and intestinal gene expression data). We developed an integrative model based on a CD reference gene set. This integrative approach outperformed gene-based association signals alone in identifying CD-related genes based on statistical validation, gene ontology enrichment, differential expression between M1 and M2 macrophages, and a validation using genes causing monogenic forms of inflammatory bowel disease as a reference. Besides gene-level CD association P-values, association with AI diseases was the strongest predictor, highlighting generalized mechanisms of inflammation, and the interferon-γ pathway in particular. Within the 140 high-confidence CD regions, 598 of 1328 genes had low prioritization scores, highlighting genes unlikely to contribute to CD pathogenesis. For select regions, comparably high integrative model scores were observed for multiple genes. This is particularly evident for regions having extensive linkage disequilibrium such as the IBD5 locus. Our analyses provide a standardized reference for prioritizing potential CD-related genes in regions with both highly significant and nominally significant gene-level association P-values. Our integrative model may be particularly valuable in prioritizing rare, potentially private, missense variants for which genome-wide evidence for association may be unattainable.
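The abstract does not state the statistical form of the integrative model, so the following is only a minimal sketch of the general idea: combine several gene-level features into one prioritization score by training against a reference gene set. It assumes a plain logistic regression fitted by gradient descent, which may differ from the authors' method. The gene names, feature values, reference set and all function names are hypothetical, synthetic placeholders and are not data or code from the study.

```python
import math

# Synthetic toy features per gene (NOT values from the study):
# (-log10 gene-level CD P-value, -log10 gene-level AI P-value, eQTL flag)
TOY_GENES = {
    "GENE_A": (8.0, 5.0, 1.0),
    "GENE_B": (7.0, 4.0, 1.0),
    "GENE_C": (6.5, 0.5, 0.0),  # strong CD signal only, e.g. a neighbour in LD
    "GENE_D": (1.0, 0.5, 0.0),
    "GENE_E": (0.5, 1.0, 1.0),
    "GENE_F": (1.5, 0.2, 0.0),
}

# Hypothetical stand-in for a CD reference gene set (positive labels).
REFERENCE_SET = {"GENE_A", "GENE_B"}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(weights, bias, features):
    return bias + sum(w * x for w, x in zip(weights, features))


def fit_logistic(rows, labels, lr=0.05, epochs=2000, l2=0.01):
    """Fit L2-regularised logistic regression by batch gradient descent."""
    n_feat = len(rows[0])
    n = len(rows)
    weights = [0.0] * n_feat
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(linear(weights, bias, x)) - y
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def score_genes(genes, reference):
    """Return a score in (0, 1) per gene; higher means higher priority."""
    names = sorted(genes)
    rows = [genes[name] for name in names]
    labels = [1.0 if name in reference else 0.0 for name in names]
    weights, bias = fit_logistic(rows, labels)
    return {name: sigmoid(linear(weights, bias, genes[name])) for name in names}


if __name__ == "__main__":
    scores = score_genes(TOY_GENES, REFERENCE_SET)
    # Print genes from highest to lowest integrative score.
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}\t{score:.3f}")
```

In this toy setup, GENE_C has a strong CD association but no supporting AI or eQTL evidence, which mimics a gene that sits in a significant region without other features pointing to it. A real analysis would need many more genes, held-out validation, and the feature definitions given in the full paper.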
Oxford University Press