BACKGROUND. The effect of gene expression data on diagnosis remains limited. Here, we show how diagnosis and classification of growth hormone deficiency (GHD) can be achieved from a single blood sample using a combination of transcriptomics and random forest analysis. METHODS. Prepubertal treatment-naive children with GHD (n = 98) were enrolled from the PREDICT study, and controls (n = 26) were acquired from online data sets. Whole blood gene expression was correlated with peak growth hormone (GH) using rank regression, and a random forest algorithm was tested for prediction of the presence of GHD and for classification of GHD as severe (peak GH <4 μg/l) or nonsevere (peak GH ≥4 μg/l). Performance was assessed using area under the receiver operating characteristic curve (AUC-ROC). RESULTS. Rank regression identified 347 probe sets in which gene expression correlated with peak GH concentrations (r = ± 0.28, P < 0.01). These 347 probe sets yielded an AUC-ROC of 0.95 for prediction of GHD status versus controls and an AUC-ROC of 0.93 for prediction of GHD severity. CONCLUSION. This study demonstrates highly accurate diagnosis and disease classification for GHD using a combination of transcriptomics and random forest analysis. TRIAL REGISTRATION. NCT00256126 and NCT00699855. FUNDING. Merck and the National Institute for Health Research (CL-2012-06-005).
Philip G. Murray, Adam Stevens, Chiara De Leonibus, Ekaterina Koledova, Pierre Chatelain, Peter E. Clayton
Prediction of GHD severity (peak GH <4 μg/l or ≥4 μg/l) via random forest model
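
As an illustration of two steps named in the abstract, the sketch below shows a rank-based (Spearman) correlation filter of probe sets against peak GH, followed by an AUC-ROC calculation for the severe versus nonsevere split at 4 μg/l. It is a minimal sketch under stated assumptions and not the authors' pipeline: the probe names, expression values, and peak GH values are invented toy data, the cutoff |r| ≥ 0.28 merely echoes the figure reported above, and a single probe's expression stands in for the classifier score in place of the random forest, which is omitted here.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def auc_roc(scores, labels):
    """AUC-ROC via the rank-sum identity.

    Equals the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5).
    """
    r = ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(ri for ri, label in zip(r, labels) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy data (hypothetical): peak GH in μg/l for eight children.
peak_gh = [1.2, 2.5, 3.1, 3.9, 4.0, 5.6, 7.4, 9.8]

# Hypothetical probe sets: probe_a tracks peak GH inversely, probe_b does not.
expression = {
    "probe_a": [9.1, 8.7, 8.9, 7.5, 7.9, 6.2, 6.0, 5.1],
    "probe_b": [5.0, 6.1, 5.9, 5.2, 6.2, 4.9, 6.0, 5.1],
}

# Step 1: keep probe sets whose rank correlation with peak GH passes the cutoff.
CUTOFF = 0.28  # illustrative; echoes the value reported in the abstract
selected = [
    name for name, values in expression.items()
    if abs(spearman(values, peak_gh)) >= CUTOFF
]

# Step 2: label severity (severe = peak GH < 4 μg/l) and score with probe_a.
severe = [1 if gh < 4 else 0 for gh in peak_gh]
auc = auc_roc(expression["probe_a"], severe)

print("selected probe sets:", selected)
print("AUC-ROC for severe vs nonsevere:", auc)
```

In a fuller analysis the score passed to `auc_roc` would be the out-of-sample class probability from a trained classifier rather than a raw expression value, and the filter would be applied across all probe sets with a significance test alongside the correlation cutoff.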