Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models

HA Shihab, J Gough, DN Cooper, PD Stenson, GLA Barker, KJ Edwards, INM Day
Human Mutation, 2013, Wiley Online Library
Abstract
The rate at which nonsynonymous single nucleotide polymorphisms (nsSNPs) are being identified in the human genome is increasing dramatically owing to advances in whole‐genome/whole‐exome sequencing technologies. Automated methods capable of accurately and reliably distinguishing between pathogenic and functionally neutral nsSNPs are therefore assuming ever‐increasing importance. Here, we describe the Functional Analysis Through Hidden Markov Models (FATHMM) software and server: a species‐independent method with optional species‐specific weightings for the prediction of the functional effects of protein missense variants. Using a model weighted for human mutations, we obtained performance accuracies that outperformed traditional prediction methods (i.e., SIFT, PolyPhen, and PANTHER) on two separate benchmarks. Furthermore, in one benchmark, we achieve performance accuracies that outperform current state‐of‐the‐art prediction methods (i.e., SNPs&GO and MutPred). We demonstrate that FATHMM can be efficiently applied to high‐throughput/large‐scale human and nonhuman genome sequencing projects with the added benefit of phenotypic outcome associations. To illustrate this, we evaluated nsSNPs in wheat (Triticum spp.) to identify some of the important genetic variants responsible for the phenotypic differences introduced by intense selection during domestication. A Web‐based implementation of FATHMM, including a high‐throughput batch facility and a downloadable standalone package, is available at http://fathmm.biocompute.org.uk.
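As a rough illustration of how a hidden Markov model can be used to score a missense variant, the sketch below compares the emission probability of the mutant residue with that of the wild-type residue at a single aligned position and applies a cutoff. It is a simplified stand-in for the general idea, not FATHMM's actual scoring scheme, which the abstract does not specify. The emission table, the function names `log_odds_score` and `classify`, the probability floor, and the threshold of -1.5 are all hypothetical values chosen for the example.

```python
import math

# Hypothetical match-state emission probabilities for one alignment column.
# These numbers are invented for illustration and come from no real model.
EMISSIONS = {"A": 0.60, "G": 0.25, "S": 0.10, "P": 0.05}


def log_odds_score(emissions, wild_type, mutant, floor=1e-6):
    """Return ln(P(mutant) / P(wild_type)) at one position.

    Residues missing from the table are given a small floor probability
    so that the logarithm is always defined.
    """
    p_wt = max(emissions.get(wild_type, 0.0), floor)
    p_mut = max(emissions.get(mutant, 0.0), floor)
    return math.log(p_mut / p_wt)


def classify(score, threshold=-1.5):
    """Label a substitution using an arbitrary, assumed cutoff."""
    return "damaging" if score < threshold else "neutral"


if __name__ == "__main__":
    for mutant in ("G", "P"):
        score = log_odds_score(EMISSIONS, "A", mutant)
        print(f"A->{mutant}: score={score:.3f} -> {classify(score)}")
```

A more negative score means the mutant residue is less expected at that position than the wild-type residue. A species-specific or disease-specific weighting, as mentioned in the abstract, would adjust such a raw score; that step is omitted here because its form is not given in the text.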