The structural effects of mutations can aid in differential phenotype prediction of beta-myosin heavy chain (Myosin-7) missense variants

NS Al-Numair, L Lopes, P Syrris, L Monserrat… Bioinformatics, 2016 (academic.oup.com)
Abstract
Motivation: High-throughput sequencing platforms are increasingly used to screen patients with genetic disease for pathogenic mutations, but predicting the effects of mutations remains challenging. We previously developed SAAPdap (Single Amino Acid Polymorphism Data Analysis Pipeline) and SAAPpred (Single Amino Acid Polymorphism Predictor), which use a combination of rule-based structural measures to predict whether a missense genetic variant is pathogenic. Here we investigate whether the same methodology can be used to develop a differential phenotype predictor that, once a mutation has been predicted as pathogenic, distinguishes between phenotypes: in this case, the two major clinical phenotypes associated with mutations in the beta-myosin heavy chain (MYH7) gene product (Myosin-7), namely hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM).
Results: A random forest predictor trained on rule-based structural analyses together with structural clustering data gave a Matthews correlation coefficient (MCC) of 0.53 (accuracy = 75%). Post hoc removal of machine learning models that performed particularly badly increased performance (MCC = 0.61, accuracy = 79%). This proof of concept suggests that methods used for pathogenicity prediction can be extended to differential phenotype prediction.
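Both figures reported above are derived from a binary confusion matrix. Accuracy counts only correct calls, whereas MCC uses all four cells (true/false positives and negatives), so it stays informative when one phenotype outnumbers the other. The sketch below shows the standard formulas. It assumes, purely for illustration, that HCM is the "positive" class and DCM the "negative" one (MCC is unchanged if the labels are swapped). The function name `confusion_metrics` and the counts are hypothetical and are not the paper's data or code.

```python
import math


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (MCC, accuracy) for a binary confusion matrix.

    Hypothetical helper: 'positive' is taken to be HCM and 'negative' DCM.
    MCC is symmetric under swapping the two labels.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: if any row/column total is zero, MCC is undefined; report 0.0.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return mcc, accuracy


# Hypothetical counts for illustration only, NOT the study's results.
mcc, acc = confusion_metrics(tp=8, tn=7, fp=3, fn=2)
print(f"MCC = {mcc:.2f}, accuracy = {acc:.0%}")
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it a stricter summary than accuracy for a two-phenotype classifier.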
Availability and Implementation: Analyses were implemented in Perl and C and used the Java-based Weka machine learning environment. Please contact the authors for availability.
Contacts: andrew@bioinf.org.uk or andrew.martin@ucl.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Oxford University Press