Metformin delays neurological symptom onset in a mouse model of neuronal complex I deficiency

Complex I (also known as NADH-ubiquinone oxidoreductase) deficiency is the most frequent mitochondrial disorder presenting in childhood. NADH-ubiquinone oxidoreductase iron-sulfur protein 3 (NDUFS3) is a catalytic subunit of mitochondrial complex I; NDUFS3 is conserved from bacteria to humans and is essential for complex I function. Mutations affecting complex I, including in the Ndufs3 gene, cause fatal neurodegenerative diseases such as Leigh syndrome. No treatment is available for these conditions. We developed and performed a detailed molecular characterization of a neuron-specific Ndufs3 conditional knockout (KO) mouse model. We showed that deletion of Ndufs3 in forebrain neurons reduced complex I activity, altered brain energy metabolism, and led to increased locomotor activity, impaired motor coordination and balance, and stereotyped behavior. Metabolomics analyses showed an increase in glycolysis intermediates, suggesting an adaptive response to the complex I defect. Administration of metformin to these mice delayed the onset of the neurological symptoms but not of neuronal loss. This improvement was likely related to enhancement of glucose uptake and utilization, which are known effects of metformin in the brain. Despite reports that metformin inhibits complex I activity, our findings did not show worsening of the complex I defect or increases in lactic acid, suggesting that metformin should be further evaluated for use in patients with mitochondrial encephalopathies.

Several types of quality control (QC) samples are included to assure that all aspects of the Metabolon process are operating within specifications:

CMTRX: Pool created by taking a small aliquot from every customer sample. Used to assess the effect of a non-plasma matrix on the Metabolon process and to distinguish biological variability from process variability.

PRCS: Aliquot of ultra-pure water. Process Blank used to assess the contribution of the process to compound signals.

SOLV: Aliquot of the solvents used in extraction. Solvent Blank used to segregate contamination sources in the extraction.

RS: Recovery Standard. Used to assess variability and verify performance of extraction and instrumentation.

IS: Internal Standard. Used to assess variability and performance of the instrument.

Figure 1. Preparation of client-specific technical replicates. A small aliquot of each client sample (colored cylinders) is pooled to create a CMTRX technical replicate sample (multi-colored cylinder), which is then injected periodically throughout the platform run. Variability among consistently detected biochemicals can be used to calculate an estimate of overall process and platform variability.

Compound Identification: The reference library contains the retention time/index (RI), mass-to-charge ratio (m/z), and chromatographic data (including MS/MS spectral data) for all molecules present in the library. Biochemical identifications are based on three criteria: retention index within a narrow RI window of the proposed identification, accurate mass match to the library within ±10 ppm, and MS/MS forward and reverse scores between the experimental data and authentic standards. The MS/MS scores are based on a comparison of the ions present in the experimental spectrum to the ions present in the library spectrum. While distinct molecules may resemble each other on any one of these factors, the combination of all three data points is used to distinguish and differentiate biochemicals. More than 3,300 commercially available purified standard compounds have been acquired and registered into LIMS for analysis on all platforms so that their analytical characteristics could be determined. Additional mass spectral entries have been created for structurally unnamed biochemicals, which have been identified by virtue of their recurrent nature (both chromatographic and mass spectral). These compounds have the potential to be identified through future acquisition of a matching purified standard or by classical structural analysis.
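To make the three criteria concrete, below is a hypothetical sketch of the RI-window and accurate-mass checks (the MS/MS spectral scoring step is omitted); the threshold values, class names, and example data are illustrative assumptions, not Metabolon's actual implementation.

```python
# Hypothetical illustration of two of the three identification criteria
# described above: RI window and accurate-mass match within +/-10 ppm.
# MS/MS spectral scoring (the third criterion) is not shown.
from dataclasses import dataclass

@dataclass
class LibraryEntry:
    name: str
    ri: float   # retention index of the authentic standard
    mz: float   # accurate mass (m/z) of the authentic standard

def ppm_error(observed_mz: float, library_mz: float) -> float:
    """Mass error in parts per million relative to the library value."""
    return abs(observed_mz - library_mz) / library_mz * 1e6

def is_candidate_match(obs_ri: float, obs_mz: float, entry: LibraryEntry,
                       ri_window: float = 5.0, ppm_tol: float = 10.0) -> bool:
    """A feature is a candidate only if BOTH the RI and mass criteria hold;
    MS/MS scoring would then confirm or reject the identification."""
    return (abs(obs_ri - entry.ri) <= ri_window
            and ppm_error(obs_mz, entry.mz) <= ppm_tol)

# Example: an observed feature at RI 1203.2 and m/z 180.0634
entry = LibraryEntry("glucose", ri=1201.0, mz=180.0628)
print(is_candidate_match(1203.2, 180.0634, entry))  # True: in window, ~3 ppm
```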
Curation: A variety of curation procedures were carried out to ensure that a high-quality data set was made available for statistical analysis and data interpretation. The QC and curation processes were designed to ensure accurate and consistent identification of true chemical entities and to remove those representing system artifacts, mis-assignments, and background noise. Metabolon data analysts use proprietary visualization and interpretation software to confirm the consistency of peak identification among the various samples. Library matches for each compound were checked for each sample and corrected if necessary.
Metabolite Quantification and Data Normalization: Peaks were quantified using area under the curve. For studies spanning multiple days, a data normalization step was performed to correct for variation resulting from instrument inter-day tuning differences. Essentially, each compound was corrected in run-day blocks by registering the medians to equal one (1.00) and normalizing each data point proportionately (termed the "block correction"; Figure 2; a sketch of this step appears below). For studies that did not require more than one day of analysis, no normalization was necessary other than for purposes of data visualization. In certain instances, biochemical data may have been normalized to an additional factor (e.g., cell counts, total protein as determined by Bradford assay, osmolality) to account for differences in metabolite levels due to differences in the amount of material present in each sample.

Statistical Calculations: For many studies, two types of statistical analysis are usually performed: (1) significance tests and (2) classification analysis. Standard statistical analyses are performed in ArrayStudio on log-transformed data. For those analyses not standard in ArrayStudio, the programs R (http://cran.r-project.org/) or JMP are used. Below are examples of frequently employed significance tests and classification methods, followed by a discussion of p- and q-value significance thresholds.
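As an illustration, the following is a minimal sketch of the block correction described above, assuming the data live in a pandas DataFrame (samples by compounds) with a run-day label for each sample; the function and variable names are hypothetical, not part of the Metabolon pipeline.

```python
# Minimal sketch of the median "block correction": within each run-day
# block, rescale each compound so its block median equals 1.00, scaling
# every data point proportionately. Names and values are illustrative.
import pandas as pd

def block_correct(data: pd.DataFrame, run_day: pd.Series) -> pd.DataFrame:
    """Register each compound's per-day median to 1.00."""
    corrected = data.copy()
    for day, idx in data.groupby(run_day).groups.items():
        block = data.loc[idx]
        medians = block.median(axis=0)        # per-compound median for this day
        corrected.loc[idx] = block / medians  # medians become exactly 1.00
    return corrected

# Toy example: two compounds measured on two run days
data = pd.DataFrame({"cmpd_A": [2.0, 4.0, 1.0, 3.0],
                     "cmpd_B": [10.0, 30.0, 5.0, 15.0]},
                    index=["s1", "s2", "s3", "s4"])
run_day = pd.Series(["d1", "d1", "d2", "d2"], index=data.index)
print(block_correct(data, run_day))  # each compound has median 1.00 per day
```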

Welch's two-sample t-test
Welch's two-sample t-test is used to test whether the unknown means of two independent populations are different.
This version of the two-sample t-test allows for unequal variances (the variance is the square of the standard deviation) and has an approximate t-distribution with degrees of freedom estimated using Satterthwaite's approximation. The test statistic is given by

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, $$

where $\bar{x}_1, \bar{x}_2$ are the sample means, $s_1^2, s_2^2$ are the sample variances, and $n_1, n_2$ are the sample sizes of groups 1 and 2, respectively. We typically use a two-sided test (tests whether the means are different) as opposed to a one-sided test (tests whether one mean is greater than the other).
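A minimal sketch of Welch's test using SciPy's ttest_ind with equal_var=False, which implements exactly this unequal-variance statistic; the data values are made up for illustration.

```python
# Welch's two-sample t-test on log-transformed metabolite values.
import numpy as np
from scipy import stats

group1 = np.log([1.8, 2.1, 2.4, 1.9, 2.6])  # e.g., scaled intensities, treatment
group2 = np.log([1.1, 1.3, 0.9, 1.4, 1.2])  # e.g., scaled intensities, control

# equal_var=False selects Welch's test (unequal variances,
# Satterthwaite-approximated degrees of freedom); two-sided by default
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)
print(f"t = {t_stat:.3f}, two-sided p = {p_value:.4f}")
```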

p-values
For statistical significance testing, p-values are given. The lower the p-value, the more evidence we have that the null hypothesis (typically that two population means are equal) is not true. If "statistical significance" is declared for p-values less than 0.05, then 5% of the time we incorrectly conclude the means are different, when actually they are the same.
The p-value is the probability that the test statistic is at least as extreme as observed in this experiment given that the null hypothesis is true. Hence, the more extreme the statistic, the lower the p-value and the more evidence the data gives against the null hypothesis.

q-values
The level of 0.05 is the false positive rate when there is one test. However, for a large number of tests we need to account for false positives. There are different methods to correct for multiple testing. The oldest methods are family-wise error rate adjustments (Bonferroni, Tukey, etc.), but these tend to be extremely conservative for a very large number of tests. With gene arrays, using the False Discovery Rate (FDR) is more common.
The family-wise error rate adjustments give one a high degree of confidence that there are zero false discoveries. With FDR methods, by contrast, one allows for a small number of false discoveries. The FDR for a given set of compounds can be estimated using the q-value (see Storey J and Tibshirani R (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci USA 100: 9440-9445; PMID: 12883005).
To interpret the q-value, the data are first sorted by p-value, and a cutoff for significance is chosen (typically p < 0.05). The q-value gives the false discovery rate for the selected list (i.e., an estimate of the proportion of false discoveries among the compounds whose p-values fall below the cutoff). For Table 1 below, if the whole list is declared significant, then the false discovery rate is approximately 10%. If everything from Compound 079 and above is declared significant, then the false discovery rate is approximately 2.5%.

Table 1: Example of q-value interpretation
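Storey's q-value estimation is typically performed with the dedicated R qvalue package; as a stand-in, the sketch below applies the closely related Benjamini-Hochberg FDR adjustment from statsmodels to a made-up list of p-values.

```python
# Multiple-testing correction sketch: Benjamini-Hochberg FDR adjustment
# (a widely used alternative to Storey's q-value). p-values are made up.
import numpy as np
from statsmodels.stats.multitest import multipletests

p_values = np.array([0.0004, 0.003, 0.012, 0.034, 0.047, 0.210, 0.580])

# 'fdr_bh' controls the false discovery rate; `reject` flags the
# discoveries when the FDR is capped at 5%
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
for p, q, sig in zip(p_values, p_adjusted, reject):
    print(f"p = {p:.4f}  adjusted = {q:.4f}  significant: {sig}")
```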

Random Forest
Random forest is a supervised classification technique based on an ensemble of decision trees (see Breiman L (2001) Random Forests. Machine Learning 45: 5-32; http://link.springer.com/article/10.1023%2FA%3A1010933404324). For a given decision tree, a random subset of the data with identifying true class information is selected to build the tree (the "bootstrap sample" or "training set"), and the remaining data, the "out-of-bag" (OOB) samples, are passed down the tree to obtain a class prediction for each sample. This process is repeated thousands of times to produce the forest. The final classification of each sample is determined by computing the class prediction frequency ("votes") for the OOB samples over the whole forest. For example, suppose the random forest consists of 50,000 trees and that 25,000 trees had an OOB prediction for sample 1. Of these 25,000, suppose 15,000 trees classified the sample as belonging to Group A and the remaining 10,000 classified it as belonging to Group B. Then the votes are 0.6 for Group A and 0.4 for Group B, and the final classification is Group A. This method is unbiased since the prediction for each sample is based on trees built from a subset of samples that does not include that sample. When the full forest is grown, the class predictions are compared to the true classes, generating the "OOB error rate" as a measure of prediction accuracy. Thus, the prediction accuracy is an unbiased estimate of how well one can predict sample class in a new data set. Random forest has several advantages: it makes no parametric assumptions, requires no variable selection, does not overfit, is invariant to monotone transformation, and is fairly easy to implement in R.
To determine which variables (biochemicals) make the largest contribution to the classification, a "variable importance" measure is computed. We use the Mean Decrease Accuracy (MDA) as this metric. The MDA is determined by randomly permuting a variable, running the permuted values through the trees, and reassessing the prediction accuracy. If a variable is not important, this procedure changes the accuracy of the class prediction very little (permuting random noise gives back random noise). By contrast, if a variable is important to the classification, the prediction accuracy drops after such a permutation, and this drop is recorded as the MDA. Thus, random forest analysis provides an "importance" rank ordering of biochemicals; we typically output the top 30 biochemicals in the list as potentially worthy of further investigation.
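As a minimal sketch of the procedure just described, the example below grows a random forest on synthetic data with scikit-learn, reports the OOB accuracy, and ranks variables by a permutation-based importance (scikit-learn's permutation_importance standing in for the MDA measure; the data, tree count, and parameters are illustrative).

```python
# Random forest classification with OOB error and permutation importance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in for a metabolomics matrix: 60 samples x 100 biochemicals
X, y = make_classification(n_samples=60, n_features=100, n_informative=10,
                           random_state=0)

forest = RandomForestClassifier(n_estimators=1000, oob_score=True,
                                random_state=0, n_jobs=-1)
forest.fit(X, y)
print(f"OOB prediction accuracy: {forest.oob_score_:.3f}")

# Permute each variable and measure the drop in accuracy, then rank
# variables by mean importance and report the top contributors
result = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
top = np.argsort(result.importances_mean)[::-1][:30]
print("Top biochemicals by importance rank:", top[:5], "...")
```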

Hierarchical Clustering
Hierarchical clustering is an unsupervised method for clustering the data that can reveal large-scale differences among samples. There are several types of hierarchical clustering and many distance metrics that can be used. A common choice is complete-linkage clustering with the Euclidean distance, where each sample is treated as a vector of all its metabolite values. Note that the differences seen in the resulting clusters may be unrelated to the treatment groups or study design.
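A minimal sketch of complete-linkage clustering with Euclidean distance using SciPy, on synthetic data standing in for a samples-by-metabolites matrix:

```python
# Complete-linkage hierarchical clustering with Euclidean distance.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
samples = np.vstack([rng.normal(0.0, 1.0, size=(10, 50)),    # e.g., controls
                     rng.normal(1.5, 1.0, size=(10, 50))])   # e.g., treated

# Each row is a sample vector of metabolite values
Z = linkage(samples, method="complete", metric="euclidean")

# Cut the tree into two clusters and inspect the assignments; whether
# they track the treatment groups must be checked, as noted above
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```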

Principal Components Analysis (PCA)
Principal components analysis is an unsupervised analysis that reduces the dimension of the data. Each principal component is a linear combination of every metabolite, and the principal components are mutually uncorrelated. When there are more metabolites than samples, as is typical for these data sets, the number of principal components equals the number of observations.
The first principal component is computed by determining the coefficients of the metabolites that maximize the variance of the linear combination. The second component finds the coefficients that maximize the variance subject to the condition that the second component is orthogonal to the first. The third component is orthogonal to the first two, and so on. The total variance is defined as the sum of the variances of the predicted values of each component (the variance is the square of the standard deviation), and for each component the proportion of the total variance is computed. For example, if the standard deviation of the predicted values of the first principal component is 0.4 and the total variance is 1, then 100*0.4*0.4/1 = 16% of the total variance is explained by the first component. Since this is an unsupervised method, the main components may be unrelated to the treatment groups, and the "separation" seen in a PCA plot does not give an estimate of the true predictive ability.
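A minimal sketch of PCA on a synthetic samples-by-metabolites matrix using scikit-learn, where explained_variance_ratio_ gives the proportion of total variance per component, as described above:

```python
# PCA on a metabolite matrix; data are synthetic (samples x metabolites).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))   # 20 samples, 100 metabolites

pca = PCA()                      # components are uncorrelated by construction
scores = pca.fit_transform(X)    # predicted values (scores) per component

# Proportion of total variance explained by the first few components
print(pca.explained_variance_ratio_[:3])

# Plotting the first two score columns gives the usual unsupervised PCA
# plot; any apparent "separation" must still be validated, as noted above.
```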