Analytical validation of whole exome and whole genome sequencing for clinical applications

MD Linderman, T Brandt, L Edelmann, O Jabado, Y Kasai, R Kornreich, M Mahajan, H Shah
BMC Medical Genomics, 2014. Springer
Background
Whole exome and genome sequencing (WES/WGS) is now routinely offered as a clinical test by a growing number of laboratories. As part of the test design process, each laboratory must determine the performance characteristics of its platform, test and informatics pipeline. This report documents one such characterization of WES/WGS.
Methods
Whole exome and whole genome sequencing were performed on multiple technical replicates of five reference samples using the Illumina HiSeq 2000/2500. The sequencing data were processed with a GATK-based genome analysis pipeline to evaluate intra-run, inter-run, inter-mode, inter-machine and inter-library consistency; concordance with orthogonal technologies (microarray, Sanger); and sensitivity and accuracy relative to known variant sets.
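As a rough illustration of the two comparison metrics named above, the following minimal sketch computes genotype concordance and sensitivity on toy data. The definitions used here are common ones and are an assumption, not necessarily those applied in the study: concordance is taken as the fraction of sites genotyped by both technologies whose genotypes agree, and sensitivity as the fraction of known variant sites that were also called by sequencing. All function names and the data layout are hypothetical.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int]       # (chromosome, 1-based position); hypothetical layout
Genotype = Tuple[str, str]   # unordered pair of alleles, e.g. ("A", "G")


def _normalize(gt: Genotype) -> Genotype:
    """Sort alleles so that ("G", "A") and ("A", "G") compare equal."""
    a, b = sorted(gt)
    return (a, b)


def concordance(seq_calls: Dict[Site, Genotype],
                array_calls: Dict[Site, Genotype]) -> float:
    """Fraction of sites genotyped by both methods with identical genotypes.

    Assumed definition: sites seen by only one method are ignored.
    """
    shared = seq_calls.keys() & array_calls.keys()
    if not shared:
        raise ValueError("no sites genotyped by both methods")
    matches = sum(
        _normalize(seq_calls[s]) == _normalize(array_calls[s]) for s in shared
    )
    return matches / len(shared)


def sensitivity(called_sites: Set[Site], known_sites: Set[Site]) -> float:
    """Fraction of known variant sites that were also called by sequencing."""
    if not known_sites:
        raise ValueError("known variant set is empty")
    return len(called_sites & known_sites) / len(known_sites)
```

For example, if sequencing and the microarray share two genotyped sites and agree on one of them, `concordance` returns 0.5; if three of four known variant sites are called, `sensitivity` returns 0.75.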
Results
Concordance with high-density microarrays consistently exceeds 97% (and typically exceeds 99%), and concordance between sequencing replicates also exceeds 97%, with no observable differences between flow cells, runs, machines or modes. Sensitivity relative to high-density microarray variants exceeds 95%. In a detailed study of a 129 kb region, sensitivity was lower, with some validated single-base insertions and deletions "not called". Different variants are "not called" in each replicate: of all variants identified in WES data from the NA12878 reference sample, 74% of indels and 89% of SNVs were called in all seven replicates; in NA12878 WGS data, 52% of indels and 88% of SNVs were called in all six replicates. Key sources of non-uniformity are variance in depth of coverage and artifactual variants resulting from repetitive regions and larger structural variants.
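The "called in all replicates" percentages can be read as the share of the pooled variant set that appears in every replicate. The sketch below shows one way such a figure could be computed, split by variant type. It assumes the denominator is the union of variants across replicates, which the abstract does not state explicitly, and the variant tuple layout and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def is_indel(variant: Variant) -> bool:
    """Treat a variant as an indel when ref and alt alleles differ in length."""
    _, _, ref, alt = variant
    return len(ref) != len(alt)


def fraction_called_in_all(replicates: Iterable[Iterable[Variant]]) -> float:
    """Share of the pooled variant set that is present in every replicate.

    Assumption: the denominator is the union of variants over all replicates.
    """
    sets: List[Set[Variant]] = [set(r) for r in replicates]
    if not sets:
        raise ValueError("at least one replicate is required")
    union = set().union(*sets)
    if not union:
        raise ValueError("no variants in any replicate")
    shared = set.intersection(*sets)
    return len(shared) / len(union)


def consistency_by_type(replicates: List[Set[Variant]]) -> Dict[str, float]:
    """Report the all-replicate fraction separately for SNVs and indels."""
    snvs = [{v for v in r if not is_indel(v)} for r in replicates]
    indels = [{v for v in r if is_indel(v)} for r in replicates]
    return {
        "snv": fraction_called_in_all(snvs),
        "indel": fraction_called_in_all(indels),
    }
```

With this definition, a variant missed in even one replicate counts against the total, so the figure drops as the number of replicates grows or as calls become less reproducible.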
Conclusion
We report a comprehensive performance characterization of WES/WGS that will be relevant to offering laboratories, consumers of genome sequencing and others interested in the analytical validity of this technology.