Inference of high resolution HLA types using genome-wide RNA or DNA sequencing reads

Y Bai, M Ni, B Cooper, Y Wei, W Fury - BMC Genomics, 2014 - Springer
Background
Accurate HLA typing at the amino acid level (four-digit resolution) is critical in hematopoietic and organ transplantation, in pathogenesis studies of autoimmune and infectious diseases, and in the development of immuno-oncology therapies. With the rapid adoption of genome-wide sequencing in biomedical research, HLA typing based on transcriptome and whole exome/genome sequencing data has become increasingly attractive because of its high throughput and convenience. However, unlike targeted amplicon sequencing, genome-wide sequencing often yields shorter reads and lower coverage, which makes it much harder to resolve the highly homologous HLA alleles. Several algorithms exist and have been applied to four-digit typing, but some deliver only low to moderate accuracy and others output ambiguous predictions. Moreover, few methods accommodate diverse read lengths and depths as well as both RNA and DNA sequencing inputs. New algorithms are therefore needed to improve the accuracy and flexibility of high-resolution HLA typing from genome-wide sequencing data.
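To make "four-digit resolution" concrete: in standard HLA nomenclature an allele name such as A*02:01:01:01 consists of colon-separated fields, where the first field gives the allele group (two-digit resolution) and the first two fields together identify a specific protein (four-digit resolution). The following is a minimal sketch of how two calls might be compared at a chosen resolution. The function names `truncate_allele` and `same_at_resolution` are hypothetical and introduced only for illustration; they are not part of any published tool.

```python
def truncate_allele(name: str, n_fields: int) -> str:
    """Keep the gene and the first n_fields colon-separated fields.

    n_fields=1 corresponds to two-digit resolution,
    n_fields=2 to four-digit resolution.
    """
    gene, _, fields = name.partition("*")
    kept = fields.split(":")[:n_fields]
    return f"{gene}*{':'.join(kept)}"


def same_at_resolution(a: str, b: str, n_fields: int) -> bool:
    """True if two allele names agree once truncated to n_fields fields."""
    return truncate_allele(a, n_fields) == truncate_allele(b, n_fields)


# A*02:01 and A*02:06 share an allele group but encode different proteins.
two_digit_match = same_at_resolution("A*02:01", "A*02:06", 1)
four_digit_match = same_at_resolution("A*02:01", "A*02:06", 2)
```

Here the two alleles match at two-digit resolution but not at four-digit resolution, which is why the latter is the harder and more clinically relevant target.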
Results
We have developed a new algorithm, PHLAT, that identifies the most probable pair of HLA alleles at four-digit resolution or higher by combining candidate allele selection with likelihood scoring. On a comprehensive benchmarking data set (768 HLA alleles in total) drawn from both RNA and DNA sequencing and spanning a broad range of read lengths and coverage, PHLAT consistently achieves high accuracy at four-digit (92%-95%) and two-digit (96%-99%) resolution, outperforming most existing methods. It also supports targeted amplicon sequencing data from Illumina MiSeq.
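The idea of scoring candidate allele pairs by likelihood can be illustrated with a small sketch. This is not PHLAT's actual model, which the abstract does not specify; it assumes a generic diploid genotype likelihood in which each read base is drawn with equal probability from either allele and is miscalled with a fixed error rate. The names `pair_log_likelihood` and `best_pair`, the toy sequences, and the `error` parameter are all assumptions made for illustration.

```python
import math
from itertools import combinations_with_replacement


def pair_log_likelihood(seq1, seq2, base_counts, error=0.01):
    """Log-likelihood of observed base counts given a diploid allele pair.

    seq1, seq2: allele sequences indexed by position.
    base_counts: {position: {base: number of reads showing that base}}.
    error: assumed per-base miscall rate (illustrative value).
    """
    total = 0.0
    for pos, counts in base_counts.items():
        for base, n in counts.items():
            p1 = 1 - error if seq1[pos] == base else error / 3
            p2 = 1 - error if seq2[pos] == base else error / 3
            # Each read is assumed to come from either allele with probability 0.5.
            total += n * math.log(0.5 * p1 + 0.5 * p2)
    return total


def best_pair(candidates, base_counts, error=0.01):
    """Return the unordered allele pair (homozygous pairs included) with the highest score."""
    pairs = combinations_with_replacement(sorted(candidates), 2)
    return max(
        pairs,
        key=lambda p: pair_log_likelihood(
            candidates[p[0]], candidates[p[1]], base_counts, error
        ),
    )


# Toy candidates differing at a few positions (made-up sequences).
candidates = {"A*01:01": "ACG", "A*02:01": "ATG", "A*03:01": "GCG"}
# Reads support both C and T at position 1, suggesting a heterozygous call.
base_counts = {0: {"A": 20}, 1: {"C": 10, "T": 9}, 2: {"G": 20}}
result = best_pair(candidates, base_counts)  # → ('A*01:01', 'A*02:01')
```

Restricting the search to a short list of candidates keeps the number of pairs manageable, since scoring every pair among thousands of known alleles would be far more expensive.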
Conclusions
PHLAT significantly improves the accuracy and flexibility of high-resolution HLA typing based on genome-wide sequencing data. It may benefit basic and applied research in immunology and related fields, as well as numerous clinical applications.