Genomic variations in EBNA3C of EBV associate with posttransplant lymphoproliferative disorder

Epstein-Barr Virus (EBV) is a ubiquitous virus linked to a variety of lymphoid and epithelial malignancies. In solid organ and hematopoietic stem cell transplant recipients, EBV is causally associated with posttransplant lymphoproliferative disorder (PTLD), a group of heterogeneous lymphoid diseases. EBV+ B cell lymphomas that develop in the context of PTLD are generally attributed to the immunosuppression required to promote graft survival, but little is known regarding the role of EBV genome diversity in the development of malignancy. We deep-sequenced the EBV genome from the peripheral blood of 18 solid organ transplant recipients, including 6 PTLD patients. Sequences from 6 EBV+ spontaneous lymphoblastoid B cell lines (SLCL) were similarly analyzed. The EBV genome from PTLD patients had a significantly greater number of variations than EBV from transplant recipients without PTLD. Importantly, there were 15 nonsynonymous variations, including 8 in the latent cycle gene EBNA3C that were associated with the development of PTLD. One of the nonsynonymous variations in EBNA3C is located within a previously defined T cell epitope. These findings suggest that variations in the EBV genome can contribute to the pathogenesis of PTLD.


Introduction
Epstein-Barr Virus (EBV) is a gammaherpesvirus that has infected more than 90% of the world's population. Infection generally occurs via transmission through the saliva of a carrier and is typically asymptomatic but can cause infectious mononucleosis (IM) in adolescents and young adults. EBV is also linked to approximately 120,000 malignancies globally each year (1), and the factors that account for the distinct outcomes following infection remain unclear. EBV-associated malignancies are most commonly of B cell or epithelial cell origin, consistent with the propensity of the virus to infect these cell types, and include Burkitt lymphoma (BL), Hodgkin's lymphoma, non-Hodgkin's lymphomas, nasopharyngeal carcinoma (NPC), gastric adenocarcinoma (GC), and B cell lymphomas in immunocompromised or immunosuppressed individuals. In hematopoietic stem cell and solid organ transplant recipients, EBV + B cell lymphomas fall within a spectrum of abnormal lymphoproliferations termed posttransplant lymphoproliferative disorder (PTLD).
The 172-kb double-stranded DNA genome of EBV includes genes that are coordinately expressed during the lytic phase of infection that yields new viral particles, as well as during the latent phase of infection that promotes survival of the host cell and persistence of the viral genome. Distinct groups of latency genes that contribute to transformation and cellular proliferation are expressed in EBV-associated malignancies. EBV + B cell lymphomas in PTLD are characterized by expression of the EBV-encoded small RNAs (EBER) EBNA1, EBNA2, EBNA3A, EBNA3B, EBNA3C, EBNA-LP, LMP1, and LMP2. The latency genes are also important targets of the adaptive immune response to EBV (2). In particular, viral-specific T cells become activated and proliferate following T cell receptor (TCR) recognition of EBV-derived peptides presented by the MHC proteins expressed on antigen presenting cells. Once activated, CD4 + T cells produce cytokines to support antibody production and cytotoxic T cell (CTL) proliferation and differentiation to mediate elimination of virally infected cells. In the case of EBV + PTLD, the immunosuppression required to prevent graft rejection is thought to impair the CD4 + and CD8 + T cells Epstein-Barr Virus (EBV) is a ubiquitous virus linked to a variety of lymphoid and epithelial malignancies. In solid organ and hematopoietic stem cell transplant recipients, EBV is causally associated with posttransplant lymphoproliferative disorder (PTLD), a group of heterogeneous lymphoid diseases. EBV + B cell lymphomas that develop in the context of PTLD are generally attributed to the immunosuppression required to promote graft survival, but little is known regarding the role of EBV genome diversity in the development of malignancy. We deep-sequenced the EBV genome from the peripheral blood of 18 solid organ transplant recipients, including 6 PTLD patients. Sequences from 6 EBV + spontaneous lymphoblastoid B cell lines (SLCL) were similarly analyzed. The EBV genome from PTLD patients had a significantly greater number of variations than EBV from transplant recipients without PTLD. Importantly, there were 15 nonsynonymous variations, including 8 in the latent cycle gene EBNA3C that were associated with the development of PTLD. One of the nonsynonymous variations in EBNA3C is located within a previously defined T cell epitope. These findings suggest that variations in the EBV genome can contribute to the pathogenesis of PTLD.
responsible for controlling the expansion of EBV-infected B cells. Whether EBV genome diversity impacts host immunity or contributes to the development of EBV + PTLD is unknown.
The first EBV genome sequenced was derived from the B95-8 strain that originated from a patient with IM. Since then, sequencing studies delineated EBV variation on the basis of distinct strains, geography, and cancer phenotypes (3)(4)(5)(6)(7)(8)(9). The sequencing of EBNA3s and EBNA2 from a variety of EBV + cell lines led to the development of the type 1 and type 2, also known as type A and type B, classification scheme of EBV strains (10). Another scheme defined the EBV sequence variants Alaskan, China 1, China 2, China 3, Med, and North Carolina (NC) based on differences within LMP1 (11). With the advent of deep-sequencing, EBV genomes have now been sequenced from in vitro transformed EBV + lymphoblastoid B cell lines (LCL), EBV-associated tumors, and saliva of healthy individuals allowing broader analysis of the genome (12). Palser et al. (9) conducted a whole genome sequencing study of EBV isolated from tumors and cell lines, as well as one healthy donor, that focused primarily on geographic differences. EBV genome variations have been reported in viruses isolated from patients with IM, EBV malignancies such as NPC, gastric cancer, lung carcinoma, NK/T cell lymphoma, and PTLD, as well as patient-derived cell lines (4)(5)(6)(7)(8)(9)(13)(14)(15)(16)(17)(18)(19). However, it has not been possible to establish specific variations characteristic of EBV-associated malignancies due to the paucity of EBV genomes from control subjects without EBV disease. Moreover, several of these studies examined EBV from cancer samples obtained from multiple geographic locations; thus, it is still unclear whether the variations are linked to disease or location (3). For instance, a variation in EBNA1 originally thought to be linked to the disease BL is instead likely linked to a geographic location -in this case, Africa (20). In this study, we performed the first deep-sequencing of the EBV genome from the blood of transplant recipients with EBV + PTLD and PTLD-free transplant recipients to identify genomic variations associated with the cancer phenotype.

Results
EBV quantitation and whole genome sequencing. EBV viral loads were determined from the blood of pediatric transplant recipients (n = 18) (Supplemental Table 1; supplemental material available online with this article; https://doi.org/10.1172/jci.insight.131644DS1) by quantitative PCR (qPCR) targeting the EBNA1 gene (Supplemental Figure 1). Transplant recipients with PTLD (n = 6) had a viral load of 3165 ± 1956 (mean ± SEM) copies per μg of DNA. Transplant recipients without PTLD were divided into 2 groups based on EBV viral load, EBV hi (>1000 copies per μg DNA) and EBV lo (<1000 copies per μg DNA). EBV hi samples had a mean viral load of 4946 ± 3683 copies per μg of DNA, and EBV lo samples had a mean viral load of 164 ± 212 copies per μg of DNA. The EBV copy numbers were significantly higher in the PTLD group (P < 0.004) and the EBV hi group (P < 0.01) compared with the EBV lo group. Due to the relatively low abundance of EBV DNA compared with human DNA in the peripheral blood, the EBV genome was amplified by PCR primers that targeted nonrepetitive regions to create overlapping amplicons for whole genome sequencing (Supplemental Table 2) (13). A total of 97 × 10 6 sequence reads of 150-bp end reads were generated from the combined pool of amplicons, with an average of 3 × 10 6 sequence reads per sample. A total of 39%-85% of the reads mapped to the EBV genome (Table 1). All genomes were assembled into large contiguous DNA sequences using de novo assembly and did not include the repeat regions. The mean percent GC content for all groups was 59.5%, and the average read depth coverage for all samples was 2645-fold. The mean percent coverage of the EBV genome was 83.75% (range 73%-94%). Higher EBV viral load correlated to greater coverage of the genome. Thus, 18 potentially novel whole EBV genomes isolated from PTLD patients, EBV hi , and EBV lo transplant recipients were sequenced from peripheral blood.
Regions of increased variation are present in EBV genomes isolated from PTLD patients compared with transplant recipients without PTLD. EBV genomes sequenced from healthy individuals that were previously published (9,18) were used as a control group in order to compare the number of variations in the EBV genome of transplant recipients to nontransplanted, healthy individuals. When compared with the reference EBV genome AJ507799, composed of a combination of sequences from B95-8 and Raji, the EBV genomes from all samples had a mean of 693 total variations. EBV genomes sequenced from the PTLD group were the most variable, with a mean of 1282 single nucleotide variations (SNV), while the EBV genomes from the control group were the least variable, with a mean of 807 SNV (Table 1). EBV genomes from spontaneous B lymphoblastoid cell lines (SLCL) derived from a second, distinct cohort of PTLD subjects had the highest number of multiple nucleotide variations (MNV) and Indel (insertions and deletions) -128 and 366, respectively -while the control group had the lowest number of MNV and Indel -16 and 30, respectively.
To study the variability across the EBV genomes and the location of hotspots, the number of variations within a 1000 nucleotide sliding window for each sample group was mapped to the EBV genome from 0 bp to 172,000 bp ( Figure 1A). Variations included insertions, deletions, SNV and multiple sequential single nucleotide variations (MSV). The EBV genome contains regions of high diversity, and there were differences in the number of variations between the sample groups. We compared the number of variations within the 1000 nucleotide segments in the PTLD, EBV hi , and EBV lo groups using a 2-way ANOVA with the Tukey post hoc test and identified 11 regions, designated a-k, with significant differences between groups. Genes that contain 1000 nucleotide segments with a significantly greater number of variations in the PTLD group than the EBV hi and/or EBV lo group include EBNA3B ( Figure  1B Figure 2, region a), and it had more variations in the EBNA3C gene than the EBV hi group (Supplemental Figure 2, regions d-f). Overall, the significant difference in the number of variations within the PTLD group compared with the EBV lo and EBV hi groups indicates that there may be selective pressure in specific regions of the EBV genome associated with PTLD.
Latency and tegument genes contain the highest number of synonymous and nonsynonymous variations. The number of variations for each sample group was determined in the following gene categories: latency, tegument, membrane glycoprotein, unknown, replication, packaging, capsid, transcription factor, and nucleotide metabolism. Across all sample groups, the latency and tegument genes had the highest number of variations, while transcription factors and nucleotide metabolism genes had the lowest number of variations ( Figure 2A). The mean number of nonsynonymous variations within the groups was similar, and there were no significant differences across groups ( Figure 2B, left). Similarly, the number of synonymous variations was not significantly different across the groups ( Figure 2B, right). Overall for all the groups combined, the EBV coding regions had 172 ± 6 (mean ± SEM) nonsynonymous and 273 ± 15 synonymous variations ( Figure 2B). The number of nonsynonymous variations in each gene category was not correlated to the number of genes in each category ( Figure 2C). However, when the number of nonsynonymous variations was normalized to the total length of the genes in each gene category, the latency and tegument genes had a higher ratio of nonsynonymous variations to gene length compared with all other gene categories ( Figure 2D). EBV genomes isolated from PTLD patients have significantly more nonsynonymous variations in EBNA3C compared with EBV genomes isolated from PTLD-free transplant recipients. Next, we analyzed the variability of the latency genes expressed in EBV + B cell lymphomas in PTLD. EBNA1, EBNA3C, and LMP1 had the greatest mean number of nonsynonymous variations ( Figure 3A). EBV genomes in the SLCL group had significantly more nonsynonymous variations within EBNA1 than the EBV hi and EBV lo groups (P = 0.04 and P = 0.04, respectively), but there was no difference compared with the PTLD group ( Figure 3B). However, PTLD-derived EBV genomes had significantly more nonsynonymous variations in EBNA3C compared with genomes from the EBV hi (P = 0.05), EBV lo (P = 0.03), and SLCL groups (P = 0.02) ( Figure 3F). There were no differences in the number of nonsynonymous variations in EBNA2 ( Figure 3C), EBNA3A ( Figure 3D), EBNA3B ( Figure 3E), LMP1 ( Figure 3G), or LMP2 ( Figure 3H) across the groups. These findings indicate that genomic variations resulting in amino acid changes in the EBNA3C protein are particularly prevalent in EBV isolated from PTLD patients. Phylogenetic analysis of whole EBV genomes and the latent genes EBNA1, EBNA3B, and EBNA3C. Phylogenetic trees were constructed to visualize the impact of variations on the relationship between whole EBV genomes and the latency genes within the groups. Nucleotide sequences were aligned using Multiple Sequence Comparison by Log-Expectation (MUSCLE) alignment, and trees were constructed using neighbor-joining model and Jukes-Cantor genetic distance modeling. Phylogenetic analysis of the entire EBV genome did not reveal any defined cluster by sample group ( Figure 4A). However, phylogenetic analysis of EBNA1 genes ( Figure 4B) showed a cluster of EBV hi and EBV lo samples and a grouping of control samples, whereas EBNA3B ( Figure 4C) and EBNA3C ( Figure 4D) gene analysis produced groupings of , PTLD (dark blue, n = 6), EBV hi (light green, n = 6), EBV lo (dark green, n = 6), and control (pink, n = 4) groups. Gene locations are shown below: latent genes (red), lytic genes (gray), and noncoding regions (white). Segments (1000 nt) that contain significant differences in number of variations between groups are lettered a-k. (B) Segments (1,000 nt) of the EBV genome that contain significant differences in number of variations between PTLD and EBV hi and/or EBV lo groups. Statistical significance between number of variations from PTLD, EBV hi , and EBV lo groups was determined using 2-way ANOVA with Tukey's multiple comparisons test. Lines on the bottom indicate significant comparisons: PTLD versus EBV lo , PTLD versus EBV hi , PTLD versus EBV lo , and PTLD versus EBV hi . Region a, includes part of EBNA2 gene, PTLD versus EBV lo , P = 0.0009; region i, within BBLF4 gene, PTLD versus EBV lo , P = 0.001; Region j, includes part of BDLF3 and BDLF2 genes, PTLD versus EBV lo , P = 0.007; Region b, within BPLF1 gene PTLD versus EBV hi , P = 0.02; Region e, within EBNA3C gene, PTLD versus EBV hi , P < 0.0001; Region f, within EBNA3C gene, PTLD versus EBV hi , P < 0.0001; Region g, includes part of EBNA3C, BZLF2, and BZLF1 genes, PTLD versus EBV hi , P = 0.04; Region h, includes part of BZLF1 gene, PTLD versus EBV hi , P = 0.04; Region k, within a noncoding region, PTLD versus EBV hi , P = 0.02; Region c, includes part of EBNA3B and EBNA3C gene, PTLD versus EBV lo , P < 0.0001 and PTLD versus EBV hi , P < 0.0001; Region d, within EBNA3C gene, PTLD versus EBV lo , P < 0.0001, PTLD versus EBV hi , P < 0.0001. PTLD and SLCL samples and a grouping of EBV hi and EBV lo samples. The other latency genes, EBNA2, EBNA3A, LMP1, and LMP2, did not generate group-specific clades (Supplemental Figure 3). Thus, there are variations within the EBNA3B and EBNA3C genes that are found more frequently in EBV from PTLD and SLCL samples compared with EBV from EBV hi and EBV lo samples.
We next applied 2 well-established EBV classification schemes to evaluate the relationship of the EBV genomes isolated from PTLD subjects and transplant recipients. Using the scheme of Edwards et al. (11), which focuses on LMP1 sequences, we determined that 3 of the 6 PTLD samples are China 1, while 1 each are NC, B95-8, and Med. A phylogenetic analysis of LMP1 sequences indicates that the China 1 and NC strains form a distinct cluster from the B95-8 and Med strains (Supplemental Figure 3C) such that the majority of the PTLD samples (4 of 6) group together in the clade that includes the China 1 and NC strains. Using the type 1 and type 2 classification scheme (10) that is based on EBNA2 and EBNA3 sequence diversity, we determined that 6 of 6 PTLD EBV genomes and 6 of 6 EBV hi EBV genomes were of the type 1 strain, while 5 of 6 EBV genomes from the EBV lo group were type 1 and the sixth was type 2.
Identification of nonsynonymous variations associated with PTLD. To determine the variations in the EBV genome associated with PTLD, we performed Fisher's exact test. We identified 15 nonsynonymous variations that were significantly associated with the PTLD group compared with the EBV lo and EBV hi groups. A heatmap with unsupervised analysis was used to visualize the distribution of the PTLD-associated variations in all the samples ( Figure 5). The clustering resulted in distinct grouping of the PTLD samples from the EBV hi and EBV lo samples ( Figure 5A and Table 2). There were no samples in the EBV hi or EBV lo groups that contained an equal number or more PTLD-associated variations than any PTLD sample, indicating the strength of the association of these variations. Moreover, the samples within the PTLD group had significantly more PTLD-associated variations (range 6-12) than the EBV hi (range 0-3) (P = 0.0007) and EBV lo (range 0-3) (P = 0.0006) groups ( Figure 5B). The PTLD-associated variations were identified in latency (EBNA3C and LMP2A), membrane glycoprotein (BDLF3), tegument (BDLF2 and BPLF1), and transcription factor (BARF1 and BMLF1) genes ( Figure 5A), with the majority in the latency genes (EBNA3C, n = 8; LMP2, n = 1). In contrast, 2 genes, BOLF1 (tegument) and BLLF3 (nucleotide metabolism), contained non-PTLD-associated (EBV hi and EBV lo ) variations. Nine of the PTLD-associated variations, including 6 within EBNA3C, were also present in the SLCL group ( Figure 5C). Thus, we identified a series of nucleotide variations resulting in amino acid changes in EBV-encoded proteins, some of which are restricted to transplant recipients that develop PTLD.
The PTLD-associated variation (Arg887Thr) is located within a defined EBNA3C peptide epitope and decreases the predicted binding affinity of the peptide to HLA-B7. To determine whether these protein sequence changes could affect immune recognition of EBV, we compared the PTLD-associated variations to previously established CD8 + EBV-specific T cell epitopes (21). Two variations were identified within known T cell epitopes, an arginine to threonine change at position 887 of EBNA3C (EBNA3CArg887Thr) located within the HLA-B7 epitope QPRAPIRPI, and a serine to threonine change at position 444 of LMP2A (LMP2A Ser444Thr) located within the HLA-A25 epitope VMSNTLLSAW. Using NetMHC 4.0 Server, we calculated the predicted binding affinity concentration to HLA-B7 of the reference peptide QPRAPIRPI of EBNA3C and the PTLD-associated variant EBNA3CArg887Thr ( Figure 6A). The EBNA3CArg887Thr change resulted in an increase in binding affinity concentration for QPRAPITPI (13.04 nM) compared with the reference epitope QPRAPIRPI (8.84 nM). Since the binding affinity concentration is inversely proportional to binding affinity, the PTLD-associated variation causes a predicted decrease in binding affinity of the peptide to HLA-B7. Using PyMOL, we modeled the reference QPRAPIRPI and PTLD-associated variation QPRAPITPI in the HLA-B7 binding groove ( Figure 6, B and C). For the PTLD-associated epitope, there was a loss of 1 hydrogen bond, which may contribute to the decreased binding affinity caused by the PTLD-associated variation.
PTLD-associated variations alter binding of predicted epitopes to HLA-A, -B, and -DR alleles. NetMHC and NetMHCII were used to determine whether the remaining PTLD-associated variations were located within predicted T cell epitopes and how the PTLD-associated variation could affect the binding of the epitope (Figure 7, Table 3, Table 4, Table 5, and Table 6). An amino acid sequence was obtained from the reference genome AJ507799 that contained the location of each PTLD-associated variation. NetMHC4.0 and NetMHCII were used to predict 9mer and 15mer peptides, respectively, that contained the variable amino acid with a sliding window and the corresponding HLA-A, -B, and -DR alleles. NetMHC predicted 85 epitopes for HLA-A alleles and 140 epitopes for HLA-B alleles for the reference sequence. NetMHCII predicted 91 epitopes derived from the reference sequence for HLA-DR alleles. Reference peptide epitopes were predicted to be strong binders (SB), weak binders (WB), or nonbinders (NB). To determine the impact on the binding affinity concentration of the epitopes, the SB and WB peptides were analyzed with the PTLD-associated variation (Figure 7). When the PTLD-associated variations were present, 4 distinct predicted changes in binding status of the epitopes were observed (SB to WB, WB to NB, WB to SB, and SB to NB). Twenty-five peptides showed a decrease in binding (SB to WB) (Figure 7, A, D, and H), while 6 SB peptides became NB peptides ( Figure 7G) and 76 WB peptides became NB peptides (Figure 7, B, E, and I). Additionally, potential tumor neoantigens were identified, as 17 WB peptides became predicted SB peptides when the PTLD-associated variations were present (Figure 7, C, F, and J). HLA typing was performed on DNA from subject samples (Supplemental Table 3), and several PTLD-associated variants were present in individuals carrying the particular HLA allele that the PTLD variant epitope is predicted to bind (Table 3, Table 4, Table 5, and Table 6).

Discussion
Full genome sequencing of EBV offers insights into the underlying pathogenesis of EBV-associated malignancies and the impact of viral diversity on the host immune response. In this report, we provide the first genome wide analysis to our knowledge of EBV isolated from the blood of transplant recipients with and without PTLD. We identified 15 PTLD-associated nonsynonymous variations resulting in amino Mean ± SEM are shown for each group. Statistical significance between number of PTLD-associated variations in the PTLD, EBV hi , and EBV lo groups was determined using unpaired 2-tailed t test with P value. PTLD versus EBV hi , P < 0.0001; PTLD versus EBV lo , P < 0.0001. (C) Presence of PTLD-associated variations in SLCL samples. ****P < 0.0001.
acid changes, including 8 in EBNA3C. Overall, the EBNA3C sequences isolated from the PTLD samples had significantly more nonsynonymous variations than EBNA3C sequenced from PTLD-free transplant recipients. EBNA3C is a latent cycle EBV gene that is essential for B cell transformation and acts as a transcriptional regulator through interaction with a variety of cellular DNA binding proteins to mediate activation, or repression, of cellular and viral genes (22)(23)(24)(25)(26)(27)(28)(29). EBNA3C is also an important target of the T cell response during latent infection, with as many as 4% of CD8 + T cells specific for an immunodominant peptide of EBNA3C (30). Previous full genome sequencing studies of EBV in PTLD were restricted to cell lines derived from PTLD patients (9) or FFPE PTLD specimens (16) and did not include non-PTLD transplant controls. Consistent with our findings, Dharnidharka et al. (16) also found that EBNA3C, as well as EBNA1 and EBNA-LP, isolated from FFPE specimens had increased numbers of variations compared with other EBV genes; however, the specific variations were not reported. Interestingly, the Ala-683Val and Glu701Gln variations in EBNA3C that we identified from the blood of PTLD subjects were also identified in some PTLD-derived B cell lines (9). It will be important in future studies to compare variations in the viral genome isolated from the blood directly with those in the tumor. We also identified PTLD-associated variations in the latency gene LMP2A and a series of lytic genes including the tegument genes BPLF1, BDLF2, and BOLF1; the transcription factors BARF1 and BMLF1; the membrane glycoprotein gene BDLF3; and the nucleotide metabolism gene BLLF3. Of these genes, BDLF2 and BDLF3 had significantly more variations in the PTLD group than the EBV hi or EBV lo groups. To our knowledge, these are the first reported EBV lytic gene sequences from PTLD patients and transplant recipients.
The prevailing paradigm has been that PTLD arises primarily as a consequence of immunosuppression-induced deficiencies in EBV-specific T cells. However, all transplant recipients are required to take immunosuppression, but only a subset develop EBV + PTLD, indicating that other factors are also at play. Our results suggest that variations in immunogenic epitopes of viral proteins also contribute to altered host immunity to EBV and promote escape from the immune response. Thus, these findings constitute the basis for a potentially novel mechanism of pathogenesis contributing to the immune impairment that underlies EBV + PTLD. Here, we focused on the impact of PTLD-associated variations on the binding of EBV-derived peptides to MHC proteins for presentation to EBV-specific T cells. Two of the PTLD-associated variations we identified, EBNA3C Arg887Thr (HLA-B7) and LMP2A Ser444Thr (HLA-A25), are found within previously defined T cell epitopes. We were able to model the binding of the EBNA3C PTLD-associated variation QPRAPIRPI to HLA-B7. Computational analysis indicated that the variation caused a decrease in HLA binding compared with the reference peptide QPRAPITPI. Other PTLD-associated variations that have not yet been established as T cell epitopes were analyzed to determine how these variations might affect binding to HLA class I and class II molecules for antigen presentation. The majority of the EBV-derived peptides from the PTLD group caused a predicted decrease, or loss, in HLA binding, suggesting that the PTLD-associated variations could impact antigen presentation and subsequent T cell response, thereby increasing the risk for PTLD. Moreover, we were able to identify several PTLD-associated variations in PTLD subjects that were predicted to bind to specific HLA alleles that matched the HLA of the subject carrying the variant epitope. These data suggest that the PTLD-associated variants may impact the immune response to EBV and potentially contribute to development of disease. Along these lines, Tao et al. (31) found that CTL generation was significantly diminished when peptides derived from EBNA3C variants were used to stimulate T cells compared with control peptides, while Gottschalk et al. (32) found that adoptive transfer of ex vivo EBV-specific CTL was ineffective because the major activity of the CTL line was directed against an HLA-B restricted epitope of EBNA3B that had been deleted in the EBV found in the tumor. In addition, White et al. (33) reported that LCL produced with EBV lacking EBNA3B were more tumorigenic in humanized mice than LCL made with WT EBV, and other examples of EBNA3B alterations were identified in clinical PTLD samples. Finally, some EBV peptides from the PTLD group had predicted increases in HLA binding when compared with the reference peptides. These variant peptides may constitute tumor neoantigens that could be exploited to augment cellular immunotherapy approaches. Using previously established classification schemes for EBV strains, we found that all EBV genomes isolated from the blood of PTLD patients were type 1, consistent with prior reports that type 1 is prevalent in EBV isolated from FFPE specimens of PTLD patients (16,34). In contrast, 4 of the 6 strain types defined by LMP1 diversity were represented in PTLD-derived EBV genomes (11).
This study constitutes an important advance in our understanding of EBV genome diversity in health and disease and provides the basis for the application of immunogenomics to PTLD. By comparing the EBV genome from transplant recipients with and without PTLD, we discovered amino acid changes within several genes, including EBNA3C, associated with PTLD. Moreover, we determined that these variations alter the predicted binding of EBV-derived peptides to HLA-A, -B, and -DR proteins, elucidating a potential mechanism of reduced immune presentation of EBV epitopes that may promote immune escape in PTLD. These PTLD-associated variations provide insight into how cancer-specific EBV variations alter the immune response to EBV, identify potential biomarkers that may help identify risk for PTLD development, and reveal possible tumor neoantigens for cellular immunotherapy.

Methods
Patient samples, cell lines, and DNA isolation. DNA was isolated from 0.5 mL of blood obtained from pediatric liver transplant recipients that had detectible EBV by semi-qPCR (33) but did not develop PTLD (n = 12). The 12 patients were separated into 2 groups, EBV hi and EBV lo , based on their EBV viral loads in whole blood determined by qPCR as described below. EBV hi samples were defined as > 1000 copies/ μg of DNA, and EBV lo samples were defined as ≤ 1000 copies/μg of DNA. DNA was also isolated from the blood of 6 pediatric liver and/or kidney transplant recipients within 1 month of biopsy-proven PTLD, as previously described (35). Lastly, DNA was isolated from 6 EBV + SLCL derived from a separate set of 6 PTLD patients (36). The QIAamp DNA Mini Kit (Qiagen) was used to isolate DNA according to manufacturer's recommendations.
Quantification of EBV. EBV viral load quantification was determined using the artus EBV PCR kit, which targets the EBNA1 gene (Qiagen). PCR was performed according to the manufacturer's instructions, except the total reaction mixture was scaled down to 30 μL. For quantification, a standard curve was (A-C) HLA-A predicted epitopes for which the PTLD-associated variation changed the binding from strong to weak (A); weak to no binding (B), reference versus PTLD peptides, P = 0.009; and weak to strong (C). (D-G) HLA-B predicted epitopes for which the PTLD-associated variation changed the binding from strong to weak (D), reference versus PTLD peptides, P = 0.006; weak to no binding (E), reference versus PTLD peptide, P < 0.0001; weak to strong (F), reference versus PTLD peptide, P = 0.02; strong to no binding (G), reference versus PTLD peptide, P = 0.05. (H-J) HLA-DR predicted epitopes for which the PTLD-associated variation changed the binding from strong to weak (H), reference versus PTLD peptide, P = 0.02; weak to no binding (I), reference versus PTLD peptide, P = 0.0005; weak to strong (J), reference versus PTLD peptide, P = 0.005. (A-J) Statistical significance between number of reference peptide and PTLD peptide binding affinities was determined using paired 2-tailed t test with P value. *P < 0.05, **P < 0.01, ****P < 0.0001. created with a 10-fold dilution (24,000-2.4 copies of EBV per μL) of the first WHO International Standard EBV for Nucleic Acid Amplification from National Institute for Biological Standards and Control (Hertfordshire, United Kingdom). Additionally, a 10-fold dilution series (500,000-5 copies per μL) of EBV Positive Control DNA (Qiagen) was used. EBV viral load was calculated by using the EBV copy number, as determined by qPCR per μg of DNA of each sample.
PCR amplification of whole EBV genome. The EBV genome, excluding repeat regions, was amplified from SLCL DNA using 59 PCR primer pairs described by Kwok et al. (13). Due to a lower EBV copy number in patient blood samples compared with SLCL, 79 additional primer pairs (Table 1, Supplemental Table 2) were designed to amplify the same region of the EBV genome as the Kwok et al. (13) primers and produce Bolded rows indicate cases where the specific EBV variation and the presenting HLA allele were identified in the same PTLD sample or SLCL. SB, strong binder; WB, weak binder; NB, nonbinder.
overlapping amplicons, ranging from 1019-2498 bp in length. To amplify the EBV genome, 100 ng of DNA was added to a 100 μL reaction mixture with 10 μL HotstarTaq Plus 10× PCR buffer, 20 μL HotstarTaq Plus Q solution, 0.5 μL HotstarTaq Plus enzyme (Qiagen), 2 μL of 10 mM deoxynucleotide triphosphate (dNTPs) (Thermo Fisher Scientific), and 2.4 μL of 25 μM forward and reverse primers (Protein and Nucleic Acid Facility). PCR was conducted in an iCycler Thermal Cycler (Bio-Rad) with the following conditions: enzyme activation at 95°C for 5 minutes, 50 cycles of denaturation at 94°C for 45 seconds, annealing at 56°C for 45 seconds, and extension at 72°C for 1 minute per 1 kb of product.
Next-generation sequencing. PCR products were visualized by loading a mixture of 45 μL of PCR product and 9 μL of 6× Blue Gel Loading Dye (New England BioLabs) on a 2% TopVision agarose gel (Thermo Fisher Scientific) with 20 μL of GeneRuler 1 kb Plus DNA Ladder (Thermo Fisher Scientific). PCR product (50 μL) was then purified using QIAquick PCR Purification Kit (Qiagen) per manufacturer's instructions. The DNA concentration was determined using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific). The PCR products for each sample were normalized to equal molecular quantity, pooled, and sequenced at the Stanford Functional Genomics Facility. A quality check was performed on the samples using the Agilent High Sensitivity DNA Kit and Agilent 2100 bioanalyzer, and only samples that were successfully fragmented to 150 bp were used. Barcodes and standard Illumina indexed adapters were added using the KAPA DNA library preparation kit for Illumina. The library was then sequenced on an Illumina MiSeq Next Generation Sequencer at the Stanford Functional Genomics Facility. HLA typing was performed by extended/full gene next-generation sequencing as described previously (37).  EBV genome sequence analysis. Barcodes were trimmed from the reads, and low-quality reads (quality score below 20) were discarded from the data using Fasta packages. The reads were mapped to the reference AJ507799 EBV genome and assembled using Genome Analysis Toolkit (GATK) (Broad Institute) and Burrow Wheeler Aligner (38); insertions and deletions (Indels) were determined using GATK. SNV and base call filtering was conducted using SAMtools (39). SNVs with low scores were removed from the data based on low GenCall and Cluster Separation scores. The number of variations in 1000 nucleotide segments was calculated for each sample group. Two-way ANOVA was used to determine which 1000 nucleotide segment had significant differences in number of variations among the PTLD, EBV hi , and EBV lo groups. The number of variations in each gene category was calculated. The numbers of nonsynonymous and synonymous variations in each sample group were determined using CLC Genomics Workbench (Qiagen), and differences were analyzed by 2-tailed Student's t test. The number of nonsynonymous variations in each gene category was divided by the sum of gene lengths (reference AJ507799 was used to calculate gene lengths) in each category. Two-way ANOVA with the Tukey post hoc test was used to determine the significance in the number of nonsynonymous variations between the groups. Phylogenetic trees were constructed from specific genes, segments of the genome, and the whole EBV genome for all samples by aligning with MUSCLE and constructing trees using neighbor-joining model and Jukes-Cantor genetic distance modeling with CLC Genomics Workbench. Fisher's exact test with CLC Genomics Workbench was used to determine which variations were associated with the PTLD group compared with the EBV hi and EBV lo groups. A P value threshold of 0.05 was set to identify variations that were significantly more common in the PTLD samples than in the EBV hi and EBV lo samples. The presence or absence of these variations in the PTLD, EBV hi , EBV lo , SLCL, and control groups was mapped using the pheatmap package in RStudio (RStudio).

Table 4. Summary of variations identified in PTLD samples that alter binding of predicted epitopes to HLA-A alleles
Analysis of EBV peptides and epitopes. For each PTLD-associated variation, a 29-amino acid sequence from the EBV hi and EBV lo reference sequence, with the location of the variable amino acid at location 15, was used to predict peptides that bind to HLA-A, -B, and -DR alleles on NetMHC 4.0 Server and NetMH-CII (DTU Bioinformatics). Percentile rank is used for assignment to binder status and is determined from neural network analysis of binding affinity and eluted ligand data. For the PTLD-associated variations in the BARF1 gene, the amino acid sequence used in the prediction analysis was truncated because the variable amino acids were less than 14 amino acids from the end of the protein. For each predicted epitope that contained the variable amino acid, the PTLD-associated variation replaced the reference amino acid to determine the impact on the binding affinity concentration of the epitope. NetMHC 4.0 Server was used to determine whether PTLD-associated variations altered the binding of previously defined and predicted T cell epitopes. PyMOL was used to model the binding of the EBNA3C epitope, QPRAPIRPI, to HLA-B7 and to predict the number of hydrogen bonds between the peptide and the HLA binding pocket.
Statistics. Statistical significance of EBV viral load between the PTLD, EBV hi , and EBV lo groups was determined by 2-way ANOVA, Tukey's multiple comparisons test, and a significant P value of less than or equal to 0.05. Statistical significance between the number of variations in 1000-nt segments across the EBV genome from PTLD, EBV hi , and EBV lo groups was determined using 2-way ANOVA with Tukey's test and a significant P value of less than or equal to 0.05. To determine if there was a significant difference in the number of synonymous and nonsynonymous variations between sample groups, a 2-way ANOVA with Tukey's test and a significant P value of less than or equal to 0.05 was used. Phylogenetic trees were constructed using neighbor-joining model and Jukes-Cantor genetic distance modeling. Fisher's exact test with CLC Genomics Workbench was used to determine which variations were associated with the PTLD group compared with the EBV hi and EBV lo groups and a significant P value of less than or equal to 0.05. Statistical significance between the number of PTLD-associated variations in the PTLD, EBV hi , and EBV lo groups was determined using unpaired 2-tailed t test with a significant P value of less than or equal to 0.05. Statistical significance between number of reference peptide and PTLD peptide binding affinities was determined using paired 2-tailed t test with a significant P value of less than or equal to 0.05. Study approval. This study was approved by the IRB at Stanford University. All participants provided written informed consent prior to inclusion in the study.

Author contributions
EMM contributed to design of research studies, conducted experiments, acquired data, analyzed data, and wrote the manuscript; STH and VAB conducted experiments; JT performed data analysis and edited the manuscript; MFV performed the HLA typing; COE provided patient samples and analysis of clinical information; SMK contributed to design of the study, analyzed data, and edited the manuscript; and OMM conceived of and designed the research studies, analyzed data, supervised the study, and wrote the manuscript. All authors reviewed and approved the manuscript.