Transcriptomic silencing as a potential mechanism of treatment resistance

Next-generation sequencing (NGS) has not revealed all the mechanisms underlying resistance to genomically matched drugs. Here, we performed in 1417 tumors whole-exome tumor (somatic)/normal (germline) NGS and whole-transcriptome sequencing, the latter focusing on a clinically oriented 50-gene panel in order to examine transcriptomic silencing of putative driver alterations. In this large-scale study, approximately 13% of the somatic single nucleotide variants (SNVs) were unexpectedly not expressed as RNA; 23% of patients had ≥1 nonexpressed SNV. SNV-bearing genes consistently transcribed were TP53, PIK3CA, and KRAS; those with lower transcription rates were ALK, CSF1R, ERBB4, FLT3, GNAS, HNF1A, KDR, PDGFRA, RET, and SMO. We also determined the frequency of tumor mutations being germline, rather than somatic, in these and an additional 462 tumors with tumor/normal exomes; 33.8% of germline SNVs within the gene panel were rare (not found after filtering through variant information domains) and at risk of being falsely reported as somatic. Both the frequency of silenced variant transcription and the risk of falsely identifying germline mutations as somatic/tumor related are important phenomena. Therefore, transcriptomics is a critical adjunct to genomics when interrogating patient tumors for actionable alterations, because, without expression of the target aberrations, there will likely be therapeutic resistance.


Introduction
Clinical-grade next-generation sequencing (NGS) is being increasingly used as a tool to direct the treatment of cancers. By revealing alterations in some of the most commonly affected and instrumental genes in cancer development and progression, NGS ostensibly identifies potentially actionable targets (1). The benefit of NGS has been recognized by the Centers for Medicare & Medicaid Services, which led to their expansion of NGS coverage for patients with advanced cancers (2). The information obtained from interrogating a single patient's cancer with NGS can potentially help guide his or her care, while the data derived from population-based NGS studies can potentially shed light on previously unknown drivers of response and resistance across cancer subtypes (3).
Unfortunately, the outcome for many patients with metastatic tumors remains grim even in the era of precision oncology, with responses often being unpredictable, even when clinical-grade NGS is used. As tumors are exposed to successive lines of therapy (e.g., chemotherapy, radiation therapy, gene-or immune-targeted agents, etc.), tumor heterogeneity evolves and new drivers may emerge (4). Additional resistance mechanisms may be due to one or more of the following: the tumor microenvironment, which secretes various hormones, growth factors, and cytokines that can propel cell growth and survival (5); cancer progenitor cells that increase the expression of drug efflux transporter genes that pump drugs out of the cell and into the interstitium (6); mutations within drug transporter genes that attenuate drug binding and influx into the cell and/or absorption (7); increased expression of antiapoptotic proteins and decreased expression of proapoptotic proteins (7); amplification of oncogenes and protumorigenic genes (8); mutations within drug target domains, such as the kinase motifs (9); and changes in genes, such as BRCA, that revert them to "normal," consequently abrogating sensitivity to PARP inhibitors (10). Altered expression, including gene silencing, via mechanisms, such as epigenetic changes (DNA methylation and histone Next-generation sequencing (NGS) has not revealed all the mechanisms underlying resistance to genomically matched drugs. Here, we performed in 1417 tumors whole-exome tumor (somatic)/normal (germline) NGS and whole-transcriptome sequencing, the latter focusing on a clinically oriented 50-gene panel in order to examine transcriptomic silencing of putative driver alterations. In this large-scale study, approximately 13% of the somatic single nucleotide variants (SNVs) were unexpectedly not expressed as RNA; 23% of patients had ≥1 nonexpressed SNV. SNV-bearing genes consistently transcribed were TP53, PIK3CA, and KRAS; those with lower transcription rates were ALK, CSF1R, ERBB4, FLT3, GNAS, HNF1A, KDR, PDGFRA, RET, and SMO. We also determined the frequency of tumor mutations being germline, rather than somatic, in these and an additional 462 tumors with tumor/normal exomes; 33.8% of germline SNVs within the gene panel were rare (not found after filtering through variant information domains) and at risk of being falsely reported as somatic. Both the frequency of silenced variant transcription and the risk of falsely identifying germline mutations as somatic/tumor related are important phenomena. Therefore, transcriptomics is a critical adjunct to genomics when interrogating patient tumors for actionable alterations, because, without expression of the target aberrations, there will likely be therapeutic resistance. insight.jci.org https://doi.org/10.1172/jci.insight.134824 acetylation) (11), miRNA (12) and/or alternative splicing of RNA and/or fluctuations in RNA-binding proteins (13,14), may also play a role in resistance to therapy. However, there is a dearth of knowledge about the frequency of gene silencing in cancer tissue and its potential implications for drug resistance.
Transcriptomic analysis can be used to reveal the silencing of cancer-related variant transcription. Others have advanced the use of combined RNA and DNA analysis, including the development of tools such as RADIA (15) and UNCeqR (16), with the primary goal of enhancing detection of somatic mutations, especially at low DNA allelic frequencies. These studies and those focused on RNA analysis (17,18) have revealed the importance of looking at expression for understanding tumor biology and cancer progression. The work presented here builds on the aforementioned studies; however, it was performed using a very large data set with the goal of determining the frequency of silencing of single nucleotide variant-bearing (SNV-bearing) genes (that are potentially drivers) by integrated RNA and DNA analysis.
Herein, we examined a cohort of 1879 patients with whole-exome sequencing of tumor and nontumor tissue (tumor/normal sequencing) to determine the somatic versus germline origin of identified mutations, in concert with cognate transcriptomic sequencing in a subset of 1417 of these patients, in order to evaluate the frequency of tumor-related gene silencing in the pan-cancer setting.

Results
Patient demographics. This cohort comprised 1879 patients with multiple different tumor types (Supplemental Table  1; supplemental material available online with this article; https://doi.org/10.1172/jci.insight.134824DS1). All of these patients were used for tumor/normal NGS, that is, known tumor tissue NGS data were compared with those from normal nontumor tissue, which here was either same-patient blood or buccal (swab) tissue. The most common tumor samples (n > 100) were from breast, colon, lung, soft tissue (including sarcoma), pancreatic, and ovarian cancer. The ages of all patients included in the study ranged from 0 years old for soft tissue sarcoma to 92 years old for bladder cancer. The lowest median age per tumor type was for soft tissue tumors, at 18 years old, and the highest median age was 72 years old for bladder cancer.
SNVs in germline mutations compared with somatic mutations. Somatic (tumor) and germline (normal tissue) DNA sequence information, specifically SNVs, were compared for all gene regions captured by whole-exome sequencing (WES) in 1879 patients. As shown in Figure 1, 92% of sites that do not comport with the hg19 reference human genome within tumor samples were eliminated as somatic events by comparison to the patient-matched germline genome; thus only approximately 8% of the SNVs identified in tumor genomes were tumor specific (not found in normal tissue). This highlights the importance of the tumor/normal sequence comparison we performed here (see Methods, NGS-WES and somatic-specific variant calling) (although many laboratories have bioinformatics and/or data analytic tools in place to mitigate this issue). In addition, 33.8% of germline single nucleotide polymorphisms (SNPs) within the gene panel were rare (not found after filtering through variant information domains, such as dbSNP; https://www. ncbi.nlm.nih.gov/snp/) and at risk of being falsely reported as somatic; at least 1 variant potentially falsely identified as somatic (i.e., rare or unannotated nonsomatic SNP) was found in 52% of patients. The genes with the highest number of germline rare SNPs were ALK, HNF1A, ATM, FLT3, TP53, ERBB2, and APC. The genes with the most true somatic mutations were TP53, PIK3CA, KRAS, and APC.
Discordance between DNA alterations and RNA expression of SNVs. To determine rates of variant nonexpression in driver genes, 1417 patients with paired DNA-and RNA-Seq information were analyzed. The genes examined were based on the 50-gene Ampliseq cancer hotspot v2 panel (19) (see Methods).
Among 1417 patients, 2190 nonsynonymous somatic hotspot gene variants were found in DNA and 334 were not expressed in RNA (15.4%) as presented in Tables 1 and 2 and Figure 2. Of these 334 nonexpressed variants, 300 were from patients with common cancer types, which allowed tissue-specific gene expression levels to be ascertained. A high percentage -85.7% (257 of 300) -of these variants were in genes that are typically expressed; thus, the rate of unexpected nonexpression of variants was estimated at 13.1%. At least 1 nonexpressed variant was found for 23% of patients. The genes with the highest discordance between DNA alteration and presence in RNA transcripts were ALK, CSF1R, ERBB4, FLT3, GNAS, HNF1A, KDR, PDGFRA, RET, and SMO. The genes in which alterations were virtually always expressed at the RNA levels were TP53, PIK3CA, and KRAS ( Figure 2).
The percentage of SNVs expressed as transcripts ranged from 58% in melanoma to 100%, the latter mainly in underrepresented tumor types, which may have prevented accurate frequency analysis (Table 1). Taken together, approximately 85% of SNVs were expressed at the RNA level. Excluding tumor types with less than 5 SNVs (due to small numbers of patients), the cancers with the highest percentage of SNVs expressed at the RNA level were cervical, rectal, brain, ovarian, breast, and gastric (stomach) (ranging from 90%-96% expressed). The cancers with the lowest level of SNVs expressed at the RNA level were melanoma, nonmelanoma skin, and gallbladder cancer, ranging from 58%-69% expressed.
Nonsynonymous mutational load and relationship to silenced variants. Nonsynonymous mutation load ( Table  1) was calculated based on WES, as described in the Methods. Nonmelanoma skin cancer and melanoma had the highest average nonsynonymous (amino acid-altering) mutational load. Mesothelioma and bile duct (extrahepatic tumors) had the lowest average nonsynonymous mutational load. There was a weak correlation (but statistically significant) relationship between mutational load and percentage of nonexpressed variants (Pearson R = 0.17, P = 2.5 × 10 -7 ; Supplemental Figure 1).

Discussion
In this report, we demonstrate discordance between genomic sequencing and transcriptome analysis for putative driver alterations (20), with the specific goal of revealing the frequency of variant nonexpression in this cohort. This discordance, when it results in the lack of expression of a therapeutic protein target, may reflect a mechanism of resistance to therapy that has previously been underrecognized and should be subject to further investigation. It should be noted here that the samples analyzed were largely from primary tumors and were not compared with metastases, if available. This raises the possibility that genes that are silenced in the primary tumor may be expressed in a metastasis, making such metastases target for drugs not expected to be efficacious for the primary. In addition, while sequencing tumor genomes identifies somatic SNVs -some of which are targetable with FDA-approved drugs -our analysis here revealed a very high rate of germline mutations that could erroneously be reported as somatic in the absence of advanced bioinformatics algorithms. Interestingly, most HNF1A mutations were germline (and, importantly, all were silenced); both phenomenon would have significant implications for targeting this gene alteration.
Importantly, 23% of 1417 tumors had at least 1 somatic variant that was not expressed. Taking normal expression patterns into account, at least 13% of variants detected in DNA were unexpectedly not Overall, 92% of single nucleotide variants (SNVs) identified from sequencing tumor genomes alone were of germline origin. (Ampliseq is a hotspot gene panel of the 50 most frequently altered genes in all cancers.) Interestingly, many of the genes with SNVs with the highest somatic specificity -TP53, KRAS, and PIK3CA -were also most likely to be expressed (see Figure 2). SNPs, single nucleotide polymorphisms.
expressed. We found a weak but statistically significant relationship between mutational load and the frequency of nonexpressed variants (Table 1 and Supplemental Figure 1). The relationship between mutational load and gene silencing, while weak, is an interesting finding that may merit further study. For example, nonmelanoma skin cancer had an exomic mutational load of 2660 and 33% silenced variants, while pancreatic cancer had a mutational load of 70.54 and 11% silenced variants. The genes with the highest discordance between DNA alteration and transcription were ALK, CSF1R, ERBB4, FLT3, GNAS, HNF1A, KDR, PPDGFRA, RET, and SMO, while the genes in which alterations were virtually always expressed at the level of transcription were TP53, PIK3CA, and KRAS. In future studies, tumor type and other biological features could be considered along with mutational burden and expression.
While we did not determine the exact mechanisms involved in what appears to be transcriptional silencing, alteration of gene expression is characteristic of cancer cells; recently, there has been focus on the role of posttranscriptional editing and its effect in disease paradigms, particularly cancer, and the important context of the activity of silenced variants (21)(22)(23)(24). Described mechanisms of RNA silencing on a global scale may involve miRNA, RNAi, and siRNA (25); may result in degradation of transcripts or silencing of transcription; and may be implicated in the gene SNV/transcript discordance seen here. For example, it is thought that mRNA sequences are cleaved by homologous siRNAs, which leads to RNAi and consequently silences expression (25). miRNAs can rapidly degrade transcripts (as well as inhibit translation), and the targets of miRNA are generally the genes and proteins previously implicated in drug resistance but even include other miRNAs involved in oncogenesis (12,26).
Another key finding of our study was that 92% of potential SNVs from genomic sequencing were of germline origin ( Figure 1) and that 33.8% of germline SNPs within the gene panel were rare and at risk of being falsely reported as somatic (even after filtering through databases for variant information, such as dbSNP (https://www.ncbi.nlm.nih.gov/snp/); at least 1 such variant potentially falsely identified as somatic (i.e., nonsomatic variant) was found in 52% of patients (Supplemental Figure 2). However, it

Figure 2. Expressed versus nonexpressed somatic single nucleotide variants among genes in the Ampliseq panel.
The proportion of somatic-specific single nucleotide variants (SNVs) for which transcripts were identified by RNA-Seq was determined. Tissue-specific gene expression patterns were established to ensure nonexpression of variants was not due to normal differential expression of genes (see Methods) and hence to differentiate expected versus unexpected nonexpression. At least 23% of patients had at least 1 somatic SNV that was not expressed. TP53, KRAS, and PIK3CA SNVs were almost always expressed at the RNA level, whereas ALK, CSF1R, ERBB4, FLT3, GNAS, HNF1A, KDR, PDGFRA, RET, and SMO and several other SNVs were often not expressed (see also Table 2). Since Ampliseq is a hotspot gene panel of the 50 most frequently altered genes in all cancers, nonexpressed SNVs may be clinically relevant.
should be kept in mind that many laboratories have sophisticated bioinformatics tools in place that can diminish the risk of reporting these germline variants as somatic alterations. Some of the genes most likely to have true somatic mutations were TP53, PIK3CA, and KRAS, all of which have well-defined roles in the pathogenesis and evolution of tumors across a multitude of tumor types.
This study has several important limitations. First, the data were derived retrospectively. Second, the sequencing information was curated from a deidentified database, and, hence, clinical correlations with outcome after cognate therapies, taking into consideration RNA expression, were not available. Such a study would be an important next step. Nonsynonymous load refers to the raw counts of total nonsynonymous mutations across the entire exome; the "average" is the mean across the tumor type noted. SNV, single nucleotide variant.
Tissue-specific gene expression patterns were established to ensure nonexpression of variants was not due to normal differential expression of genes (see Methods) and hence to differentiate expected versus unexpected nonexpression. See also Figure 2. SNV, single nucleotide variant.
There have been several meta-analyses assessing the association and effect of biomarker-driven clinical trials and patient outcomes in terms of response rates as well as progression-free and overall survival. In meta-analyses of over 80,000 patients in phase I, II, and III trials, the subsets of patients treated with personalized biomarker-driven therapies had significant improvements in all outcome parameters (27)(28)(29), with genomic biomarkers producing response rates over 40% versus approximately 5% for no biomarkers. However, these numbers still leave many patients without response or with fleeting responses, even when matched by genomic biomarkers. A myriad of resistance mechanisms may account for unresponsiveness. They include, but are not limited to, driver coalterations in other genes, new mutations that emerge, and inadequate drug engagement with the target. However, our current observations indicate that a significant fraction of somatic variants are not expressed at the RNA level and that almost one-quarter of cancers harbor at least one unexpressed somatic variant ( Figure 2 and Table 2). Since the mutations explored were those in the Ampliseq cancer hotspot v2 panel of 50 well-characterized driver genes, these nonexpressed variants are likely to be clinically relevant. Perhaps not coincidentally, strong drivers such as TP53, PIK3CA, and KRAS were most likely to have true somatic mutations and were almost always expressed. In future studies, we will leverage paired DNA-Seq/RNA-Seq analyses to further explore other relevant and translatable findings, including neoantigen expression, novel fusion transcript detection, indel analysis, and alteration in copy number.
Almost one-quarter of 1417 tumors had a potentially deleterious gene mutation that was not expressed at the RNA level. The effect of gene silencing on response versus resistance after targeted therapies matched to oncogenic DNA alterations requires large-scale clinical correlative studies. Gene silencing may also be relevant to immunotherapy, as silenced alterations would not be expected to produce presented neoantigens; this too merits additional study. Taken together, our observations suggest that proper pharmacologic prosecution of a malignancy likely requires a full molecule portrait that includes transcriptomic interrogation of expressed versus silenced genomic alterations.

Methods
Sample preparation and data collection. Germline (normal tissue) and tumor NGS were performed in tandem using FFPE slides or tissue blocks and matched blood or buccal swab samples from 1879 unselected patients with advanced cancer (Clinical Laboratory Improvement Amendments, NantHealth; https://nanthealth.com). Up to ten 5-μm tissue sections were cut from blocks using standard microtomy techniques. The initial tissue section was subjected to H&E staining for histological analysis. An internal board-certified pathologist marked the regions of the malignancy on image of H&E-stained slides and confirmed the histological diagnosis. Using the marked-up image as a guide, tissue sections were macrodissected to collect the desired tumor regions. Each collected cell population was then extracted using the QIAGEN QIAamp DNA FFPE Tissue Kit for DNA. Extracted material was then quantified using a Qubit fluorometer.
NGS-WES and somatic-specific variant calling. Somatic-specific SNVs were identified as previously described (30) by WES of tumor and normal DNA. In brief, NGS libraries were captured to exome regions using xGen Exome Research Panel v1.0 (IDT), and libraries were prepared using the KAPA Hyper prep kit (Kapa Biosystems). DNA libraries were sequenced to a target depth of ×200 for tumor sample and ×100 for normal samples on the Illumina HiSeq platform. In order to obtain position-specific read support, raw reads were aligned to the hg19 reference genome using the Burrows-Wheeler Aligner (http:// bio-bwa.sourceforge.net) and converted to BAM format using SAMtools software (http://www.htslib.org). To remove support bias from polymerase chain reaction effects, duplicate reads were removed using SAM-BLASTER (https://github.com/GregoryFaust/samblaster/commit/a5274027e4c887ef6ab14f3226c0e-36beb4f4f25). Any site with ≥8 nonduplicate reads, mapping quality ≥20 (according to SAMtools), and a nonreference allele was considered as a possible SNV. If the tumor and normal genotypes were identical, then the variant was classified as germline and removed from the pool of possible SNVs. Instances where the normal genotype was heterozygous and the tumor genotype was homozygous suggest regions of loss of heterozygosity, as opposed to new SNV mutations, and such variants were so classified.
The remaining potential SNVs were filtered to those with high confidence using several quality metrics (see North et al. for details,ref. 30), including coverage metrics as well as those derived from SomaticSniper v1.0.5.0 (http://gmt.genome.wustl.edu/packages/somatic-sniper/), a Bayesian model that reports the probability of the observed tumor genotype given the prior probability of the reference genotype and the fraction of heterozygous positions in the human genome. A single variant cell format file was produced for each tumor sample describing somatic-specific SNVs.