An in vivo model of glioblastoma radiation resistance identifies long noncoding RNAs and targetable kinases

Key molecular regulators of acquired radiation resistance in recurrent glioblastoma (GBM) are largely unknown, with a dearth of accurate preclinical models. To address this, we generated 8 GBM patient-derived xenograft (PDX) models of acquired radiation therapy–selected (RTS) resistance compared with same-patient, treatment-naive (radiation-sensitive, unselected; RTU) PDXs. These likely unique models mimic the longitudinal evolution of patient recurrent tumors following serial radiation therapy. Indeed, while whole-exome sequencing showed retention of major genomic alterations in the RTS lines, we did detect a chromosome 12q14 amplification that was associated with clinical GBM recurrence in 2 RTS models. A potentially novel bioinformatics pipeline was applied to analyze phenotypic, transcriptomic, and kinomic alterations, which identified long noncoding RNAs (lncRNAs) and targetable, PDX-specific kinases. We observed differential transcriptional enrichment of DNA damage repair pathways in our RTS models, which correlated with several lncRNAs. Global kinomic profiling separated RTU and RTS models, but pairwise analyses indicated that there are multiple molecular routes to acquired radiation resistance. RTS model–specific kinases were identified and targeted with clinically relevant small molecule inhibitors. This cohort of in vivo RTS patient-derived models will enable future preclinical therapeutic testing to help overcome the treatment resistance seen in patients with GBM.

The DESeq2 R package was used for normalization, clustering, and DEG.A Benjamini-Hochberg (BH)-adjusted p-value of 0.05 was used as the significance threshold.PCAs were performed using the R base stats and factoextra (v1.0.7) packages.DGCA was performed using the DGCA R package (v1.0.2) with pre-filtering of zero counts and a filter based on empirical Bayes statistics for DEG between RTU and RTS replicates for each sample.The normalized expression table was filtered using the combined list of DEG and DGCA results for each sample (filtered normalized counts table).This table was used as an input to FastEMC (v0.0.6) which selected and sorted the genes which best discriminated between the sensitive and resistant phenotype (3).The top 100 predictive features (genes) were retained for downstream analysis.The FastEMC algorithm has since been updated utilizing the scikit-learn ML package in python (https://rowland-208.github.io/ivygap/).

Gene-module selection and visualization:
The unfiltered normalized expression table was in parallel used as the input for the WGCNA R package (v1.69) which selected gene modules that distinguished between and within samples.The modules within samples were analyzed with BEERE to generate interaction networks which were then input in Gene Terrain to visualize the differential utilization of genes within the modules between RTU and RTS PDX (9,10).

Nucleic acid binding prediction:
A table of cDNA was converted into FASTA format for lncRNAs and coding genes identified by DEG, DGCA, and FastEMC using the biomaRt package.FASTAs for lncRNAs and coding genes were input into ASSA (v1.00) to identify RNA:RNA predicted interactions.ASSA results were filtered to remove duplicates and using an adjusted p-value cutoff of 0.0001 or less.FASTAs for the lncRNA and the Ensembl Regulatory Build (11) were also input into Triplexator in order to identify lncRNA:DNA triple helix sites.Triplexator flags -m R,Y,M,P,A; -v -of 0; and -rm 2 were used.Results were filtered manually in Excel to remove any results which contained errors and duplicates were removed.Chromosomal coordinates were then extracted into a new BED file (https://m.ensembl.org/info/website/upload/bed.html).This file was used to visualize lncRNA:RNA and lncRNA:DNA binding results using Circos.

Identification of purported lncRNA cis-regulatory elements:
The BED file was used as input to BedTools (v2.29.2) to search 20kb upstream and downstream (40kb window) from each triple helix site to identify genes proximal to these potential binding sites.The lists of lncRNA binding site proximal genes were compared against the global and pairwise DEG lists in R to find the union of significantly DEGs proximal to lncRNA binding sites.Genes that were both proximal to the lncRNA binding sites and identified as DEGs were considered as purported lncRNA cis-regulatory targets.The R corrr (v0.4.2) package was used to find correlations between the expression of lncRNAs and the proximal genes.Correlations were also calculated for pairwise and global differentially regulated genes that intersected curated gene sets for DDR, stemness, chromatin remodeling, cell cycle progression, among others.Genes of interest from DEG, DGCA, FastEMC, ASSA, and Triplexator/BedTools were then input into PAGER (12) for enrichment analysis.Key terms from PAGER were then combined with the lists of differentially regulated genes as input for BEERE which constructed gene:semantic interaction networks.The intersection of DEGs from each PDX pair with lncRNAs and their proximal genes were used as input to WIPER (13) which identified key gene pair relationships and generated lncRNA:mRNA transcript interaction networks.

Potential lncRNA regulation of DDR gene sets:
The unfiltered normalized expression table was input to GSEA (8) with custom gene sets (.gmt files) including the curated gene signatures of interest.The fmsb (v0.7.3)R package radarchart() function was used to generate radar plots for each pairwise comparison and global RTU vs RTS plotting the normalized enrichment scores (NES) from GSEA.Significantly DEGs that were proximal to lncRNA-binding sites of differentially regulated lncRNAs in each comparison were compared to the DDR gene-set lists.The genes which overlapped were added to the radar plots with the log-fold change and the sign of the correlation (corrr package) of that gene's expression with the expression of the lncRNA which binds proximal to that gene's locus.

Integration of lncRNA-regulated DEGs with differential kinase activity:
Upstream kinomics data was combined with lncRNA proximal genes of interest using MetaCore/GeneGo (Clarivate Analytics, Philadelphia, PA) for integrated pathway analysis.These integrated pathways are constructed for each PDX pair including the significant kinases differentially regulated between RTU and RTS along with related pathway enrichment based on lncRNA binding proximal genes selected by above mentioned methods.
Additional Differential Expression Approaches: Several approaches were employed to complement the standard DE analysis by identifying combinations of features which are highly predictive, in the case of machine learning (ML) or which are differentially regulated between phenotypes in the case of differential gene correlation analysis (DGCA).Transcriptomic features which discriminated RTU from RTS samples were selected globally and pairwise using Fast Exponential Monte Carlo or FastEMC (3).This unbiased, ML approach ranks combinations of transcriptomic features in order of their predictive power in distinguishing between RTU and RTS PDX.The top 250 predictive features from FastEMC, including 29 lncRNAs and 10 pseudogenes, exhibited overlap with global DEGs, pairwise DEGs, and gene signatures for DDR, stemness, chromatin remodeling, and cell cycle progression.Several of the lncRNAs identified by FastEMC have predicted DNA binding sites proximal to significantly DEGs (Figure 3 and Supplementary Figure S3).DGCA (5) was employed to identify the top differentially correlated gene pairs between RTU and RTS PDX.The top 1000 differentially correlated gene pairs along with significantly DEGs and the top 250 predictive genes from FastEMC were combined for further downstream analysis.
Correlations of lncRNA and significantly altered DEGs: After confirming the non-coding potential of the lncRNA targets using two inspection strategies (Coding Potential Calculator 2.0 found at http://cpc2.gao-lab.org/and AnnoLnc 2.0 found at http://annolnc.gao-lab.org/),we performed correlation studies with the significantly altered DEG's.The X1516 pair, DUXAP9 and DUXAP10 were positively correlated with chromatin remodeling gene RBBP7 while negatively correlated with HDAC9 and RERE (Supplementary Figure S3).In the X1153 pair, ZFAS1, SAMMSON, and SOX2-OT showed a mixture of positive and negative correlations with stemness-related COVL, PTPRZ1, and MAPRE2 genes (Supplementary Figure S3).In the JX12T pair, ZFAS1 also showed a positive regulation with MAPRE2 while simultaneously having a negative correlation with cell-cycle gene RFC3 (Supplementary Figure S3).Across multiple lncRNAs, the forkhead box P1 (FOXP1) transcription factor is targeted both directly and indirectly.Correlation with proximal targets of FOXP1 reveals overlap with predictive features from FastEMC as well as with stemness, chromatin remodeling, and homologous recombination signatures.Additionally, the lncRNA AF106564.1 showed a strong positive correlation with the expression of AC058822.1 (ENSG00000282278), the novel FIP1L1-PDGFRA fusion transcript as well as with NTRK3 (ENSG00000140538) which is a putative growth factor for CNS tumors and is also a component of multiple oncogenic fusion products (14).
Network Analysis Reveals Differential Pathway Enrichment in RTS PDX: Pathway, gene set, gene module, and gene network level analyses were completed using GSEA (8), WebGestalt (15), Pathway, Annotated-list, and Gene-signature Electronic Repository (PAGER) (12), Weighted Gene Correlation Network Analysis (WGCNA) (16), Gene Terrain (10), Biomedical Entity Expansion, Ranking, and Exploration (BEERE) (9), and Weighted In-Path Edge Ranking for biomolecular association networks (WIPER) (13).The WGCNA ML approach identified 7 clusters of gene modules which distinctly identify the PDX samples by patient tumor of origin (Figure S4A).Gene interaction networks were formed for each of these gene module clusters using BEERE.Differential regulation of these networks is observed between RTU and RTS PDX within each pair using Gene Terrain to overlay average gene expression on the gene network coordinates (Figure S4B).There are 160 significantly DEGs within the JX12T WGCNA gene modules with the majority of genes, including cell cycle related genes, upregulated in the RTU line (Figure S4A, B).Over-representation analysis (ORA) identifies chromosome localization and cell cycle processes as being enriched in the JX12T-related gene modules (Figure S4C).Actively dividing cells, such as those enriched in JX12T RTU tumors, would be expected to be more susceptible to radiation-induced DNA damage.The top GSEA result for the JX12T-related gene modules were a signature of genes under-expressed in stem cell-like cholangiocarcinoma, enriched (over-expressed) in the JX12T RTU line (Figure S4D).This represents an enrichment of genes in stem-like cells, which are thought to confer therapy resistance, in the JX12T RTS PDX.

WIPER Networks:
In order to identify the potential regulatory impact of lncRNAs on gene networks at the epigenetic level, we compared RTU/RTS DEGs to the genes proximal (within 40kb window) to predicted lncRNA DNA binding sites Supplementary Figure S2B.WIPER analysis (Supplementary File S2) of combined global DEGs intersecting with lncRNA DNA binding site proximal gene targets revealed a SMAD3 centric network, indicating potential enrichment of TGFR-beta signaling.Gene networks were constructed for each PDX pair using pairwise differentially regulated lncRNAs and significantly differentially expressed genes.The lncRNA-DNA binding potential and proximal gene list were constructed for each PDX pair.This list was then compared to the list of pairwise significantly differentially expressed genes to find the union of the two sets.In the JX12T, JX14P, X1465, and X1516 pairs, WIPER identified several collagen-related genes as central to these networks.This aligns with PAGER, GSEA, and ORA results highlighting changes in focal adhesion and extracellular matrix remodeling.In JX14T, networks were centered on lncRNAs ZFAS1, DUXAP10, AUXG01000058.1,and SOX2-OT as well as centers around EGFR, PIK3R1, and SRC.JX39P networks were centered on SAMMSON, DUXAP10, ZFAS1, AUXG01000058.1,ERBB2, STAT1, and proto-oncogene MYC.X1465 and X1516 pairs had proto-oncogene FOS centric networks.EGFR, EGF, and ACTB were also central nodes in the X1465 network.
Mouse transcriptome analysis: Raw paired-end RNA-seq reads were mapped to a human (hg38) and mouse (mm10) combined reference genome using tophat2 which uses bowtie2 (version 2.3.3) as its short read mapper.HTSeq was used to quantify read counts for each gene locus for both the human and mouse genes.Separately, the human and mouse gene sets were imported into R. Library size normalization and differential gene expression analysis were performed DESeq package per user manual.Cell line was fit in the model to reduce variation due to individual level differences in gene expression and each gene was tested for changes as a function of RT selection, RTS versus RTU.Log-fold change for each gene was extracted from R in a rank list format and served as input for Gene Set Enrichment Analysis (GSEA) software (version 4.1.0).The pre-ranked gene set workflow of GSEA was performed for hallmark curated gene sets and positional gene sets.A false-discovery rate of 10% was used as a threshold for identifying altered gene sets.See Supplementary Figure S6.