Circulating cell-free methylated DNA reveals tissue-specific, cellular damage from radiation treatment

Radiation therapy is an effective cancer treatment, although damage to healthy tissues is common. Here we analyzed cell-free, methylated DNA released from dying cells into the circulation to evaluate radiation-induced cellular damage in different tissues. To map the circulating DNA fragments to human and mouse tissues, we established sequencing-based, cell-type-specific reference DNA methylation atlases. We found that cell-type-specific DNA blocks were mostly hypomethylated and located within signature genes of cellular identity. Cell-free DNA fragments were captured from serum samples by hybridization to CpG-rich DNA panels and mapped to the DNA methylation atlases. In a mouse model, thoracic radiation-induced tissue damage was reflected by dose-dependent increases in lung endothelial and cardiomyocyte methylated DNA in serum. The analysis of serum samples from patients with breast cancer undergoing radiation treatment revealed distinct dose-dependent and tissue-specific epithelial and endothelial responses to radiation across multiple organs. Strikingly, patients treated for right-sided breast cancers also showed increased hepatocyte and liver endothelial DNA in the circulation, indicating the impact on liver tissues. Thus, changes in cell-free methylated DNA can uncover cell-type-specific effects of radiation and provide a readout of the biologically effective radiation dose received by healthy tissues.


Processing of human serum and plasma samples
Circulating cell-free DNA was extracted from 3 to 4 mL human serum or plasma or 0.5 mL mouse serum, using the QIAamp Circulating Nucleic Acid kit (Qiagen) according to the manufacturer's instructions. Cell-free DNA was quantified via Qubit fluorometer using the dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific). Additional size selection using Beckman Coulter beads was applied to remove high-molecular weight DNA reflective of cell-lysis and leukocyte contamination as previously described (1). The same bead-based size selection was applied to all samples that were acquired through standardized serum isolation and cfDNA extraction protocols. This method also served to concentrate the samples to the desired input volume before bisulfite conversion.
Fragment size distribution of isolated cfDNA was validated on the 2100 Bioanalyzer TapeStation (Agilent Technologies).
Control human serum and plasma from healthy adult donors was purchased from Innovative Research (SKU#ISERS10ML and SKU#IPLASK2E10ML) to compare results from our analyses across sample preparations. While plasma is produced when whole blood is collected in tubes that are treated with anticoagulant, serum is obtained after allowing blood to clot for 30 minutes at room temperature and then centrifuging the samples to remove the cellular component (2,3). Studies demonstrate that cellular components significantly increase in samples that sit longer than 60 minutes while clotting; however, adherence to standard operating procedures for preparation of serum and plasma have been found to greatly reduce contamination and sources of error (4).
We took extra steps to address these concerns by ensuring timely processing of blood samples and performing an additional bead purification after cfDNA isolation to remove high-molecular weight DNA, likely derived from contaminating blood cell lysis (Supplemental Figure 13A and B). We found that taking this approach, cfDNA methylation status at the block level is highly correlated when comparing cfDNA derived from serum or plasma (Supplemental Figure 13C; Pearson r = 0.95). In addition, deconvolution analysis verified that the %immune and %solid organ origins of cfDNA does not vary across the two sample types (Supplemental Figure 13D). In fact, there appears to be slightly less variation across donors in the predicted cell type proportions composing cfDNA extracted from serum compared to plasma. Thus, despite an overall higher Geq Immune found in serum due to the overall higher cfDNA concentrations, this background signal is consistent from sample-to-sample allowing for accurate comparison of changes over time in serial samples collected from the same individuals (Supplemental Figure   13D and E).

RNA isolation, RNA-sequencing, and RT-qPCR analysis
RNA was isolated from tissues or sorted cells using the RNeasy Kit (Qiagen) following homogenization using the MagNA Lyser (Roche) according to the manufacturer's protocol and quantified by Qubit RNA BR assay (Thermo Fisher Scientific). Total RNA was validated using an Agilent RNA 6000 nano assay on the 2100 Bioanalyzer TapeStation (Agilent Technologies). The resulting RNA Integrity number (RIN) of samples selected for downstream qPCR or RNAseq analysis was at least 7. Reverse transcription (RT) was done using the iScript cDNA Synthesis Kit (Bio-Rad) according to the manufacturer's protocol. Real-time quantitative RT-PCR was performed with iQ SYBR Green Supermix (Bio-Rad). Primers used for RT-qPCR were purchased from Integrated DNA Technologies, and their sequences are provided in Supplemental Table 9. Fold change was calculated as a percentage normalized to housekeeping gene actin (Actb) using the delta-Ct method. All RT-qPCR assays were done in triplicate. RNA-sequencing libraries were prepared using TruSeq Total RNA library Prep Kit (Illumina) at Novogene Corporation Inc., and 150bp paired-end sequencing was performed on an Illumina HiSeq 4000 with a depth of 50 million reads per sample. A reference index was generated using GTF annotation from GENCODEv28. Raw FASTQ files were aligned and assembled to GRCh38 and GRCh37 with HISAT2 / Stringtie (V 2.1.0) (5). The differential expression was analyzed in R with packages EdgeR (V 3.32.1) and Rsubread (V1.6.3) (6,7). Derived counts per million and p-values were used to create a rank ordered list, which was then used for subsequent integrative analysis. Expression levels at known cell type markers from single cell expression databases were used to validate the identity of isolated cell type populations for methylome analysis (8).

Reference DNA methylation data from healthy tissues and cells
Controlled access to reference WGBS data from normal human tissues and cell types was requested from public consortia participating in the International Human Epigenome Consortium (IHEC) (9) and upon approval downloaded from the European GenomePhenome Archive (EGA), Japanese Genotype-phenotype Archive (JGA), database of Genotypes and Phenotypes (dbGAP), and ENCODE portal data repositories (Supplemental Table 1) (10)(11)(12). Reference mouse WGBS data from normal tissues and cells were downloaded from selected GEO and SRA datasets (Supplemental Table 2) (13)(14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26). We required that all reference methylomes included for analysis be sequencing libraries made from bisulfite-converted DNA that we could process starting from the raw sequencing data. Downloaded FASTQs were processed and realigned in a similar manner as the locally generated bisulfite-sequencing libraries described above. However, parameters were adjusted to account for each respective WGBS library type. Methylation bias was assessed using the diagnostic tool as part of the Bismark package and read 5' and 3' trimming, mapping, and deduplication were performed based on the recommendations in the Bismark user guide for working with different library types and commercial kits (http://felixkrueger.github.io/Bismark/Docs/). WBGS libraries were deduplicated using deduplicate_bismark (V 0.22.3) All studies used purified DNA from cells or tissues as starting material, except for a subset of immune cell WGBS data generated by Blueprint epigenomics that performed bisulfite conversion directly on lysed cells (μWGBS protocol). Special consideration of bisulfite conversion efficiency was given to samples prepared by the µWGBS protocol and reads with a bisulfite conversion rate below 90% or with fewer than three cytosines outside a CpG context were removed (27).
For these cell-types with multiple data sources, we provide additional results in Supplemental Figures 2 and 3 to show that samples cluster by cell-type and not by library preparation method when performing unsupervised clustering analysis of the top ~30,000 variable blocks amongst all of the samples in the human and mouse datasets, respectively. Of note, for the reference methylation data that we generated ourselves in the lab, we validated the identity of the starting cell population through either RNAsequencing or FACS analysis. The human cardiopulmonary-and liver sinusoidalendothelial methylation sequencing data that we generated clustered closely with liver endothelial data generated by a separate study (25) and with HUVEC methylation data generated by Blueprint Epigenomics (EGAD00001002294). Likewise, the mouse immune cell-type methylation data that we generated clustered closely with other Bcell (17) and Tcell WGBS data (15) generated by other studies.

Identification of cell-type specific methylation blocks
We reduced the original 297 human WGBS samples to a final set of 104 samples to identify differentially methylated cell-type specific blocks. First, samples were split into training and testing groups for model validation (80% train and 20% test).
The unsupervised hierarchical clustering analysis in Figure 1 and Supplemental Figure 1 were performed using the training human and mouse WGBS data detailed in Supplemental Tables 1 and 2 through being assigned to a UMAP group (Column C). We further excluded samples from bulk tissues and those that did not have sufficient coverage (missing values in >50% of methylation blocks). Outlier replicates, or those clustering with fibroblasts or stromal cell types were excluded, due to possible contamination. Only immune cell methylomes that were reprocessed from raw sequencing data to PAT files were used to identify DMBs. We organized the final 104 human reference samples into groupings of 20 cell types (Supplemental Table 1). Similarly, the starting 109 mouse WGBS samples were reduced to a final set of 43 samples that were organized into a final grouping of 9 cell types and tissues (Supplemental Table 2). The final samples used to identify DMBs are designated by having an 'X' in Column E titled "Included in Atlas" in Supplemental Tables 1 and 2. Tissue and cell-type specific methylation blocks were identified from reference WGBS data using custom scripts (Supplemental code). We performed a one-vs-all comparison to identify differentially methylated blocks unique for each group. This was done separately for human and mouse. From this we first identified blocks covering a minimum of three CpG sites, with length less than 2Kb and at least 10 observations. Then, we calculated the average methylation per block/sample, as the ratio of methylated CpG observations across all sequenced reads from that block. Differential blocks were sorted by the margin of separation, termed "delta-beta", defined as the minimal difference between the average methylation in any sample from the target group vs all other samples. Then, we computed the "soft margin" between target samples and background samples, allowing for some outliers using percentiles (Supplemental Figure 15). For all markers we calculated the difference between the 80 th percentile of the methylation status in the target group (target.quant) and the 10 th percentile of the methylation status in the background group (bg.quant). We selected blocks with a soft margin 0.4 for human and 0.35 for mouse. This resulted in a variable number of cell-type specific blocks available for each tissue and cell type. Blocks with a (-) direction are hypomethylated and (+) direction are hypermethylated, defined as a as a direction of methylation in the target celltype relative to all other tissues and cell-types included in the atlas. We also used a magnitude threshold where all hypomethylated blocks have an Average Methylation Fraction (AMF) <0.5 and hypermethylated blocks have an AMF >0.5. However, the vast majority of cell-type specific differentially methylated blocks are much more diverged (mean AMF hypo = <10% methylation and mean AMF hyper = > 80%). Identified cell-type specific hypomethylated blocks in humans had a mean AMF of 0.081 ± 0.082 STDEV (median = 0.06; mode = 0.0). In mouse, hypomethylated blocks had a mean AMF of 0.079 ± 0.083 STDEV (median = 0.05; mode = 0.0). Likewise, identified cell-type specific hypermethylated blocks in humans had a mean AMF of 0.83 ± 0.10 STDEV (median = 0.85; mode = 0.9). In mouse, hypermethylated blocks have a mean AMF of 0.83 ± 0.083 STDEV (median = 0.85; mode = 0.89).
Additional separation of endothelial cell populations from different tissues was performed to identify unique markers for liver endothelial versus cardiopulmonary endothelial blocks that do not overlap. Separately, pan-endothelial blocks were identified with methylation status in common to all endothelial cell populations. Similarly, individual immune cell-type specific methylation blocks were identified for purified cell populations.
In addition, bulk immune blocks were identified with methylation status in common to all hematopoietic cell populations. The bulk immune methylation blocks were found to separate all hematopoietic cell types from solid organ cell types of different lineages and were used for deconvolution in the circulation. The solid organ compartment was then further parsed into individual cell-type contributors as specified in Column F titled "Atlas Groups" in Supplemental Tables 1 and 2. For some solid organ cell-types, a reduced subset of blocks (ie. top 200) were used for deconvolution in the circulation if the original number identified was greater than one standard deviation above the mean. The reduced subset of blocks were prioritized based on the margin of separation taking into account the quantile variation amongst all samples in target and background groups as described above. In addition, blocks were prioritized that consistently met our designated thresholds for cell-type specificity following iterative leave-one-out analysis of repeated segmentation and marker selection. Selected human and mouse blocks for cell types of interest that were used for deconvolution of cell-free DNA in the circulation can be found in Supplemental Tables 3 and 4. Extended cell-type specific blocks for purified populations of endothelial and immune cell-types can be found in Supplemental Tables 8 and 15.
For multiple cell-types in both human and mouse, we analyzed samples sourced from multiple consortiums using different library construction methods and still found the identified cell-type specific DMBs to be representative of all samples (Supplemental Tables 1 and 2). In addition, when forming groups to identify cell-type specific DMBs we required at least 3 or more samples be present for each group to ensure DMBs were reproducible across biological replicates.

Methylation score and visualization of cell-type specific methylation atlas
Each DNA fragment was characterized as U (mostly unmethylated), M (mostly methylated) or X (mixed) based on the fraction of methylated CpG sites as previously described (28). We used thresholds of ≤33% methylated CpGs for U reads and ≥66% methylated CpGs for M. We then calculated a methylation score for each identified celltype specific block based on the proportion of U/X/M reads among all reads. The U proportion was used to define hypomethylated blocks and the M proportion was used to define hypermethylated blocks. Heatmaps were generated using the pretty heatmap function in the RStudio Package for the R bioconductor (RStudioTeam, 2015).

In-silico simulations and WGBS deconvolution
In silico mix-in simulations were performed using wgbstools (V 0.1.0) (29) to validate the fragment-level deconvolution algorithm at the identified cell-type specific blocks (Supplemental Figures 9, 10, 11 and 12). Reference data with greater than three replicates per cell type were split into independent training and testing sets, leaving at least one replicate out for testing. Since the mouse lung endothelial reference WGBS data had only three replicates, sequenced fragments were merged across replicates for this cell type and then randomly split into training (80%) and testing (20%) sets (using wgbstools merge and then wgbstools pat_splitter). The cell-type specific blocks included in the human and mouse methylation atlases were constructed using training set fragments only. For each cell type profiled, we mixed known proportions of target fragments into a background of leukocyte fragments using wgbstools mix_pat. The leukocyte fragments were obtained from n=4 buffy coat samples in mouse and n=5 buffy coat samples in human. We performed three replicates for each admixture ratio assessed (0.05%, 0.1%, 0.5%, 1%, 2%, 5%, 10%, 15%), which were analyzed as described above, and present the average predicted proportion and standard deviation across all replicates. Model accuracy was assessed through correct classification of the actual percent target mixed.
Each mixture was analyzed using our WGBS atlas and fragment-level deconvolution model in contrast to the 450K array atlas and NNLS model described in (github.com/nloyfer/meth_atlas). Our sequencing-based approach allowed for fragment-level cfDNA analysis of CpG methylation patterns, as opposed to relying on the use of single CpG sites from methylation array data (31). From the in-silico mix-in simulations, we found that our probabilistic fragment-level deconvolution model outperforms traditional array-based analysis for each tissue and cell type of interest to validate the prediction accuracy and sensitivity (Supplemental Figure 12). However, there are inherent differences in the 450K array meth-atlas and NNLS model that prohibited a direct one-to-one comparison. First, the test WGBS data used to generate the in silico mixed samples has limited depth compared to the effective number of molecules measured by DNA methylation arrays. In addition, the meth-atlas is a complete deconvolution NNLS model and thus target fragments were admixed with leukocyte fragments across all features in the 450K array meth-atlas, which may have had variable coverage at informative features specific for each cell-type. Also, many of the features in the 450K array atlas were identified for tissues and not the purified cell-type WGBS data used in this study (ie. left atrium heart tissue as opposed to cardiomyocytes and lung tissue as opposed to lung epithelial cells). However, we found that pattern analysis at the cell-type specific methylation blocks identified here allowed for accurate detection of cfDNA from a given source when present in less than 0.1% of a mixture, a marked improvement in comparison to current 450K approaches (Supplemental Figure 12).

Functional annotation and pathway analysis
Cell-type specific methylation blocks were provided as input for analysis in HOMER (V4.11.1) (http://homer.ucsd.edu/homer/) (32). Each block was associated with its closest nearby gene and provided a genomic annotation using the annotatePeaks.pl function, with "-size given -CpG" parameters. By default, TSS (transcription start site) was defined from -1 kb to +100 bp, TTS (transcription termination site) was defined from -100 bp to +1 kb, and CpG islands were defined as a genomic segment with GC content ≥50%, genomic length >200 bp and the ratio of observed/expected CpG number >0.6. Prediction of known and de-novo transcription factor binding motifs were also assessed by HOMER using the findMotifsGenome.pl function. The top 5 motifs based on p-value were selected from each analysis. Pathway analysis of identified tissue and cell-type specific methylation blocks was performed using Ingenuity Pathway Analysis (IPA) (33) (Qiagen) and Genomic  Table 8.

Cross-species comparison and human and mouse cell-type specific DMBs
For cell-types represented in both human and mouse atlases we performed cross-species mapping using the UCSC Genome Browser LiftOver tool and the hg19ToMm9.over.chain.gz file. Only 24% of human cell-type specific blocks were mapped completely to the mouse genome, with the majority of human blocks having been deleted or partially deleted in the mm9 genome. The lower mapability rate of epigenetic regions (mostly within introns and intergenic regions) has been demonstrated by others (37, 38). This also reflects previous findings that only 40% of the nucleotides in the human genome can be aligned to mouse (39) and also the difference that there are only ~21 million CpG sites in the mouse mm9 genome relative to the ~28 million CpG sites in the human hg19 genome. Of the blocks that fully mapped, 100% were captured on both of the separately designed human and mouse hybridization capture panels. On average, 18% of these captured and mapped regions were identified as being cell-type specific methylation blocks in both species (Supplemental Table 14). Of note, 100% of overlapping cell-type specific methylation blocks had the same methylation status in both human and mouse (ie. human hepatocyte-specific hypomethylated blocks were hypomethylated specifically in mouse hepatocytes too). To circumvent limited crossspecies mapability, we also compared overlapping genes associated with identified celltype specific DMBs in cell-types represented in both human and mouse atlases (Supplemental Table 14). However, this is likely an underestimate due to limited overlap of regions captured from the separate human and mouse hybridization panels utilized.
We expanded on analysis of genomic distribution to compare the distribution of hypo-and hyper-methylated percentages between human and mouse species on a cell/organ basis.
We depict the distribution of cell-specific hypo-and hyper-methylated DMBs on a cell/organ basis in Supplemental Figure 5. Within all hypomethylated blocks, there wasn't a significant difference in genomic distribution for the same cell-type between human and mouse species, taking into account the distribution differences at baseline comparing captured regions (Chi-square, df=4, p>0.05). The same was true within hypermethylated blocks (Chi-square, df=4, p>0.05). However, there were differences in overall hypo-and hyper-methylated percentages between human and mouse samples across cell-types. Interestingly, all solid organ derived cell-type specific DMBs in mouse were hypomethylated, whereas all immune-cell specific DMBs were hypermethylated. In contrast, most human cell-types had both hypo-and hyper-methylated DMBs identified, with the large majority being hypomethylated. Despite this difference, the human bulk immune cell-type specific DMBs were 47% hypermethylated, suggesting that a higher percentage of hypermethylated markers is common to cell types comprising hematopoietic lineages in both human and mouse species. This was further supported by the 30% of immune cell-specific methylation blocks that overlapped between the human and mouse capture panels that were 100% correlated in methylation status and found to be hypermethylated in both species (Supplemental Table 14).

39.
Breschi A, Gingeras TR, Guigo R. Comparative transcriptomics in human and mouse. Nat Rev Genet. 2017;18 (7):425-440.  proportions at the blocks selected. The average predicted %target is shown relative to the known %mixed to assess sensitivity and specificity of the identified cell-type specific blocks and deconvolution model. Data presented as mean ± SD; N=3 replicates per proportion. Reference WGBS samples with less than 3 replicates were split into "0.8 train" to select methylation blocks and "0.2 test" to generate in-silico mixed samples. When available, in-silico mixed samples of the same cell type derived from differently aged mice were also tested (infant < 6 weeks old). In addition, bulk tissue containing the respective cell type was tested as well. Figure 10. into a background of lymphocyte or buffy coat read-pairs at various known percentages (0.1%, 0.5%, 1%, 2%, 5%, 10%, 15%). The deconvolution model was validated on these in-silico mixed samples of known cell-type proportions at the blocks selected. The average predicted %target is graphed relative to the known %mixed to assess sensitivity and specificity of the identified cell-type specific blocks and deconvolution model. Data presented as mean ± SD; N=3 replicates per proportion.  Figure 12. Performance of the probabilistic fragment-level deconvolution algorithm using WGBS data relative to NNLS MethAtlas from 450K array data. Cell type specific markers outperform the array-based atlas and achieve <0.1% resolution. Shown are in silico simulations for four cell types, computationally mixed within leukocytes at various known percentages (0.05%, 0.1%, 0.5%, 1%, 2%, 5%, 10%, 15%). Each mixture was analyzed using our WGBS atlas and fragment-level deconvolution model (red), compared to Moss et al. 2018 (gray).

Supplemental
Data presented as mean ± SD; N=3 replicates per proportion.