Single-cell transcriptomics of allo-reactive CD4+ T cells over time reveals divergent fates during gut GVHD

Acute gastrointestinal Graft-versus-Host-Disease (GVHD) is a primary determinant of mortality after allogeneic hematopoietic stem-cell transplantation (alloSCT). It is mediated by alloreactive donor CD4+ T cells that differentiate into pathogenic subsets expressing IFNγ, IL-17A or GM-CSF, and is regulated by subsets expressing IL-10 and/or Foxp3. Developmental relationships between T-helper states during priming in mesenteric lymph nodes (mLN) and effector function in the GI tract remain undefined at genome-scale. We used scRNA-seq and computational modelling to create an atlas of putative differentiation pathways during GVHD. Computational trajectory inference suggested emergence of pathogenic and regulatory states along a single developmental trajectory in mLN. Importantly, we identified an unexpected second trajectory, categorised by little proliferation or cytokine expression, reduced glycolysis, and high TCF1 expression. TCF1hi cells upregulated α4β7 prior to gut migration and failed to express cytokines therein. Nevertheless, they demonstrated recall potential and plasticity following secondary transplantation, including cytokine or Foxp3 expression, but reduced TCF1. Thus, scRNA-seq revealed divergence of allo-reactive CD4+ T cells into quiescent and effector states during gut GVHD, reflecting putative heterogenous priming in vivo. These findings, the first at a single-cell level during GVHD over time, can now be used to interrogate T cell differentiation in patients undergoing alloSCT.


Introduction 49
Allogeneic hematopoietic stem cell transplantation (alloSCT) is a curative therapy for a range of leukemias, 50 due to the capacity of donor T cells within the transplant to kill tumour cells -known as the graft-versus-51 leukaemia (GVL) effect. Unfortunately, donor T cells can also damage non-cancerous host tissue, particularly 52 the gastrointestinal (GI) tract, liver and skin, causing the serious condition, graft-versus-host-disease 53 (GVHD). Acute GVHD in the GI tract remains the primary determinant of GVHD severity and risk of death 54 (1). Thus, a primary goal for alloSCT is the prevention of acute gut GVHD while preserving GVL. We 55 previously showed in pre-clinical models that donor CD4 + T cells are initially activated by recipient non-56 hematopoietic antigen-presenting cells (APC) within the gut, including epithelial cells that upregulate MHCII 57 molecules (2, 3). Following this, donor-derived colonic dendritic cells (DC) also prime donor CD4 + T cells in 58 mesenteric lymph nodes (mLN) and trigger T helper-cell differentiation, which serves to amplify and 59 exacerbate GVHD (4). Using TCR transgenic T cells specific for a single allo-peptide, TEa cells (5), we 60 revealed that donor CD4 + T cells within the same alloSCT recipient differentiate into multiple cellular states 61 that express pro-inflammatory and pathogenic Th1/Th17-associated cytokines, including IFNγ and IL-17A, or 62 the master transcription factor for induced regulatory T (iTreg) cells, Foxp3 (4). While cytokines such as IL-6 63 and IL-12 control differentiation of donor CD4 + T cells, fate-mapping studies based on the IL-17a promoter 64 suggested complex dynamic relationships between apparent helper subsets (6). Thus, differentiation of allo-65 reactive donor CD4 + T cells is characterised by complexity, both in terms of dynamics and multiple cellular 66 states adopted, neither of which have been explored at genome-scale. 67 68 Single-cell mRNA sequencing (scRNA-seq) enables unbiased genome-wide assessment of individual T cells 69 without reliance upon pre-determined protein markers or genes. ScRNA-seq was previously employed to 70 examine heterogeneity in CD4 + T cells isolated from IL-17a reporter mice undergoing experimental 71 autoimmune encephalomyelitis (EAE) (7). Subsequently, scRNA-seq was used to examine CD4 + T cell 72 differentiation during house dust mite-induced allergy (8), protozoan parasite infection (9), as well as CD8 + T 73 cells in viral infections and cancer (10-12). Many of these studies were cross-sectional, offering insight into 74 heterogeneity amongst clonal TCR transgenic cells at a single timepoint. We previously examined CD4 + T 75 cell transcriptomes over a range of time-points during experimental malaria, and employed computational 76 approaches to re-construct the dynamics of Th1 versus Tfh differentiation. Using an approach based on 77 Bayesian Gaussian Process Latent Variable Modelling (bGPLVM), we identified a bifurcation point between 78 two trajectories, and revealed a role for T cell extrinsic factors in governing Th1/Tfh fate (9). More recently, 79 we employed scRNA-seq to reveal heterogeneity and tissue adaptation of thymic Tregs and colonic CD4 + T 80 cells during steady-state in mice and humans (13,14). Here, we examined donor antigen presenting cell 81 (APC)-mediated differentiation of allo-reactive donor CD4 + T cells in acute gut GVHD using droplet-based 82 scRNA-seq and computational modelling. 83 84

Results 85
Allo-reactive donor CD4 + T cells acquire heterogeneous pro-inflammatory, regulatory and 86 uncharacterised states during acute GVHD. 87 We previously established a pre-clinical model of acute GVHD (4), in which the donor CD4 + T cell responses 88 were targeted to a single, host-derived, allo-reactive peptide presented by donor MHCII molecules -TEa TCR 89 transgenic CD4 + T cells (B6 background) exhibit specificity for a BALB/c-derived Ea peptide from the MHCII 90 molecule, I-E d , when presented by the donor (B6) MHCII molecule, I-A b ( Figure 1A). Here, we exposed 91 BALB/c mice to total body irradiation and provided an MHC-mismatched B6 bone marrow transplant 92 (containing donor T cells). 12-days later, once donor allo-presenting APCs had developed (4), TEa cells were 93 transferred. TEa cell priming occurred specifically within the mesenteric lymph nodes (mLN), and triggered 94 differentiation, as expected by day 4, into subsets with a strong capacity to express pro-inflammatory 95 cytokines IFNγ or IL-17A upon re-stimulation, or in rarer cases (~1%) express Foxp3 ( Figure 1A & B). We 96 also examined the frequency of TEa cells expressing IFNγ, IL-17A or Foxp3 directly ex vivo, without further 97 stimulation ( Figure 1C). Although some TEa cells expressed IFNγ, IL-17A or Foxp3 directly ex vivo ( Figure  98 1C), the majority did not at this early time-point in mLN. Therefore, although T helper cell differentiation had 99 occurred, use of three canonical T helper markers was insufficient for characterising the fate of most TEa 100 cells in mLN during acute gut GVHD. 101 102 Droplet-based, single-cell RNA-seq and computational modelling reveals divergent fates in donor 103 CD4 + T cells. 104 To examine donor CD4 + T cell differentiation without reliance on pre-selected markers, we employed droplet-105 based scRNA-seq. TEa cells were recovered from mLN of irradiated BALB/c alloSCT recipients, at days 1, 106 2, 3 & 4 post-transfer ( Figure 2A and Figure S1A & B) -non-transferred control TEa cells were also examined, 107 referred to as day 0 cells. Flow cytometric assessment of sorted TEa cells -pooled from mice at each time-108 point -revealed as expected rapid, transient upregulation of CD69 and evidence of cell division by CFSE 109 dilution by day 2 ( Figure S1B). This was followed on days 3 and 4 by complete loss of CFSE, indicative of 110 dramatic clonal expansion within days 2-3, as well as substantial upregulation of the gut homing integrin, 111 α4β7 in many, but not all TEa cells. These data confirmed that TEa cell activation had occurred in mLN, and 112 suggested emerging heterogeneity. TEa cells were therefore processed for droplet-based scRNA-seq. After 113 excluding poor-quality single-cell transcriptomes ( Figure S2A & B), we advanced 22,854 high-quality TEa 114 samples for further analysis. We noted a substantial increase in the average number of genes detected per 115 cell from day 0 through day 1 and 2, which dropped by day 4 ( Figure S2A). This was consistent with our 116 previous study in which CD4 + T cells more than doubled the number of expressed genes during clonal 117 expansion (9). 118

Uniform Manifold Approximation and Projection (UMAP) (15) visualisation after Principal Component 119
Analysis (PCA) suggested that day 0 and day 1 TEa cells, whilst relatively homogeneous within their own 120 timepoints, were distinct from each other, and furthermore were distinct from cells assessed at days 2-4 121 ( Figure S2C). In contrast, there was evidence both of transcriptomic overlap between cells from days 2, 3 122 and 4, as well as of transcriptomic progression from day 2 through to day 4 ( Figure S2C). Therefore, we 123 sought to test for potential developmental trajectories within the scRNA-seq data. We employed non-linear 124 probabilistic PCA, termed bGPLVM (9), which embedded the data in low-dimensional space and re-ordered 125 transcriptomes independently of the time-point of capture. Our previous paper had employed bGPLVM on a 126 relatively small number of cells (<1000) (9). It was evident, here, that bGPLVM was scalable to ~25,000 cells.

127
Running bGPLVM iteratively ten times on the dataset yielded similar learned embeddings, indicating the 128 stability of the output ( Figure S3). Variability between cells was clearly observable along three latent 129 variables, with interpretable variation also evident in the first two latent variables, thus permitting modelling 130 in two dimensions ( Figure 2B). We next sought to infer possible differentiation trajectories based on 131 transcriptomic similarity between cells. We employed Slingshot (16) on the bGPLVM embedding with a start 132 point specified in day 1, and with day 0 omitted due to transcriptomic distance from the rest of the data ( Figure  133 S4A & B). Slingshot identified three potential trajectories in the bGPLVM space, two of which (trajectories II  134 & III) progressed through days 2, 3 and 4, while trajectory I appeared to terminate in day 2 ( Figure 2B).

135
Consistent with flow cytometric data ( Figure S1), Cd69 expression gradually reduced along trajectories II &  136 III, while Itgb7, encoding subunit β7 of integrin α4β7, increased (Figure 2C & D). Most strikingly, trajectory III 137 ceased expression of cell-cycling genes including Mki67 (encoding Ki-67)( Figure 2C & D) and exhibited lower 138 expression of aerobic glycolysis genes compared to trajectory II. We also employed Slingshot on UMAP of 139 PCs and found four trajectories, two of which largely superimposed upon each other suggesting, as for 140 bGPLVM, three main trajectories ( Figure S5A). Two of these progressed through days 2-4, with one largely 141 devoid of cellular proliferation, and the other exhibiting strong expression of Mki67 and other cycling genes 142 ( Figure S5B & C). To further test for trajectories in the data, we employed other computational approaches, 143 including Monocle 2, Velocyto (17), and PAGA (18). In all cases ( Figure S6), there appeared one main 144 bifurcation event within days 2-3, leading to two trajectories. Therefore, taken together, our analysis 145 suggested two apparent trajectories had emerged in TEa cells, which differed from each other in expression 146 of genes related to cell cycling and aerobic glycolysis. We next assessed expression of the canonical markers, Ifng, Il17a and Foxp3 within our bGLPVM/Slingshot 151 model ( Figure 3A), and surprisingly, found them strongly expressed in only trajectory II. In addition, we noted 152 significant expression of Il17a and Il17f (Figure 3, Il17f not shown), in areas shared by trajectory I and II 153 ( Figure 3A). These observations pointed to early activation of the Il17a promoter at day 2, followed later by 154 upregulation of Ifng and Foxp3. Most importantly, we saw little evidence of Th1, Th17 and iTreg cell types 155 emerging from different developmental trajectories, either when examined using our previous Th gene 156 modules or when relatively sparse Ifng, Il17a and Foxp3 expression levels were imputed using Adaptively-157 thresholded Low-Rank Approximation (ALRA)(19)( Figure S7). Instead, our unbiased analysis suggested 158 pro-inflammatory and regulatory effector fates had emerged within trajectory II. Importantly, we noted that 159 trajectory III lacked expression of Ifng, Il17a and Foxp3 ( Figure 3A). Differential gene expression analysis 160 between cells in trajectory III vs II revealed Tcf7 as a top transcription factor associated with trajectory III, as 161 well as elevated expression of memory-associated Klf2, Ccr7, Cd27 and Sell (encoding CD62L)( Figure 3B & 162 C and Table S1). These observations were also seen in our UMAP/Slingshot model and in a second scRNA-163 seq experimental repeat ( Figure S8). Thus, given an absence of cell cycle activity, lower expression of 164 aerobic glycolysis genes, absence of pro-inflammatory or regulatory gene expression, and expression of 165 memory or stem-like genes including Tcf7, we inferred that trajectory III contained quiescent TEa cells that 166 had gone through a clonal burst in mLN, but had not acquired any effector function. To test predictions from 167 our transcriptomic models, we assessed mLN TEa cells at day 4 post-transfer by flow cytometry. As expected, 168 all TEa cells had upregulated the Th1-associated lineage transcription factor, T-bet ( Figure 4A). We also 169 observed a clear bifurcation amongst these cells in expression of the Tcf7-encoded protein, TCF1 ( Figure  170 4A), with one population of cells expressing higher levels of TCF1 and lower levels of T-bet than its 171 counterpart ( Figure 4A). We also noted that direct ex vivo expression of IFNγ or IL-17A was substantially 172 reduced in TCF1 hi cells relative to TCF1 lo counterparts within the same mouse. Strikingly, Foxp3 expression 173 was absent in TCF1 hi cells relative to TCF1 lo cells ( Figure 4A), and in vitro re-stimulation did not recover 174 expression of these molecules, ( Figure 4B). Together these data validated our prediction from scRNA-seq 175 modelling of the differentiation of a major population of allo-reactive CD4 + T cells in mLN, which were 176 quiescent and marked by high expression of TCF1. 177 178 Tcf7 hi allo-reactive CD4 + T cells change minimally during migration from mLN to gut. 179 Although TCF1 hi TEa cells within the mLN failed to express Foxp3, or pro-inflammatory cytokines, IFNγ and 180 IL-17A, scRNA-seq predicted their capacity to migrate to the gut due to expression of the integrin gene Itgb7 181 ( Figure 2C). To test this, and to explore the developmental relationships between mLN and gut-migrating 182 TEa cells, we conducted a third scRNA-seq experiment, examining TEa cells at day 5, both in mLN and in 183 the intra-epithelial lymphocyte (IEL) fraction of the gut ( Figure 5A). TEa cells were readily recovered from 184 lamina propria (LP) and IEL fractions from the gut. However, given the longer protocol required to isolate 185 cells from LP versus IEL, and the potential to interfere with transcriptome fidelity, we confined scRNA-seq 186 analysis to IEL TEa cells. To relate day 5 data with our previous mLN dataset, we repeated assessments of 187 days 0 and 4 mLN. Finally, to control for possible technical variation in IEL induced by the isolation protocol, 188 we treated day 5 mLN cells with and without the IEL-isolation pre-digestion buffer. After scRNA-seq, and 189 quality control as before ( Figure  substantially elevated compared to our earlier experiments in mLN, with ~15% and ~40% of TEa cells in 195 Clusters 0, 1 & 3 expressing Il17a or Ifng respectively ( Figure S9C). Pronounced expression also revealed 196 that patterns for Ifng and Il17a expression were not identical in IEL TEa cells. While Ifng was relatively uniform 197 in expression across Clusters 0, 1, & 3, IL17A was focussed in specific areas. However, a clear distinction 198 between Th1 and Th17 could not be drawn transcriptomically, consistent with previous data that allo-reactive 199 CD4 + T cells can co-express IFNγ and IL-17A at protein level. Interestingly, Foxp3 was not tightly confined 200 to a particular area in IEL cells, suggesting that iTregs were transcriptomically varied, and could not be 201 partitioned into a specific lineage separate from those expressing pro-inflammatory cytokines. Together, 202 these data reveal that even after priming in mLN and migration to the gut, allo-reactive effector CD4 + T cells 203 remain transcriptomically similar to each other, regardless of their pro-inflammatory or regulatory phenotype.

204
Unsupervised clustering of IEL TEa cells also revealed Cluster 2, which completely lacked expression of Ifng, 205 Il17a and Foxp3, but was elevated for Tcf7, Ccr7 and Cd27 expression relative to other clusters ( Figure S9C). 206 Cluster 4 contained a small population of cells that could not be reliably analysed ( Figure S9D). Thus, our 207 assessment of TEa cells in the gut revealed the presence of both effector and quiescent cell states, mirroring 208 those seen in mLN. 209 To determine molecular relationships between TEa cells in mLN and gut, we integrated our datasets 210 across all time-points and organs using single-cell Variational Inference (scVI) (20), which accounted for 211 possible batch effects across independent experiments, and provided a temporal atlas of differentiation for 212 allo-reactive CD4 + T cells in the gut ( Figure S10A-B). Output from scVI was first visualised via 3-dimensional 213 UMAP. Day 0 and day 4 mLN TEa cells, regardless of experiment, occupied the same space as their time-214 point counterparts, suggesting that any technical variation from different experiments, sequencing platforms, 215 and protocols had been removed. Next, we noted that day 0 and day 1 cells existed in discrete transcriptional 216 states, separate from each other and from day 2-5 cells ( Figure S10A-B). This suggested TEa cells had 217 undergone rapid and uniform change upon initial exposure to allo-antigen in mLN, with potential intermediate 218 cellular states not captured by scRNA-seq assessment at a single time-point 24 hours after transfer. Similarly, 219 we noted few transcriptomic intermediates between day 1 and day 2, suggesting again further uniform, rapid 220 change during the second 24-hour period. Differential gene expression analysis between consecutive days 221 revealed gene families associated firstly with ribosomal processes, then cellular division were upregulated 222 during the first 48-hour period of allo-antigen exposure (Table S2), consistent with initiation of clonal 223 expansion. Next, we noted from day 2-5 in mLN, a substantial, gradual increase in heterogeneity, with 224 modest effector molecule expression, such as IFNg, IL-17a and Foxp3 expression confined to one space, 225 with quiescent Tcf7 hi cells occupying a separate space ( Figure 5C-D). Importantly, from our integrated model 226 we inferred further transcriptomic change as effector cells migrated from mLN to gut, including for example, 227 upregulation of Csf2 (encoding GM-CSF), Tr1-associated immune-suppressive Il10, and increased 228 expression of IFNg, IL-17A and Foxp3 ( Figure S10C). This suggested differentiation into IFNg + or IL-17A + 229 pro-inflammatory effectors, Foxp3 + iTreg cells or IL-10 + Tr1 cells had not been completed in secondary 230 lymphoid tissue and instead occurred progressively during and after migration to the gut. In contrast, we 231 noted substantial transcriptomic overlap between Tcf7 hi cells in mLN versus IEL. Moreover, once Tcf7 hi cells 232 had emerged in mLN by day 3, their phenotype altered very little either over the following 2 days or during 233 migration to the gut ( Figure 5D). These data suggested that between Tcf7 hi and Tcf7 lo cells by days 4-5 ( Figure 5D), consistent with possible linear transitions from 242 effector to memory. However, at day 2 prior to effector differentiation and clonal expansion, small numbers 243 of cells appeared in the quiescent, low cell-cycling Tcf7 hi population ( Figure 5D), suggesting an alternative 244 mechanism of development unlinked to effector differentiation. Thus, scRNA-seq analysis revealed that some 245 quiescent, Tcf7 hi allo-reactive CD4 + T cells had emerged rapidly within 48 hours of allo-presentation in mLN, 246 with developmental intermediates being difficult to capture in our experimental design. In summary, our 247 integrated atlas of allo-reactive CD4 + T cell differentiation (Movie S1), revealed the emergence of pro-248 inflammatory, regulatory, and quiescent cell states within secondary lymphoid tissue, which emerged rapidly 249 and evolved to differing degrees during migration to the gut. 250

TCF1 hi quiescent allo-reactive CD4 + T cells retain a capacity to mount effector responses in vivo 252
We sought to examine the functional potential of quiescent TCF1 hi TEa cells during acute gut GVHD. In 253 particular, we sought to test the hypothesis that TCF1 hi TEa cells could eventually give rise to an effector 254 response within the gut, and re-generate themselves. Firstly, we noted that although these cells expressed 255 high levels of Pdcd1 (encoding PD-1), they did not express other co-inhibitory markers often associated with 256 T cell exhaustion, suggesting they might still be responsive to re-activation ( Figure S11A). Secondly, as for 257 naïve cells, Tcf7 hi cells expressed high levels of both Il6st and Il6ra, which are required for classical IL-6 258 signalling that promotes CD4 + T cell responses in acute gut GVHD ( Figure S11B). These data suggested 259 that Tcf7 hi cells retained the capacity to respond to allo-antigen. Therefore, we designed a cell-sorting strategy 260 for Tcf7 hi cells based on our bGPLVM/Slingshot scRNA-seq model, in which trajectory III cells expressed 261 much lower levels of IL2ra (encoding CD25) and the Th1-associated chemokine receptor gene, Cxcr6 262 compared with trajectory II effector cells ( Figure S12A). Consistent with this, flow cytometric assessment of 263 day 4 mLN revealed that IFNγ/IL-17A production and Foxp3 expression was confined to CD25 + TCF1 lo TEa 264 cells ( Figure S12B). Hence, we sorted clonally-expanded (CFSE lo ) CD25 -CXCR6 -TEa cells from mLN at 265 day 4 post-transfer, as well as CD25 + and/or CXCR6 + counterparts, and transferred these or naïve TEa 266 control cells into irradiated BALB/c alloSCT recipients ( Figure 6A & Figure S12C). When assessed 4 days 267 later, recipients of TCF1 + TEa cells harboured as many cells as those receiving naïve TEa cells ( Figure 6B).

268
In contrast, recipients of TCF1 -TEa cells, albeit transferred with 50% lower numbers, harboured fewer TEa 269 cells (~10% of original input, compared to 40-60% of input for naïve and TCF1 hi cells). While naïve TEa cells 270 diverged to give rise to both TCF1 hi and TCF1 lo cells, TCF1 + TEa cells were less able to re-generate the 271 TCF1 hi phenotype in a secondary transplant, and TCF1 -TEa cells were almost incapable of doing so ( Figure  272 6C). TCF1 + -derived TEa cells were capable of expressing IFNγ, IL-17A or Foxp3 directly ex vivo and after 273 re-stimulation ( Figure 6D & E), with IL-17A expression increased relative to primary responses by naïve cells.

274
Together, these data indicated that TCF1 + allo-reactive CD4 + T cells, though quiescent during a primary 275 response in mLN, retained a capacity to mount effector responses, but were poor at regenerating the TCF1 hi 276 phenotype a second time. Thus, we reveal during acute gut GVHD a quiescent, gut migratory, allo-reactive 277 CD4 + T cell state that retains effector potential. 278 279

Discussion 280
Although alloSCT is an established curative therapy for a range of haematological malignancies, a 281 major limitation is acute GVHD, where alloreactive naive donor T cells differentiate into pro-inflammatory 282 effectors that damage the GI tract, liver and skin (1). Cytokines such as IL-17A, IFNγ and GM-CSF produced 283 by allo-reactive Th1 and Th17 cells in the GI tract promote disease (21), while IL-10 produced by Tr1 and 284 iTreg cells can protect (22). An important goal in alloSCT is to preserve GVL effects while reducing GVHD. 285 Key to this endeavour is a consideration of the spatial and temporal differences between GVL and GVHD. 286 While GVL exerts beneficial effects in primary and secondary lymphoid organs, acute and lethal GVHD often 287 occurs in the GI tract. By understanding CD4 + T cell differentiation after alloSCT, we may define new 288 strategies to block pathogenic cellular states and encourage protective ones. Although CD4 + T cell 289 differentiation has been explored at genome-scale in infection, auto-immune and allergy models (7-9), 290 extrapolating to alloSCT remains challenging. For example, in the alloSCT setting, alloantigen is ubiquitous 291 and constant, while pathogen-derived antigen may be more dynamic or transient. Secondly, alloSCT often 292 features profound lymphopenia unlike other models. Hence, we specifically examined transcriptome 293 dynamics of T helper cell differentiation in the GI tract of mice after alloSCT. 294 By sampling transcriptomes from thousands of alloreactive CD4 + T cells of a single specificity across 295 lymphoid and nonlymphoid gut-associated tissue, we successfully detected cellular states expressing 296 canonical Th1/Th17 cytokine genes, Ifng and Il17a, and the regulatory genes, Foxp3 and Il10 -at frequencies 297 similar to that observed by flow cytometry. Importantly, however, unbiased clustering and trajectory inference 298 tools suggested substantial similarity between the transcriptomes of these effector subsets, particularly in 299 lymphoid tissue but also in the gut. Subtle differences became more evident amongst gut trafficked TEa cells 300 that had stopped proliferating, with Ifng expressed more uniformly than either Il17a or Foxp3. Given that TEa 301 cells can upregulate T-bet and IFNγ but not IL-17A or Foxp3 in the complete absence of MHCII-presentation 302 (3, 4), our data are broadly consistent with a Th1-like state being a default program in the gut, that may be 303 countered by alloantigen presentation via MHCII towards iTreg or Th17-like states. Nevertheless, our main 304 inference from transcriptome dynamics was that pro-inflammatory states were not readily distinguished from 305 immune-suppressive iTreg states. We propose a model for development of Th1, Th17 or iTreg-like states 306 during alloSCT controlled by specific micro-anatomical, T cell extrinsic factors, such as access to MHCII-307 presentation on different types of APC, or exposure to IL-6 and IL-12-signalling. Given the apparent 308 continued maturation of pro-inflammatory TEa cells as they migrated from mLN to the gut, this raises the 309 question of whether peripheral tissue signals in the gut, such as local cytokine-signalling or unique cell-cell 310 interactions contribute to this process. We speculate that spatial transcriptomic assessment of gut-located 311 TEa cells will shed light on this matter. One further implication of our data is that conversion of emerging pro-312 inflammatory CD4 + T cells into protective, non-pathogenic iTreg cells in the gut may be feasible, since 313 developmental pathways are similar between the two. Conversely, our data also support iTreg-based 314 therapies with appropriate mitigation for the effects of reversion to pro-inflammatory states in vivo (23). 315 We and others have previously studied Th17 biology via fate-mapping using IL-17a promoter-driven, 316 Cre-mediated fluorescent tagging of cells (6, 24, 25). This binary approach is powerful, but does not 317 differentiate between cells that might transiently express Il17a compared with those exhibiting prolonged 318 expression. In our scVI transcriptomic model, we noted early transient expression of Il17a and Il17f at day 319 2, which disappeared only to re-appear in some cells at day 5 in the gut. This raises the possibility that CD4 + 320 T cells can transiently express IL-17a in mLN without becoming bona fide Th17 cells. Hence, we propose 321 that studies employing permanent marking of cells using Cre-LoxP systems should be interpreted with 322 possible transient expression taken into account. 323 Our unbiased, single-cell genomic approach identified an unexpected, apparent trajectory 324 characterised by TCF1 hi expression, rapid shutdown of cellular proliferation, a lack of pro-inflammatory or 325 immune-regulatory gene expression, an ability to migrate to the gut, and a capacity to mount a secondary 326 recall response despite modest expression of select checkpoint blockade molecules. Based on these 327 observations we infer TCF1 hi TEa cells to be generally quiescent, memory or stem-like cells that emerged 328 very rapidly during alloSCT. Transcriptomic modelling suggested that Tcf7 hi cells could under certain 329 circumstances arise from the cytokine-expressing effector lineage at day 3 or 4, which would be consistent 330 with a linear model in which effector cells give rise directly to memory-like cells (26,27). However, we also 331 noted rare instances of Tcf7 hi cells emerging at day 2 post-transfer, as clonal expansion was beginning and 332 effector differentiation had yet to occur. We did not detect transcriptomic intermediates between this distal 333 state and more naïve cells, either because such states do not exist, or because our study was not designed 334 to detect such rare, transient events. Given that trajectory inference from scRNA-seq data tends to rely upon, 335 indeed assume, gradual transcriptomic change, it is likely that very rapid state changes cannot be mapped 336 using this approach. Our data does not resolve the extent to which TCF1 hi CD4 + T cells emerge via gradual 337 linear effector-memory transition versus a more rapid, branching process via asymmetric cell division, which 338 was recently reported for similar cells during a respiratory virus infection model (28), and for CD8 + T-cells 339 during LCMV infection (12). We speculate, given the presence of rare "pioneer"-like Tcf7 hi cells as well as 340 apparent transcriptomic intermediates, that both these developmental pathways may operate during alloSCT. 341 New research tools are required to quantify the relative use of these mechanisms in vivo. 342 One question to consider from our transcriptomic modelling is how separate states, loosely referred 343 to as "effector" or "quiescent", emerge amongst clonal T cells during alloSCT. Heterogeneity amongst clonal 344 TEa cells could have been induced via various non-mutually exclusive mechanisms, including asymmetric 345 cell division (28), differential APC engagement and differential access to early local cytokine signals. Given 346 the stark difference in CD25 expression between trajectories, it appears feasible that IL-2-signalling promotes 347 effector function at the expense of quiescence. A recent study revealed a reciprocal relationship in production 348 and receipt of IL-2 controlled fate bifurcation in CD4 + T cells (29). Hence, we speculate that a similar 349 mechanism might be acting during alloSCT. In our model, we previously reported that colonic CD103 + 350 dendritic cells played a crucial role in amplifying acute GVHD (4). It is possible that naïve CD4 + T cells that 351 failed to access these APC may have been programmed towards the quiescent, TCF1 hi state. As part of this 352 scenario, given that our model is characterised by profound lymphopenia, it is possible that homeostatic 353 proliferation, perhaps via IL-7 signalling, might have partly contributed to the proliferation and stabilisation of 354 a quiescent cellular state. However, given that donor allo-antigen presentation via MHCII is important for 355 supporting clonal expansion in this model, exposure to diverse donor APC may contribute to heterogeneity 356 in CD4 + T cell differentiation. 357 Our experiments revealed that clonally expanded TCF1 hi CD4 + T cells, which had exhibited no effector 358 function and had shut down cellular proliferation, were capable of mounting a pro-inflammatory or regulatory 359 response at a later date, as shown during secondary transplantation. This suggests that after alloSCT, the 360 gut could be populated with quiescent alloreactive T cells that could influence disease outcome at a later 361 timepoint. Despite their capacity to mount either pro-inflammatory or regulatory responses, we view these 362 cells as an opportunity to lodge immune-suppressive, possibly tissue-resident CD4 + T cells within the gut that 363 could contribute to disease prevention after alloSCT. However, the potential for these to produce pathogenic 364 cytokines would clearly need to be addressed. In summary, our examination of transcriptome dynamics 365 during alloSCT not only highlighted developmental relationships between effector CD4 + T cells, but also 366 uncovered the existence of a quiescent, memory-like state that exhibits functional potential in vivo. It will be 367 important next to interrogate our experimentally-derived observations in patient samples after alloSCT. 368

392
Red blood cells were then lysed using RBC Lysing Buffer Hybri-Max™ (Sigma-Aldrich). Splenocytes were 393 then washed with RPMI media (Gibco) prior to proceeding with staining for flow cytometry assessment. 394

Small intestine and colon 395
Intraepithelial lymphocytes were isolated from the small intestine and colon of mice using the Lamina Propria -PB, rat IgG2a isotype control -PE, rat IgG1k isotype control -PECy7, rat IgG2bk isotype control -APC 406 (all Biolegend). For assessment of intracellular cytokine production and transcription factor expression, cells 407 were incubated with or without ionomycin (500 ng/ml) and PMA (50 ng/ml) for 4 hours at 37°C, brefeldin-A 408 (5 µg/ml) was added to cells after 1 hour of incubation. Intracellular staining was then performed using the 409 eBioscience™ Foxp3/Transcription Factor Staining Buffer Set with the following antibodies: IFNγ (XMG1.2) 410 -BV421, IL-17A (TC11-18H10.1) -BV605, rat IgG1k isotype control -BV421, rat IgG1k isotype control -411 BV605, rat IgG2a isotype -AF700, mouse IgG1k isotype control -APC from Biolegend; Tcf1 (C63D9) -PE, 412 rabbit IgG isotype control -PE from Cell Signaling Technology; Tbet (4B10) -APC, Foxp3 (FJK-16S) -413 AF700 from eBioscience. Samples were acquired on a LSRII Fortessa Analyser (BD Biosciences) and data 414 analysed using FlowJo software (Treestar).  Highly variable genes (HVGs) were identified using "FindVariableGene" function from Seurat with default 446 parameters. For each subset of data used for dimensionality reduction, HVGs were computed individually 447 and used as an input, unless otherwise specified. Number of HVGs used in each analysis is noted in the 448 figure legends. 449 Principal component analysis (PCA) dimensionality reduction was performed using "RunPCA" function from 450 Seurat and the computed PCs were used to generate uniform manifold approximation and projection (UMAP) 451 of scRNA-seq data using "RunUMAP" function from Seurat. Number of PCs used in each analysis is noted 452 in the figure legends. 453 Bayesian Gaussian Processes Latent Variable Modelling (bGPLVM) dimensionality reduction was performed 454 using GPfates v1.0.0 (9) where datasets containing all genes after initial gene filtering step were used as 455 input. Up to five latent variables were considered. 456 Integrated dimensionality reduction of mGVHD2 and mGVHD3 datasets was performed using scVI v0.4.1 457 (20). Each experiment was identified as separate batch. All parameters were kept at default except: up to 30 458 latent variables were considered, 2 hidden layers were used for encoder and decoder neural networks and 459 up to 100 epochs were used to train the model. The computed latent variables were used as an input to 460 generate UMAP using "RunUMAP" from Seurat, or using the umap-learn v0.3.5 python package (15). Trajectories were inferred through the mGVHD2 dataset using Slingshot v0.99.12 (16). Slingshot requires 469 clusters of cells and embeddings for those cells as input. We used bGPLVM latent variables 1 and 2, and 470 also performed unbiased clustering based on these variables. A semi-supervised approach was taken 471 whereby a cluster with high proportion of day 1 cells were specified as a starting point ( Figure S4A). 472 473

Monocle 474
Trajectories were inferred through the mGVHD2 dataset using Monocle v2.8.0 (32). PCA dimensionality 475 reduction was performed and the first 10 PCs were used as an input for unsupervised clustering using 476 "plot_pc_variance_explained" and "clusterCells" functions, where number of clusters was specified (n=6).

477
Differential gene expression analysis was performed between clusters using "differentialGeneTest" function 478 and the list of significant genes (qval < 0.01) was used as an input to order the cells using "orderCells" 479 function. 480 481 PAGA 482 Trajectories were inferred through mGVHD2 dataset using SCANPY v1.4.4, which includes the PAGA 483 trajectory inference algorithm (18). PCA dimensionality reduction was performed using the "scanpy.tl.pca" 484 function with ARPACK SVD solver to aid computation. A neighbourhood graph was computed via 485 "scanpy.pp.neighbors" with the first 10 PCs and the size of local neighbourhood specified as 30. 486 Unsupervised clustering of cells was performed using "scanpy.tl.louvain" function with resolution 1.0. Finally, 487 coarse-grained connectivity structures connecting the computed clusters of cells was mapped using 488 "scanpy.tl.paga" with default parameters. and (3) were performed using the integrated mGVHD2 and mGVHD3 datasets as input. Genes from (2) and 511 (3) with Bonferroni adjusted P-value below 0.01 and average log fold change greater than 0.5 were 512 considered as input for gene ontology (GO) term enrichment analysis. GO     TEa (or control 0.8 x 10 6 Naïve TEa) were transferred into recipient BALB/c mice that had received total body 705 irradiation 13 days prior and a bone marrow transplant 12 days prior. FACS assessment of mLN TEa cells 706 was performed on day 4 or day 5 post-secondary transfer. 707 (B) Absolute numbers of TEa cells in the mLN at day 4/5 post-secondary transfer. Arrows along y-axis denote 708 the number of TEa cells that were transferred per mouse for each group on day 0 (0.8 x 10 6 Naïve TEa (grey) 709 and CD25 -CXCR6 -TEa (turquoise), 0.4 x 10 6 CD25 + CXCR6 + TEa for (purple)). CD25 + CXCR6 + n = 6) or from one experiment only (E; Naïve n = 6, CD25 -CXCR6 -n= 4). Data are 717 represented as median (B) or mean ± SEM (C -E). Statistical analysis was performed using one-way ANOVA 718 with multiple comparisons (B), Paired t test (C), or a Mann-Whitney test (D, E). *p<0.05, **p≤0.01, ***p≤0.001, 719 ****p≤0.0001. 720