Single-cell immune repertoire tracing identifies rituximab refractory B cells that emerge during relapse

Rituximab, an anti-CD20 B cell-depleting therapy, is indicated to treat a growing number of autoimmune disorders. However, disease relapses can occur when the B cell compartment is re-established after treatment. To explore the mechanism of relapse, we studied patients with muscle-specific kinase (MuSK) myasthenia gravis (MG), a paradigm for B cell-mediated autoimmune diseases in which pathology is directly associated with autoantibody production. Whether the failed depletion of specific clones leads to the persistence of autoantibody producing B cell subsets has yet to be elucidated. Here we show that rituximab (RTX) therapy is not fully effective at eliminating disease-associated B cells and provide a mechanistic understanding of relapse. We carried out single-cell transcriptional and B cell receptor (BCR) profiling on longitudinal B cell samples from the peripheral blood of MuSK MG patients who relapsed after rituximab therapy. Computational analysis of these data identified individual B cells that were refractory to rituximab depletion by linking them to clones that were present prior to therapy and continued to persist after therapy. These persistent B cell clones were present at high frequency, and included both antibody-secreting cells and memory B cells. They were disproportionately isotype switched with elevated somatic hypermutation frequencies and a gene expression signature associated with antigen-experience and B cell clonal expansion. We further identified both antibody-secreting cells and memory B cells with specificity for MuSK autoantigen that were clonally expanded during relapse and persistent. Our results provide evidence that post-rituximab relapse in MuSK MG is associated with the failed depletion of autoantigen-specific B cell subsets. This work provides insight into the durability of human B cells during treatment with therapeutic CD20-specific monoclonal antibodies. The transcriptional signatures associated with B cell resistance may represent new therapeutic avenues for treatment and biomarkers of depletion efficacy.


Introduction
Anti-CD20 mediated B cell depletion therapy (BCDT) is indicated for the treatment of multiple autoimmune disorders including rheumatoid arthritis (RA), autoimmune-associated vasculitides (AAV) and multiple sclerosis (MS) (1)(2)(3). Acquired Myasthenia gravis (MG) is a chronic autoimmune disorder caused by pathogenic autoantibodies that bind to membrane proteins at the neuromuscular junction leading to muscle weakness. While the majority of MG patients have serum autoantibodies against the acetylcholine receptor (AChR), a subset of patients have autoantibodies against muscle-specific tyrosine kinase (MuSK). MuSK is essential for AChR clustering and synaptic differentiation. Through binding MuSK, serum-derived autoantibodies impair clustering of AChRs, and block neuromuscular transmission causing disease (4). The conspicuous clinical benefit of BCDT in patients with MuSK MG was recently demonstrated (5)(6)(7). However, a fraction of patients treated with the anti-CD20 BCDT agent rituximab (RTX) experience post-rituximab (post-RTX) relapses despite symptomatic responses and a fall in anti-MuSK antibody titer (5,(8)(9)(10).
Post-RTX relapse has been reported in the context of RA, AAV and systemic lupus erythematosus (SLE), and is typically managed with additional BCDT (11)(12)(13). An elevated fraction of antigen-experienced B cell subsets (including memory B cells and plasmablasts) after depletion has been correlated with the occurrence of post-RTX relapse (14)(15)(16)(17)(18)(19). Post-RTX relapse in MuSK MG was associated with the presence of circulating plasmablast expansions, which include those secreting MuSK autoantibodies (20,21).
Previous studies provide some insight into the possible origin of circulating antigen-experienced B cells during post-RTX relapse. Antigen-experienced plasma cells or plasmablasts are known to be resistant to BCDT owing to their low CD20 expression (22)(23)(24)(25). However, memory B cells, which express CD20, are abundant immediately after BCDT, and their frequency, as a fraction of total B cells, is elevated in patients with RA who experienced relapses after RTX. (15,16). Whether or not plasmablasts, plasma cells or memory B cells emerge from inadequately depleted circulating B cells (versus emerging entirely de novo) is not known.
The characteristics of antigen-experienced B cells that re-emerge during post-RTX relapse also remain unclear. BAFF (also known as B cell activating factor, B lymphocyte stimulator or BlyS) is a TNF superfamily member produced almost exclusively by myeloid cells (26,27). BAFF (and another TNF superfamily ligand, APRIL) can bind a family of B cell specific receptors to promote B cell survival. For example, levels of receptors for BAFF/APRIL on B cells such as TACI, BAFF-R and BCMA correlate with poor clinical responses to BCDT but whether these receptors are expressed by persistent B cells or B cells emerging de novo has not been investigated (28)(29)(30). Clonal expansions have also been correlated with poor responses to RTX, but whether this is because some B cell subsets incompletely depleted by RTX happen to also be clonally expanded remains an open question (31,32).
Understanding the factors contributing to post-RTX relapse is important for the development of new modalities of monitoring depletion and more effective therapy combinations for treating MG and other autoimmune disorders. Thus, we sought to test the hypothesis that B cell clones resistant to RTX are disease relevant and preferentially clonally expanded post-RTX. We also sought to test the hypothesis that these B cells are associated with plasmablasts and memory B cells that differentially express CD20 and APRIL/BAFF receptor genes. To this end, we used adaptive immune receptor repertoire sequencing (AIRR-Seq) using next-generation sequencing (NGS) and single-cell gene expression analysis to trace B cell populations temporally and to characterize B cells that emerge during post-RTX relapse. In comparison to the examination of overlap between bulk BCR repertoires alone, this approach allows for the characterization of the expression profile of persistent, disease-relevant B cells (including those specific for antigen) at single-cell resolution. Moreover, this approach allows for the identification of genes representing novel therapeutic targets, is unbiased, and not limited to the set of known B cell surface markers used for characterizing B cell subsets. Finally, it can also be used to remove possible confounders related to B cell subset distribution that could affect the identification of repertoire features associated with persistence. For instance, factors such as clonality, isotype usage and SHM frequency associated with persistence may actually be features associated with persistent subsets such as plasmablasts.
NGS approaches for generating immune repertoires commonly begin with genomic DNA or mRNA and produce high-depth bulk repertoires of unpaired immunoglobulin or T cell receptor V(D)J sequences (33)(34)(35). Recent approaches use single-cell isolation techniques with highthroughput sequencing to construct repertoires, sometimes with paired gene expression (36)(37)(38)(39). These include studies on tumor-infiltrating lymphocytes from cancer biopsies, infiltrates from the brain of mice and B cells from the spleen after vaccination in mice; no such studies have investigated the circulating B cell repertoire in patients with autoimmune disease thusfar (37)(38)(39). Previous immune repertoire studies of BCDT have identified persistent expanded clones and associations between clonality and responder status, and single-cell transcriptome studies have recapitulated flow cytometry results showing increased frequencies of antigen-experienced B cells at the peak of depletion (32,(40)(41)(42). In this study, we combine both approaches to leverage depth and single-cell resolution to investigate the basis for differential B cell depletion by RTX in an unbiased, high-throughput manner. This approach, which we refer to as Single-cell Tracing of Adaptive Immune Receptor repertoires (STAIR), allows for unbiased transcriptional characterization of B cells resistant to BCDT.
We demonstrated that circulating memory B cells and post-RTX plasmablasts which produce MuSK-specific antibodies emerge from the incomplete depletion of clonally expanded pre-RTX clones. We also showed that plasmablasts were more persistent than memory B cells. Memory B cells associated with persistence were characterized by low CD20 expression but also low BAFF-R expression and high TACI expression. Clonal expansion was also found to be associated with persistence among memory B cells. These results provide new insights into the mechanism of post-RTX relapse in patients with MuSK MG and potential avenues for improving the treatment of MG and other B cell mediated autoimmune diseases.

A subset of B cell clones is refractory to depletion
We sought to identify clones that escaped RTX depletion and thus may contribute to relapse that occurs when the B cell compartment repopulates after treatment in patients with MG ( fig. 1). To that end, we sequenced the B cell repertoire from three MG patients before and after treatment (table S1, S2). The number of V(D)J sequences isolated for patient 1 was 639,715 V(D)J; 82,720 from patient 2; and 107,504 from patient 3. Grouping the sequences into clones-those with a common ancestor based on their V, J gene usage and junction sequences (43)-generated 187,195; 34,719; and 14,579 clones for patient 1, 2 and 3 respectively. A total of 2,462 clones were found with members that were present in both the pre-rituximab (pre-RTX) and postrituximab relapse (post-RTX) repertoire (termed persistent clones) in patient 1 (or 1.6% of pre-RTX clones); 902 (4.9%) and 345 (7.6%) were found in patients 2 and 3 respectively ( fig. 2A). Given that clonal relationships are inferred computationally, we next sought to substantiate that these persistent clones were not identified by sequence similarity that happened by chance. Because clones cannot be shared across multiple individuals, the average number of clones shared between patients was calculated e.g. the average number of shared clones between patient 1 and 2, and patient 1 and 3 was calculated. Patient 1 shared an average of 0.4% clones in comparisons with other patients, patient 2 shared an average of 1.4%, and patient 3 shared an average of 0.9%. Quantification using a clonal overlap metric demonstrated that the observed overlaps between pre-RTX and post-RTX repertoires were significantly higher relative to this background (t-test P=0.037) ( fig. 2B). These collective data demonstrate that circulating B cell clones persist despite RTX-mediated depletion of the repertoire.

Persistent clones are antigen experienced
Next we investigated if persistent clones that emerge post-RTX have distinct characteristics compared to non-persistent clones. Given the possible role of persistent B cells in recognizing antigen, we characterized features of the repertoire associated with antigen experience, namely isotype switching or elevated somatic hypermutation (SHM) frequency. We quantified the difference in isotype frequency among V(D)J sequences from persistent and non-persistent post-RTX clones. V(D)J sequences from persistent clones were observed to be mostly isotype switched. While non-persistent sequences were 43.2% IgM, 6.4% IgD, 24.7% IgG and 25.5% IgA of the repertoire on average ( fig. 2C), persistent ones were disproportionately switched (15.0% IgM, 2.2% IgD, 27.4% IgG and 55.0% IgA) (ANOVA p=0.027). In addition, the overall SHM frequency of persistent clones was significantly elevated (fig. 2D) (ANOVA P=0.014) independent of isotype (ANOVA p = 0.287). Therefore, persistent clones are preferentially isotype switched with elevated SHM, reflecting a higher frequency of antigen-experienced clones.

Persistent clones do not consistently expand or gain SHM
We next sought to determine whether antigen experience was gained during reconstitution of the B cell compartment after RTX treatment. We first tested if persistent clones expand between pre-RTX and post-RTX time points. Clonal expansion was assessed by computing the change in size of persistent clones between pre-RTX and post-RTX time points. Average decreases of -11% (95% CI -93% to +1313%), -85% (95% CI -98% to +439%) and -7% (95% CI -99% to +40.5%) in clonal size respectively were observed across patients 1, 2 and 3 (one-sample t-test P=0.37) ( fig. S3A-C). Thus, decreases in clonal size were observed but these decreases were not significant. Given that persistent clones have elevated frequencies of SHM, we tested whether individual clones acquire additional SHM by quantifying the average change in SHM frequency of persistent clones between pre-RTX and post-RTX time points. The median change in SHM frequency was -0.2% (95% CI -89% to +1132%), -2.4% (95% CI -74% to 187%) and +4.5% (95% CI -41% to +67%) for patients 1, 2 and 3 respectively ( fig. S3D-F). Thus, we did not detect a consistent change in the SHM frequency of persistent clones between pre-RTX and post-RTX samples for these patients (one-sample t-test P=0.807). These collective findings suggest that clonal expansion and the acquisition of antigen experience between pre-RTX and post-RTX time points is not a general feature of persistent clones.

Persistent clones are disease relevant and bind MuSK
We next sought to investigate if persistent B cell clones were disease-relevant. In two previous studies, using the same three patients described here, we identified B cell clones during post-RTX relapse that had disease relevance (table S3). In the first, 43 B cell clones were used to generate recombinant monoclonal antibodies, all of which were derived from circulating plasmablasts as this cell type was demonstrated to produce MuSK-specific antibodies when cultured in vitro (20). In the first, 43 B cell clones were used to generate recombinant monoclonal antibodies, all of which were derived from circulating plasmablasts as this cell type was demonstrated to produce MuSK-specific antibodies when cultured in vitro (21). In the second, a fluorescent MuSK antigen bait was used to both identify and isolate MuSK-specific B cell clones from which an additional 38 recombinant monoclonal antibodies were produced. Members of these collective 81 clones bound surface expressed MuSK using a cell-based assay across a range of different binding strengths with many of these observed to be weak binders (but some were strong binders) (20,21). Of these 81 clones, 27 could be found in our post-RTX repertoires. This demonstrates that they were clonally expanded, as they were found in independent samples. When we tested if members of these 81 clones were present in pre-RTX repertoires we observed that seven clones could be found in a pre-RTX repertoire; pre-RTX relatives for the other 74 clones were not observed. Specifically, we observed that one clone from patient 1 could be found in a pre-RTX repertoire, while six clones from patient 3 were also found in a pre-RTX repertoire ( fig. 3, fig. S4A-F). One of the six clones from patient 3, which were present in both the pre-and post-RTX repertoire, demonstrated modest MuSK binding activity ( fig. 3). Thus, disease-relevant B cells present during relapse can derive from the failed depletion of B cell clones that existed prior to therapy.

Antibody-secreting cell-like B cells are preferentially persistent but memory B cells also persist
Given the disease relevance of persistent antigen-experienced B cells during relapse, we next sought to characterize the transcriptional state of persistent B cells in an unbiased manner to investigate how different B cell subsets are represented among persistent populations. To accomplish this, we carried out single-cell transcriptome and paired BCR repertoire profiling of enriched circulating post-RTX B cells. Owing to the high frequency of transitional B cells during post-RTX reconstitution, IgD low B cells were sorted for prior to single-cell analysis ( fig. S5). The analysis was limited to the switched IgG and IgA repertoire by computationally filtering out cells with V(D)J's paired with an IgD or IgM constant region. Clones were then identified in singlecell transcriptome and repertoire data sets that were related to clones present in pre-RTX bulk BCR repertoires. This allowed for direct interrogation of the transcriptomes of persistent B cells. That is, we leveraged the resolution offered by high-throughput single-cell transcriptomes and depth offered by bulk BCR analysis to perform single-cell tracing of adaptive immune repertoires (STAIR). From single-cell sequencing, a total of 18,133 cells associated with 14,671 IGH V(D)J sequences were identified (table S4) from the same three post-RTX patients and an additional asymptomatic AChR MG patient used as a control. Gene expression patterns were used to partition the cells into 11 distinct clusters ( fig. S6A-E, table S5). Clusters of antibodysecreting cell-like (ASC), memory B cells, mature naive B cells and transitional B cells were assigned by analysis of the gene expression of these clusters ( fig. 4A). Because B cells from the single-cell expression analysis could not be experimentally isolated to assess for antibody secretion or other functional features associated with plasmablasts/plasma cells, we chose to label the cluster most associated with a plasmablast/plasma cell signature as antibody-secreting cell-like (hereafter referred to as ASC). By combining single-cell repertoires with bulk pre-RTX BCR repertoires, a total of 820 persistent clones with members in the single-cell profiling data were identified. Persistent clones could be identified in both the ASC as well as memory B cell clusters. The ASC cluster had a higher frequency (27.9%) of persistent clones in comparison to the memory cluster (14.6%) ( fig. 4B, 4C) (t-test P=0.045). The persistence of both ASC and memory cells was not due to the mis-identification of clones, as IGH sequences were also identified from single-cells in the ASC and memory B cell cluster with identical V(D)J sequence as pre-RTX members ( fig. 4B, red dots). Thus, while both ASCs and memory B cells are persistent, members of the ASC subset are more likely to be persistent than memory B cells during post-RTX relapse.

ASCs have persistence-associated features compared to memory B cells
ASCs are disease-relevant in MG and other autoantibody-mediated diseases. Moreover, we previously demonstrated their direct role in MuSK autoantibody production (20,21). We next sought to investigate phenotypic characteristics of ASCs compared to memory B cells during post-RTX relapse to further understand their mechanism of persistence and identify possible subsets of persistent ASCs. To correlate the persistence of ASCs with their expression of CD20 (the target of RTX), CD20 expression was examined among B cell subsets and the ASC cluster was observed to express lower CD20 levels when compared to memory B cells (t-test P=0.021) ( fig. 5A). However, when we examined the expression of CD20 of persistent compared to nonpersistent ASCs, a decrease was observed but this was not significant (t-test P=0.15) ( fig. 5B). This finding was also observed when APRIL/BAFF receptor genes (TACI, BAFF-R and BCMA) were examined (P=0.085, 0.555, 0.288 for TACI, BAFF-R, BCMA respectively). Persistent ASCs were also not observed to differ in terms of isotype usage (ANOVA P=0.194) or frequency of SHM (ANOVA P=0.101) compared to their non-persistent counterparts ( fig. S7A, B). The clonal diversity (more clonal expansions) of the ASC cluster was lower compared to memory B cells (t-test P=0.011) ( fig. 5C). However, when we examined clonal diversity within the ASC cluster, a consistent difference between persistent compared to non-persistent ASCs was not observed (t-test P=0.079) ( fig. 5D). The fraction of cells in the ASC cluster was also observed to be 10.0% of the total repertoire in the post-RTX cohort ( fig. 5E). This was significantly elevated compared to the 0.5% observed in the control asymptomatic AChR MG sample (one-sample ttest P=0.044). However, when we examined if clonal expansion of persistent B cells between the pre-RTX and post-RTX time points could explain the abundance of ASC B cells that we observed, only some ASC B cell clones were found to be expanded as a fraction of the repertoire between pre-RTX and post-RTX time points; the average change in clonal size was not consistent across patients (t-test P=0.099) ( fig. S8A). Collectively these data identify features of the ASC cluster as a whole compared to memory B cells that may explain their higher frequency among persistent clones: ASC cluster members expressed lower CD20 and were more clonally expanded. However, persistent ASC cluster members were not observed to differ from nonpersistent members in terms of their expression of CD20 and receptors for BAFF/APRIL, clonal diversity, isotype usage or SHM frequency.

Persistent memory B cell clones are expanded and express low levels of CD20
Given the high frequency of persistent memory B cells observed in the data, we next sought to explore if persistent memory B cells had distinct features compared to non-persistent memory B cells. Persistent memory B cells were observed to have significantly lower CD20 expression (ttest P=0.022) and higher expression of TACI (t-test P=0.037), but lower expression of BAFF-R (t-test P=0.037) ( fig. 6A). Analysis of SHM and isotype usage revealed that persistent memory B cells also had higher frequencies of SHM across isotypes (ANOVA P=0.037) ( fig. 6B); the fraction of persistent memory B cells expressing IgG1 (t-test P=0.022) and IgG3 (t-test P=0.025) (ANOVA P= 0.007) was less than that of their non-persistent counterparts ( fig. 6C). We examined if persistent memory B cells were clonally expanded compared to their non-persistent counterparts. They were observed to have lower clonal Unbiased differential gene expression identified 1,282 differentially expressed genes between this sub-cluster and all other memory B cell sub-clusters (data file S1) (FDR < 0.05). The upregulated subset of these genes was enriched for a signature for MYC (or c-Myc, a transcription factor that enhances B cell survival) and BRCA1/p53 (transcription factors that regulate cell senescence and cell cycle) (fig. S11B). The expression of two tissue homing genes that are the targets of the immunotherapy agents, natalizumab and fingolimod (ITGA4 and S1PR1 respectively), were also observed to be upregulated among persistent memory B cells (P=0.049 for ITGA4 and P=0.039 for S1PR1) (data file S1). Thus, persistent memory B cells are clonally expanded and are characterized by a subset of B cells with elevated SHM, high expression of TACI, low expression of CD20 and a gene expression pattern associated with transcription factors that control cell survival, proliferation and senescence.

Discussion
RTX is a B cell depleting agent widely used for the treatment of autoimmunity, however relapses after treatment have been reported in the context of diseases such as SLE, RA and AAVs (12,13,44,45). MG is a chronic autoimmune disorder characterized by muscle weakness and caused by pathogenic autoantibodies that bind to membrane proteins at the neuromuscular junction. Pathogenic autoantibodies to muscle-specific tyrosine kinase (MuSK) can be found in patients with MG who do not have detectable antibodies to the acetylcholine receptor. Recent studies have demonstrated the remarkable benefits of RTX-mediated B cell depletion therapy in MuSK MG (5,6,46). In addition to diminished autoantibody titers and significant clinical improvement, RTX also allowed for tapering and subsequent discontinuation of other immunotherapies in subsets of MG patients. However, post-RTX relapses do occur (5,47). We recently demonstrated that relapses after RTX in MuSK MG patients are associated with the presence of circulating plasmablasts that express MuSK autoantibodies (20,21). We sought to investigate if these disease-relevant and autoantigen-binding B cells derive either from the failed depletion of pre-existing or de novo B cell clones, and we further explored the characteristics of antigen-experienced B cells that emerged following B cell depletion in these patients.
In this study, we show that B cell clones relevant to disease, including autoantibody-secreting B cells present during relapse, can derive from the failed depletion of pre-existing B cell clones. Moreover, by directly tracing individual persistent B cell clones, we also show that persistent B cells use more isotype-switched constant regions, have elevated frequencies of SHM, and are clonally expanded. These are all features of an antigen-driven B cell response. We used a novel approach combining multiple methods of sequencing the repertoire with single-cell gene expression (STAIR) to demonstrate that persistent B cell clones are comprised of both expanded ASCs as well as memory B cell populations expressing low levels of CD20 associated with a unique gene expression profile. Collectively, these data directly show that circulating ASC and memory B cells, including disease-relevant B cells, are incompletely depleted by RTX in a subset of MuSK MG patients who experienced post-RTX relapses. We also characterized the repertoire and transcriptional features of these persistent ASCs and memory B cells in an unbiased fashion.
Our finding that circulating ASCs are persistent is consistent with a large body of evidence showing that ASCs, particularly those expressing lower levels of CD20, resist depletion owing to their CD20 expression and tissue localization (22,24,25,48). ASC frequency post-RTX is also known to be a reliable predictor of treatment response in SLE and RA patients treated with RTX (11,(49)(50)(51). By using the repertoire to directly trace and quantify the frequency of these ASCs, we show that many ASCs that reconstitute the B cell compartment after therapy come from preexisting, circulating B cell populations detectable prior to depletion.
The finding that circulating-memory B cell clones are also persistent is consistent with data showing that the frequency of memory populations after depletion predicts poor responses to RTX in SLE and RA (15)(16)(17). While these studies were important for identifying outcome predictions, they were not designed to resolve B-cell subsets due to limitations inherent with flow cytometry. By comparison, our definition of memory using single-cell transcriptomes does not rely on flow cytometry markers, but rather thousands of markers that allow us to be confident these cells share common gene expression patterns associated with memory B cells. Furthermore, we were able to profile persistent memory B cell subsets in detail. For instance, a recent study using bulk AIRR-Seq to investigate the use of RTX for treating AAV, demonstrated that persistent clonotypes are isotype-switched which we also observed in our study (52). However, because this analysis was not combined with single cell gene expression data, it is not clear if this was simply because the associated subsets that use these isotypes are more persistent (52). In this study, persistence is correlated with a specific gene expression signature e.g. lower CD20 expression and differences in APRIL/BAFF receptor expression relative to other memory B cells.
Studies of RTX responses in anti-MAG-neuropathy (Immunoglobulin M(IgM) Anti-Myelin Associated Glycoprotein Peripheral Neuropathy) and pemphigus vulgaris (PV) (17,19,31) demonstrated that poor responses to RTX are associated with increased repertoire clonality post-RTX. Our results show that a large fraction of memory B cells is persistent, and that these persistent memory B cells are more clonally expanded than non-persistent memory B cells. These findings suggest that the association between increased repertoire clonality and poor responses to RTX may result from the association between increased clonality and the persistence of memory B cell subsets.
Our work is congruent with studies that have shown that BAFF and the differential expression of receptors for APRIL/BAFF (TACI, BAFF-R, BCMA) on antigen-experienced B cells may lead to poor outcomes after BCDT (28)(29)(30)53). A higher frequency of memory B cells expressing lower BAFF-R can be detected in patients with RA relapsing after RTX (30) as well as in patients with GVHD who respond poorly to RTX (29). This is consistent with our observation that persistent memory B cells express lower BAFF-R levels. BAFF levels have been shown to negatively correlate with BAFF-R expression after RTX therapy in primary Sjogren's Syndrome and SLE (53). Moreover, extended exposure of B cells to soluble BAFF decreases surface levels of BAFF-R in vitro. By comparison, exposure to anti-BAFF agents has been shown to increase the expression of BAFF-R on B cells (54). By directly characterizing the features of B cell clones that resist RTX depletion we are able to show that memory B cell subsets that persist after RTX express surface markers are associated with poor responses to RTX during relapse. These expression changes may all reflect the underlying effectiveness of B cell depletion.
Our analysis of bulk BCR sequencing showed that the largest fraction of persistent B cells was switched to the IgA isotype. STAIR analysis showed that two subsets observed to be most persistent, ASCs and memory B cells (particularly cluster 3) were also predominantly IgA. We speculate that the abundance of IgA expressing B cells could be related to BAFF expression given the role of BAFF (and receptors like TACI) in inducing class switch recombination, particularly in the generation of gut resident IgA-switched B cells (55)(56)(57). The presence of IgAswitched circulating ASCs has been shown to be dependent on BAFF (58). Moreover, IgA secreting plasma cells were shown as beneficial in an experimental autoimmune encephalomyelitis (EAE) mouse model, owing to their production of Il-10. Thus, further characterization of how the induction of IgA producing B cell subsets by RTX may be involved in active disease may be warranted to account for alternative mechanisms of therapeutic benefit from BCDT.
The conclusions of this study offer insights into possible therapeutic and diagnostic avenues for the management of patients with autoimmunity who experience relapses after RTX. Our findings regarding the BAFF/APRIL system suggest that combining RTX with BAFF pathway inhibitors (e.g. belimumab) could deplete populations that persist after RTX including those shown to produce autoantibodies. Combination strategies involving both belimumab and RTX have been successful in a phase 2a randomized clinical trial (RCT) in SLE which was followed with an ongoing phase 3 RCT, BLISS-BELIEVE (59,60). Similar investigations may be warranted in MG.
We observed that up-regulated genes associated with the subset of memory B cells most resistant to depletion (cluster 3) included those involved in tissue homing (ITGA4, S1PR1). S1P receptor is a gene that is up-regulated in lymphocytes localized to lymph nodes and inflamed tissues and target of the immunotherapy fingolimod. ITGA4 is a component of VLA-4, an integrin dimer also responsible for the localization of leukocytes to inflamed tissue and target of the biologic agent natalizumab. We speculate that this could reflect a possible tissue-associated origin for this particular B cell subset. This is consistent with evidence showing that memory B cells that resist RTX depletion can be identified in lymph nodes, spleen, bone marrow and tonsils (14,15,18,61). These findings provide a compelling argument for alternative therapies that target the tissue localization of these subsets such as by engineered T cell technology including chimeric-antigen receptor (CAR) or chimeric-autoantibody receptor (CAAR) T cell therapy (62,63).
Finally, our ability to extract biologically meaningful results from STAIR analysis suggests that tracing individual BCR sequences associated with disease relapse could provide a powerful tool for monitoring patient responses to immunotherapies that rely on the depletion of B cells and T cells. Moreover, in comparison to techniques that use bulk BCR repertoire sequencing to monitor therapy progress, this approach allows for the individual characterization of lymphocytes that resist therapy based on gene expression which may suggest alternative avenues for treatment (64).
The results from this study motivate future investigations. By combining the depth offered by bulk repertoire sequencing with the resolution of single-cell repertoire and gene expression, STAIR analysis allows for the characterization of B cells present at multiple time points in detail.
Extending these observations to studies where anti-CD20 therapy has strong evidence for efficacy, such as MS may be particularly worthwhile. These studies could test the hypothesis that the depletion of common B cell subsets may predict efficacy across diseases. Applying STAIR analysis to the simultaneous investigation of other therapies such as anti-CD19 therapy (inebilizumab)-recently shown to be efficacious for the treatment of NMO-could allow for the optimal selection of B cell depleting agents for preventing disease relapse (65). While RTX is a reliable treatment modality for MuSK MG, data for AChR MG has been mixed and patient relapses are common (8). An investigation of factors that contribute to relapse after RTX in the context of AChR MG using similar approaches could lead to the development of improved therapeutics for diagnosing and treating patients with MG regardless of disease subtype.
Several limitations should be discussed in the context of this study. Our study characterized only the single-cell gene expression of B cells after RTX depletion. Determining the phenotype of B cells that resist RTX depletion prior to therapy may be of considerable interest as a way to identify subsets prone to resisting B cell depletion prior to relapse. A recent study demonstrated that not only do B cell clones move between peripheral blood and cerebrospinal fluid over the course of immunotherapy in a subset of patients with MS, but many shift between memory and plasma cell characteristics (66). Similar studies for patients treated with RTX using single-cell transcriptomics could allow for the identification of memory B cell subsets that give rise to autoantibody-producing plasmablasts during relapse. Caveats should be noted given that our results rely on tracing B cell clones by analysis of their IGH V(D)J sequences alone. While we (and others) have demonstrated that clonal relationships can be reliably inferred given the diversity of IGH V(D)J sequences, false positives may occur (43,67,68). Additionally, owing to the abundance of transitional B cells that re-populate the repertoire during RTX relapse and our interest in studying antigen-experienced B cell subsets, we specifically excluded IgD+ expressing B cell subsets from our single-cell analysis. Nevertheless, a large fraction of B cell receptors paired with an IGHD constant region (3.2-11.4%) was found in our single-cell data. This is likely because sorting can flow-based sorting is not absolute in eliminating gated populations. Additionally, the expressed mRNA transcriptome may not always reflect the expression of surface markers used for sorting such as IgD. Given our ability to also capture the associated constant regions of reconstructed V(D)J sequences paired with single-cell gene expression, we were able to computationally filter residual IgD+ B cells (or IgD+ IgM+) by excluding B cells associated with either an IGHM or IGHD constant region.
In this study, disease-relevant B cells emerged from the failed depletion of pre-existing B cell clones. Characterization of post-RTX B cells by single-cell transcriptomics, combined with the identification of clones from paired B cell receptor repertoires identified both persistent ASC as well as memory B cell subsets. By combining the depth offered by bulk repertoire sequencing with the resolution of single-cell repertoire and gene expression, STAIR analysis allows for the characterization of B cells present at multiple times in detail. This could lead to the development and selection of therapies that target ASC and memory B cells that persist across time points, such as those associated with disease relapse.

Study Design
The protocol used was approved by the Human Investigation Committee at Yale University School of Medicine. Peripheral blood was collected from MG and HD subjects (Yale Myasthenia Gravis Clinic, New Haven, Connecticut, USA). A MuSK MG cohort (n = 3) was identified for purposes of investigation of post-RTX relapses with these characteristics that we have described previously: (i) a RTX-induced CSR/MM (complete stable remission, minimal manifestation) clinical status (>1 year), (ii) repopulation of the B cell compartment after RTX, and (iii) disease relapse following sustained CSR/MM after rituximab (20,21). The patients in the study were not selected for only those patients who had never received rituximab prior to inclusion . An additional asymptomatic (by Myasthenia Gravis Foundation of America or MGFA classification score) AChR MG control subject was also included in our study.

Bulk library Preparation
RNA was prepared from frozen peripheral blood mononuclear cells using the RNAEasy Mini kit (Qiagen) per manufacturer's instructions. Cells were pelleted and lysed in freezing media without washing to preserve cell populations sensitive to cryopreservation. RNA libraries were made using reagents supplied by New England Biolabs. The RNA was reverse-transcribed into cDNA using a biotinylated oligo dT primer. An adaptor sequence, containing a universal priming site and a 17-nucleotide unique molecular identifier (UMI) was added to the 3' end of all cDNA. Following purification using streptavidin-coated magnetic beads, PCR was performed to enrich for immunoglobulin sequences using a pool of primers targeting the IGHA, IGHD, IGHE, IGHG, and IGHM regions. This immunoglobin-specific primer pool contained tailed sequences with a priming site for a secondary PCR step. The second primer is specific to the adaptor sequence added during the RT step, and contains a sample index for downstream pooling of samples prior to sequencing. Following purification of PCR products using AMPure XP beads (Beckman), the secondary PCR was performed in order to add the full-length Illumina P5 Adaptor sequence to the end of each immunoglobin amplicon. The number of secondary PCR cycles was tailored to each sample to avoid entering plateau phase, as judged by a prior quantitative PCR analysis. Final products were purified, quantified with a TapeStation (Agilent Genomics) and pooled in equimolar proportions, followed by high-throughput 2x300 base-pair paired-end sequencing with a 20% PhiX spike on the Illumina MiSeq platform according to manufacturer's recommendations, except for performing 325 cycles for read 1 and 275 cycles for read 2.

Raw read quality control and assembly for bulk libraries
Bulk repertoire data processing and analysis was carried out using tools in the Immcantation framework (http://immcantation.org). Preprocessing was carried out using pRESTO v0.5.10 (69) as follows. 1) Reads with a mean Phred quality score below 20 were removed. 2) Reads were aligned against constant region primer and template switch sequences, with a maximum mismatch rate of 0.2 and 0.5 respectively. Reads failing to match both a constant region primer and template switch sequence were removed. 3) Remaining reads were grouped based on unique molecular identifier (UMI) sequences determined by the 17 nucleotides preceding the template switch site. 4) Separate consensus sequences for the forward and reverse reads within each UMI group were constructed with a maximum error score of 0.1 and minimum constant region primer frequency of 0.6. If multiple constant region primers were associated with a particular UMI group, the majority primer was used. 5) Forward and reverse consensus sequence pairs were aligned to each other with a minimum overlap of 8 nucleotides and a maximum mismatch rate of 0.3, and assembled into full length sequences. Sequence pairs that failed to align to each other were assembled by alignment against the IMGT human germline IGHV reference database (IMGT/GENE-DB v3.1.19; retrieved June 21, 2018; Giudicelli et al. 2005) with a minimum overlap of 0.5 and a E-value threshold of 1x10-5. 6) Isotypes were assigned by local alignment of the 3' end of each consensus sequence to isotype-specific constant region sequences with a maximum mismatch rate of 0.4. 8) Duplicate sequences were removed, except those spanning multiple biological samples and/or with different isotype assignments. Constant region primers, isotype-specific internal constant region sequences, and template switch sequences used are available at: https://bitbucket.org/kleinstein/immcantation/src/default/protocols/AbSeq.

Single-cell library preparation and gene expression analysis
B cells were isolated from PBMCs by Human Pan B cell immunomagnetic negative selection kit (StemCell) per manufacturer's instructions. Additional fluorescence-activated cell sorting (FACS) was performed for bead sorted B cells from MuSK MG post-RTX derived samples. B cells were labelled with 7AAD Viability Staining Solution (BioLegend) and 7AAD positive populations were removed. Additionally, B cells were labelled with FITC-labelled anti-IgD antibody (BioLegend clone IA6-2) per manufacturer's instructions and FITC positive populations were removed. Of note, the purpose of this additional filtering was the removal of all naive B cell populations. Sorted B cells were loaded into the Chromium Controller (10x Genomics) to form emulsion droplets. Libraries were prepared using the Chromium Single-cell 5′ Reagent Kit (10x Genomics) for version 3 chemistry per the manufacturer's protocol. Samples were sequenced on the NovaSeq with 100x100 or 150x150 paired end reads for gene expression and BCR libraries respectively. Base calls were converted to fastq sequences and demultiplexed using the cellranger mkfastq function from 10X Genomics 2.2.0 and aligned to the GCRhg38 genome supplied by 10X Genomics. Sparse count matrices, barcode assignments and feature calls were made using the cellranger count function. The resulting output was loaded into Seurat v.2.3.4 for analysis. Cells with less than 50 genes detected, or mitochondrial content above 0.2 of all transcripts were discarded. Individual cells expressing detectable CST3, CD3E, KLRB1, NKG7 and GATA2 transcripts were excluded to remove non-B cell populations like T cells and myeloid cells. Single-cell expression values were computed from log normalized count values (70). 12,534 cells were identified in the final filtered analysis. Clusters were assigned and tSNE was computed after first scaling the data, removing cells with elevated mitochondrial content, identifying highly variable genes using FindVariableGenes and correcting for batch effects using the first 20 dimensions from canonical correlation analysis as identified by visualizing with MetageneBicorPlot. The first 10 dimensions were used for PCA based on the location of the inflection point from PCElbowPlot. FindClusters was then used to identify clusters in Seurat v.2 which uses shared nearest neighbor clustering after PCA (70).
Cell subsets were identified using a "basis set" of published marker genes called immunoStates (71). The mean expression of each gene was computed for each cluster from the scaled log normalized expression values; each cluster was then assigned to the immunoState subset with the highest Pearson correlation value. The absence of transitional B cells in the immunostates dataset meant this cluster was identified manually using marker genes alone. Small clusters with elevated expression of mitochondrial associated genes were also excluded from further analysis. We confirmed our assignments by visualizing the expression of known gene expression markers for instance, CD27 (for memory B cells), CD20 (for B cells in general except plasma cells), CD10 (for transitional B cells) and BLIMP1 (for plasma cells) ( fig. S6A-C). Overall marker expression was also visualized based on the percent of cells in each cluster expressing the gene and the overall mean level of gene expression in each cluster ( fig. S6D,E). To further confirm the identities of these single-cell clusters, we assessed their paired IGH repertoire features for mean somatic hypermutation (SHM) frequency and isotype usage which were consistent with expected values ( fig. S6F,G).  (20,21). Following V(D)J annotation, non-functional sequences were removed from further analysis and functional V(D)J sequences were assigned into clonal groups using Change-O v0. 3.4 (43). Sequences were first partitioned based on common IGHV gene annotations, IGHJ gene annotations, and junction lengths. Within these groups, sequences differing from one another by a length normalized Hamming distance of 0.126 were defined as clones by single-linkage clustering. This distance threshold was determined based on the average local minima between the two modes of the within-sample bimodal distance-to-nearest histogram for the three patients across multiple time points; thresholds were determined using a kernel density estimate in the same manner described previously (43). Germline sequences were then reconstructed for each clonal cluster (VH) with D segment and N/P regions masked (replaced with Ns) using the CreateGermlines.py function within Change-O v0.3.4.

Differential gene expression analysis
Single-cell log normalized expression values were imputed using the ALRA algorithm to account for dropout during differential gene expression analysis (73). The mean value for each gene per sample was subtracted, the mean expression value for cells of a given status and sample was computed and values were z-score normalized for each patient. Student's t-tests without pairing were performed to assign p-values for hypothesis testing. False detection rate (FDR) correction was performed using Storey's method implemented within the qvalue package v3.9 in R from Bioconductor on the 12,066 p-values extracted from our analysis of cluster 3 associated genes (74). Gene ontologies were assigned using the enrichr package v1.0 in R which implements a wilcoxon signed rank test for significance (75). A q-value (corrected p-value) threshold of 0.05 was used for assigning significance for all tests of significance.

Analysis of SHM, diversity and clonal overlap from BCR repertoires
Mutations relative to the germline sequence were quantified using SHazaM v0.1.8 in R v3.4.2 (76). Diversity analysis was performed using Alakazam v0.2.11 and assessed using the generalized Hill index, with uniform downsampling (to the number of V(D)J from the sample with the fewest sequences to account for different sequencing depth) and 2,000 replicates of bootstrapping from the inferred complete clonal abundance (77). Clonal overlap was computed using a bray-curtis metric implemented by the function scipy.spatial.distance.braycurtis in scipy v1.1.0 (78).

Inference of B cell lineage trees and prediction of intermediate timepoints
B cell lineage tree topologies and branch lengths were estimated using the dnapars program distributed as part of PHYLIP (v3.697) (79). A maximum parsimony algorithm was used to label the internal nodes of each lineage tree as either or pre-RTX, post-RTX, given the date of sampling of each sequence at the tree's tips. This was accomplished using an implementation of the Sankoff parsimony algorithm (80) in which switches from post-RTX to pre-RTX are forbidden using a cost matrix that weights post-RTX to pre-RTX switches as 1000 times more costly than switches in the opposite direction. Clusters of internal nodes separated by zero length branches (polytomies), were re-ordered using nearest-neighbor interchange (NNI) moves to minimize the number of label switches along the tree. These analyses were performed using IgPhyML v1. 10 (81).
Statistical analysis R v.3.4.2 (76) and Python 3.5.4 was used for all statistical analysis. Dataframe handling and plotting was performed using functions from the tidyverse 1.2.1 in R and pandas 0.24.2, scipy 1.1.0 and matplotlib 2.2.2 in python. All parametric statistical testing aside from that used for differential gene expression analysis (described previously) was performed in R using the aov function for two-way ANOVA or t.test functions for paired two-tailed student t-tests (or onetailed student t-test when appropriate). A significance threshold of <0.05 was used and shown on plots with a single asterisk; double asterisks correspond to a p <0.01 and triple asterisks correspond to a p<0.001. Table S1. Demographic characteristics of MuSK MG study subjects at each collection time point. Table S2. Counts of reconstructed V(D)J sequences by isotype and clones from sequencing of pre-RTX and post-RTX bulk IGH repertoires. Fig. S1. Timeline of three patients who experienced post-RTX relapses. Symptom score based on MGFA class and MGC scores were recorded at all collection time points. Fig. S2. Distance-to-nearest plots used to identify a common threshold to use for hierarchical clustering based grouping of V(D)J sequences from high throughout sequencing of BCR repertoires. Fig. S3. The presence of B cell clonal expansions and somatic hypermutation cannot be detected from analysis of the bulk IGH repertoire. Table S3. Count of plasmablast derived or tetramer-binding V(D)J sequences clonally related to members of the pre-RTX or post-RTX bulk BCR repertoire (Traceable clones) or to members of the pre-RTX and post-RTX bulk BCR repertoire (Pre-RTX clones).        Data file S1. List of differentially expressed genes for cluster 3 compared to all other memory B cell clusters.   Fig. 1. Schematic showing overall workflow from clinical data elements and sample collection to computational analysis of repertoires. In step one, "Bulk" repertoires using nextgeneration sequencing by isolating unpaired IGH V(D)J sequences directly from RNA were generated. In step two, paired heavy and light chain V(D)J sequences from circulating B cells were expressed as recombinant antibodies and tested for "Specificity" for antigen. In step three, paired heavy and light chain V(D)J sequence repertoires from high-throughput emulsion-based "Single Cell" sequencing were generated. These three types of repertoires were analyzed together in our workflow, "Repertoire Analysis", for each patient across the sampled time points. This workflow involved the identification of clones, quantification of diversity, characterization of differential gene expression, and analysis of SHM frequency and isotype distribution. A one-tailed t-test was used to assess the significance of the null hypothesis that Intra-Patient overlap was not higher than Inter-Patient overlap. Overall distribution of frequencies of different isotypes (C) and paired average somatic hypermutation frequencies (D) among the set of sequences belonging to persistent and non-persistent clones during post-RTX relapse for study subjects. Two-way ANOVA was performed to assess significance for an overall somatic hypermutation difference between non-persistent compared to persistent clones across isotype. Sequences isolated from plasmablasts using Sanger sequencing-based approaches are denoted as "Disease-Relevant"; some of these bind MuSK weakly including the three shown here (black arrows). Lineage tree topology and branch lengths were estimated using maximum parsimony, with edge lengths representing the expected number of nucleotide substitutions per site (see scale) as estimated using dnapars v3.967 (79). Tips are colored by antibody isotype, and each internal branch is colored by whether its descendant node was predicted to have occurred in the pre-or post-RTX repertoire using a constrained maximum parsimony algorithm (see Methods). Simpson's diversity for persistent and non-persistent members of the ASC cluster. A one-tailed t-test was used to assess the significance of the null hypothesis that persistent clones would be less diverse than non-persistent clones. (E) The fraction of ASC cluster members is quantified as a ratio of the total fraction of cluster members combined for each patient. A one sample t-test was used to assess significance. Horizontal bars show the average frequency of ASC cluster members for each patient of a given status. Frequencies belonging to the same patient are paired with a gray line.        Table S5. Table of cluster assignments using the immunoStates basis set. Clusters were assigned to the B cell immunostate with the maximum pearson correlation coefficient when compared with the mean expression value of genes associated with each cluster. Unassigned clusters were manually annotated. One naive B cell cluster was assigned to a transitional B cell subset based on marker expression (cluster 6) and another cluster (cluster 10) was excluded owing to elevated expression of mitochondrial genes.  IGHM  IGHD  IGHG1  IGHG2  IGHG3  IGHG4  IGHA1  IGHA2 T r a n s it io   Heatmap of significantly differentially expressed genes for cluster 3 compared to other memory B cells, with a greater than >0.4 z-score difference between persistent and non-persistent labels. enrichR gene ontology analysis of either genes (B) up-regulated or (C) down-regulated among cluster 3 memory B cells compared to other memory B cells. Only differentially expressed genes with an adjusted p-value less than 0.05 with FDR correction by Storey's method were evaluated. Red bars correspond to significantly associated gene ontology assignments (p < 0.05 by wilcoxon signed rank test).  CSGALNACT2  CDC42EP3  STK38L  USP46  PRDM4  ANKRD13C  PEX11B  PLP2  ARL2BP  UBE2N  NMRK1  BBS2  CALU  C4orf48  ANKRD28  APOL3  PAK1  UBE2I  PPP2R5C  MICU2  CDCA4  ZNF92  CLIP4  EFCAB7  KLHL42  CORO1B  FAM24B  ATP2B1  DCAF15  ERO1A  PITPNC1  BLVRA  KLC2  PCBP1  FAM117A  SSBP2  ZBTB7A  FAM45A  FAM219B  HERC1  SNED1  LYPLAL1  SPIN3  MFSD10  GFM2  PPP1R18  COL4A4  CDC25B  C6orf136  ING2  WDR41  PIK3R5  PORCN  C7orf50  GSTK1  RAB33B  GFPT1  NR1D1  ZNHIT2  SLC35E3  SPATA33  RAB31  RFFL  MYO1D  STK35  CARD19  ZBTB38  TICAM1  STMN3  FAIM  CD99  EPHA4  TIAM2  S100A6  COL18A1  KLF12  CD96  AHNAK  PERP  PDE3B  PLEKHB1  PDP1  CAPN2  LGALS1  PRKAG2  C12orf75  AK8  S100A10  HOMER3 BAIAP3