Single-cell immunophenotyping of the skin lesion erythema migrans identifies IgM memory B cells

The skin lesion erythema migrans (EM) is an initial sign of the Ixodes tick–transmitted Borreliella spirochetal infection known as Lyme disease. T cells and innate immune cells have previously been shown to predominate the EM lesion and promote the reaction. Despite the established importance of B cells and antibodies in preventing infection, the role of B cells in the skin immune response to Borreliella is unknown. Here, we used single-cell RNA-Seq in conjunction with B cell receptor (BCR) sequencing to immunophenotype EM lesions and their associated B cells and BCR repertoires. We found that B cells were more abundant in EM in comparison with autologous uninvolved skin; many were clonally expanded and had circulating relatives. EM-associated B cells upregulated the expression of MHC class II genes and exhibited preferential IgM isotype usage. A subset also exhibited low levels of somatic hypermutation despite a gene expression profile consistent with memory B cells. Our study demonstrates that single-cell gene expression with paired BCR sequencing can be used to interrogate the sparse B cell populations in human skin and reveals that B cells in the skin infection site in early Lyme disease expressed a phenotype consistent with local antigen presentation and antibody production.


Introduction
Lyme disease (LD), an Ixodes tick-transmitted infection with spirochetes of the Borrelia burgdorferi (Bb) sensu lato complex (genus Borreliella) (1), is now the most common vectorborne disease in the northern hemisphere (2,3). The earliest sign of infection reported in ~70% of patients is the appearance of an erythema migrans (EM) skin lesion at the initial tick bite site (3). EM presents as an erythematous, expanding macular lesion that usually appears one to two weeks after infection, and arises from the immune response to Bb spirochetes as they disseminate outwardly from the tick bite site (3). If not controlled by the local immune response in the skin, Bb disseminates via the lymphatics and blood to cause disease in other areas of the skin and internal organs, including the heart, musculoskeletal and nervous systems (3).
Borreliella spirochetes are widely held to be extracellular pathogens for which phagocytes, antibody and complement are critical for host defense (4). Bb-specific antibodies prevent infection in both animal models and humans and contribute to a reduction in pathogen burden and resolution of disease manifestations in mouse models (4). In humans, a robust circulating plasmablast response was found to correlate with more rapid resolution of LD symptoms after antibiotic treatment for EM (5). Within the EM lesion itself, however, B cells and plasma cells are not a dominant cell type. Histologic studies of EM reveal superficial and deep perivascular mononuclear infiltrates of lymphocytes and macrophages, occasional plasma cells, and less commonly neutrophils and mast cells (6,7). By immunohistochemistry, CD3+T cells predominate the immune infiltrate in comparison to CD68+ macrophages or CD20+ B cells (6).
The prevalence of B cells in EM lesions, however, may relate to the duration of the lesion prior to sampling. In EM lesions arising from infection with Borreliella strains in Europe, the proportion of infiltrate due to B cells was found to increase with the density of the infiltrate and size of the EM lesion, which typically reflects lesion duration (8). A study using a blister technique to extract cells from EM lesions found an increased frequency of T cells in comparison to their prevalence among peripheral blood mononuclear cells (PBMCs) (9). B cells, in contrast, were found at a frequency that was considerably lower than their prevalence in PBMCs, although this could be influenced by the method for cell extraction. By flow cytometry, both CD4 and CD8 T cells appeared antigen experienced as assessed by CD45RO and CD27 expression, with enrichment of CD4 T memory and effector subsets and CD8 effector/cytolytic subsets. Blister fluid also contained elevated levels of proinflammatory cytokines IL-6, TNFa, and IFNg, as well as the anti-inflammatory cytokine IL-10, consistent with an earlier report of elevated mRNA expression of TNFa, IFNg, and IL-10 detected by in situ hybridization using cytokine-specific RNA probes (6). More recently, evaluation of the whole skin transcriptome of EM lesions identified a strong interferon signature, consistent with cell-mediated immunity coordinated by T cells and innate immune populations as the main feature of the local immune response (10). The specific contribution of B cells to the cutaneous response, however, remains an open question.
Understanding B cell responses in the EM lesion is key to identifying host factors that may be important determinants of localized versus systemic infection with this extracellular pathogen.
Here, we leveraged new advances in single-cell RNA sequencing (scRNA-seq) analysis to characterize the B cell response in the EM lesion. Using the 10X Genomics Chromium scRNAseq platform, we defined the transcriptomes of single cells from whole skin digests of EM and autologous uninvolved skin biopsies from LD patients. Single-cell transcriptomics was combined with BCR and TCR repertoire sequencing to determine the origin and trafficking of 6 these immune populations between the circulation and the skin. We found that clonally expanded B cells bearing plasma cell as well as memory B cell signatures were recruited to the site of the EM lesion, accompanied by activated T cells and myeloid subsets. EM B cells upregulated MHC class II genes compared to B cells from uninvolved skin and displayed a biased VH-gene usage compared to circulating B cells, suggesting an antigen-specific response. A disproportionate fraction of memory B cells utilized IgM receptors and displayed a low frequency of somatic hypermutation. This IgM-expressing B cell subset did not appear to be naive, but rather was characterized by a TLR activation signature. Notably, clonal relatives of these B cells were found in the circulation. Taken together, our findings support a role for B cells, including IgMexpressing memory B cells, in local antigen presentation and antibody production in the skin response to Bb infection.

Subject characteristics and sample collection
To identify and immunophenotype cell subsets in EM lesions at a single-cell level, we obtained biopsies of EM lesions from 10 subjects with early LD separated into two cohorts based on year of enrollment (Table 1). All subjects satisfied the Centers for Disease Control LD case definition; nine of the ten tested positive for Bb antibodies using the C6 peptide enzyme immunoassay (Table 1) (11)(12)(13). One subject in Cohort 1 had disseminated infection manifesting as multiple EM without clinically apparent disease in extracutaneous sites, whereas the remainder had a single EM. Biopsies of uninvolved skin were obtained from five of six subjects in Cohort 1 and two of four subjects in Cohort 2 (Table 1).

Skin cell cluster identification and preliminary characterization using single cell transcriptomics
Single-cell suspensions of fresh EM and uninvolved skin samples from Cohort 1 subjects were analyzed by 10X Genomics scRNAseq with paired transcriptomes and repertoires without selecting for specific cell populations ( Figure 1, Table S1). Single-cell transcriptomes ( Figure   S1A-C) were pooled across subjects to identify cell clusters to which identities were assigned ( Figure S2A) based on expression of key marker genes for stromal and immune cells ( Figure S3, Table S2). As expected, there was increased representation of immune cell subsets, especially lymphocytes, in the EM skin vs adjacent uninvolved skin (Figure 2A, B). On average, T cells comprised 35.6% of EM cells vs 10.5% of uninvolved skin cells (paired ratio t-test, P=0.012). B cells, which sparsely populate normal skin (14), accounted for 2.66% of EM lesion cells in comparison to 0.31% of cells from uninvolved skin samples ( Figure 2B). Myeloid cells (paired ratio t-test, P=0.019), dendritic cells (paired ratio t-test, P=0.020) and natural killer (NK) cells (paired ratio t-test, P=0.006), were also found to be enriched in the EM lesion in comparison to uninvolved skin. The infiltration of immune cells consequently decreased proportions of nonimmune cell subsets. Fibroblasts comprised 16.3% of cells in the EM lesion vs 36.3% of uninvolved skin and keratinocytes comprised 11.2% of EM cells vs 24.6% of uninvolved skin cells (Figure 2A, B).
Because estimation of cell frequencies is dependent on accuracy of cell cluster identification, we used a second method, the Seq-Well nanowell-based platform, to perform scRNA-seq of EM and uninvolved skin samples from Cohort 2 subjects (Table S3) (15). After cluster identification, similar analyses were performed quantifying the frequency of different subsets and gene expression profiles ( Figure S4A, B). The frequency of B cells was increased ( Figure S4C, D) in one of two subjects that had paired samples (4.98% vs 0.62% in EM vs uninvolved skin respectively) while no significant elevation was observed in the second. On average, EM biopsies from all four Cohort 2 subjects contained 2.0% B cells, similar to the 2.7% observed in Cohort 1.
To more precisely define the B and T cell clusters, we examined their association with BCR and TCR receptor expression by BCR and TCR repertoire sequencing using Cohort 1 samples ( Figure 2B, C). These analyses could only be performed on Cohort 1 samples as Seq-Well does not yet permit BCR and TCR sequencing. Both B cells (paired ratio t-test, P=0.021, 1.47% in EM, 0.16% in uninvolved) and T cells (paired ratio t-test, P=0.023, 32.5% in EM, 8.5% in uninvolved) identified using this new definition were statistically more abundant in the EM lesion in comparison to uninvolved skin ( Figure 2C). Differences between the frequency of B and T cells in uninvolved compared to EM skin also were significant when calculated as a ratio relative to the frequency of keratinocytes, irrespective of whether the definition of B cells and T cells was based on marker genes ( Figure 2D) or repertoire sequencing ( Figure S5). Taken together, our results show that B cells can be found in greater abundance in EM compared to uninvolved skin, along with T cells and innate immune cell subsets. T cells were the dominant lymphocyte subset, consistent with observations from earlier studies (8,9).

B cells in EM lesions upregulate MHC class II expression
To begin to phenotype B cells within the EM lesions, we investigated the genes most differentially expressed between B cells from six EM and two uninvolved skin samples by absolute difference using pseudo-bulk gene expression profiles. We were unable to perform paired testing for significance because no B cells with associated IGH were identified in the other three uninvolved skin samples. In the samples in which B cells were detected, nine MHC class II genes were expressed by B cells, four of which were in the set of top 20 upregulated genes ( Figure 3A). Over-representation analysis using the top 40 differentially expressed genes (DEGs) and the Reactome pathways database identified a significant signature for "MHC class II antigen presentation" (P=1.2e-5), along with a signature for "Interferon signaling" (P=2.1e-5) and "Interferon gamma signaling" (P=7.4e-6) ( Figure 3B) (Supplementary data file bcr_bcell.csv). Visualization of the expression of the MHC class II gene HLA-DRA confirmed its low expression in the B cell cluster in uninvolved skin ( Figure 3C) compared to its prominent up-regulation in EM ( Figure 3D).

T cells display a signature of trafficking and receiving co-stimulation
We also examined differences between T cell subsets in EM compared to uninvolved skin. Due to T cell heterogeneity, we first assigned T cells with an associated TCRB V(D)J sequence to either a CD4 or CD8 cell cluster computationally (Table S4), followed by Tregs and dividing T cell subsets using a set of lineage-defining marker genes ( Figure S6A-C, Table S5) (16). The

Memory B cells, including unmutated class switched B cells, are enriched in EM lesions.
B cells in the skin belonged to one of two clusters, either a memory or plasma cell cluster as identified by automated cell annotation (Table S6) and based on the marker genes MS4A1 (CD20) for memory B cells and PRDM1 (BLIMP-1) for plasma cells (Table S7, Figure   S10A,B). The frequencies of memory B cells and plasma cells ( Figure 4A, B), as well as CD4, CD8, Treg and dividing T cells ( Figure 4C, D), were quantified as a total fraction of the cells present in the biopsy sample. An enrichment for memory B cells (paired ratio t-test, P=0.010, Figure 4B) and all T cell subsets (paired ratio t-test for CD4, CD8, Treg and dividing T cells were P=0.019, P=0.017, P=0.014, P=0.017, respectively, Figure 4D) was observed. While the mean frequency of plasma cells was also higher in EM skin, this was not statistically significant (paired ratio t-test, P=0.126) ( Figure 4B) and appeared to be driven by a single subject due to the presence of one expanded clone; re-quantification excluding this subject confirmed no significant influx of plasma cells into the lesion (paired ratio t-test, P=0.261).
To better define the differences and respective functions between these two B cell subsets present in EM, we compared their BCR repertoire properties. We observed that memory B cells were more likely to be IgM (paired t-test, P=0.031) ( Figure 4E) and these IgM memory B cells displayed a lower frequency of somatic hypermutation (SHM) compared to IgM plasma cells (paired t-test, P=0.038) ( Figure 4F). Moreover, a sub-population of memory B cells were unmutated (defined as <1% mutation frequency in V and J segments), particularly among IgMexpressing B cells (consistently more than plasma cells, paired t-test, P=0.046) ( Figure 4G, Table   S8). A sub-population of IgG-switched plasma cells were also unmutated (5 cells across all samples or 15.8% of plasma cells, Table S9

Evidence for antigen-driven selection among B and T cells
Given that we found B cells expressing IgA-and IgG-switched BCRs in the EM infiltrate, which suggests prior antigen experience, we examined the VH gene usage in the BCR repertoire of EM lesions. VH gene usage reflects the underlying antigen specificity of B cells and differences would suggest active selection of B cells found in the tissue based on specificity. We compared the VH gene usage distribution of B cells in the EM lesion with the repertoire observed from bulk sequencing of PBMCs (Table S10) in these same subjects. We reasoned that PBMCs would be a suitable reference since the VH gene usage of the circulating B cell repertoire has been well characterized and the blood is the most proximal location from which non-resident B cells might enter the skin.
Switched B cells (IgA or IgG) in the EM lesion expressed less IGHV4-34 (paired t-test, P=0.002) ( Figure S11A) and less IGHV3 overall (paired t-test, P=0.004) ( Figure S11B). However, with the exception of IGHJ1 (paired t-test, P=0.001, usage of this J gene occurs in less than 2% of the circulating repertoire), no differences in the use of IGHJ genes were observed ( Figure S11C). To confirm that the VH gene usage differences were not due to the different technologies used to profile the skin and PBMCs (single-cell vs bulk, respectively), we verified our findings using a parallel single-cell PBMC sequencing sample from one subject with EM ( Figure S11D-F). These analyses confirmed that B cells in the EM lesion are clonally expanded with a distinct VH gene repertoire featuring decreased use of IGHV3 family genes.  (Table S11). T cells were also more clonally expanded in the EM skin. We found that the mean size of the largest CD8 T cell clone was larger in EM compared to uninvolved skin (one-tailed paired Wilcoxon-test, 20 vs 6 cells, P=0.063, Table S12), with a similar trend also observed for CD4 T cell clones (one-tailed paired Wilcoxon-test, 23 vs 9, P=0.094, Table S13). EM B cells were also expanded compared to a control blood B cell single cell repertoire available from one subject (4.9% for the largest clone in skin and 1.4% for the largest clone in blood, consistent with sizes reported in the literature) using methods that correct for different sampling depths (18). The presence of these clonal expansions is consistent with antigen-driven selection in situ.

EM-associated B cells and T cells can be traced to the circulation
To assess whether B and T cells present in EM skin were derived from a localized response in situ or active trafficking from the circulation, we examined the bulk repertoire of PBMCs (Table   S10) for clonal relatives of these EM cells in each subject. By quantifying the extent to which members of the same clonal family were found in both EM skin and blood, we found the overlap was highly significant for both B cells (paired t-test, P=0.014, Figure S13A) and T cells (P=6.1e-5, Figure S13B), suggesting a meaningful clonal connection between the two sites. EM B cells that were clonally related to circulating B cells (referred to as "mobile") had similar repertoire properties to those without such relatives (referred to as "resident"). Specifically, no significant differences between mobile and resident B cells were found for isotype usage or SHM frequency ( Figure S14A, B respectively). We noted that up to 20% of the resident IgG B cells were unmutated whereas no unmutated IgGs were found among mobile B cells, but this difference was not significant ( Figure S14C).
As the PBMC repertoire was profiled using bulk repertoire sequencing, clones were defined for B cells and T cells using only IGH or TCRB, respectively, in these analyses. While we have previously shown that IGH alone is sufficient to determine most B cell clonal relationships, up to 20% of clones defined this way may be subdivided based on light chain expression (19). To confirm the results obtained with bulk receptor sequencing, we examined one subject from whom both skin and blood B cell repertoires were obtained with paired heavy and light chain receptors. We observed that 76.5% of the clones shared between EM and the circulation and detected by examination of the heavy chain alone were preserved when the light chain was also incorporated in the clonal definition. Similar results were obtained for TCRB (85.6%). This single-cell analysis confirmed the presence of mobile B cells at the EM lesion, and that the mobile and resident B cells exhibit similar repertoire properties (Figure S14D-F).
B and T cell subsets displayed different levels of connectivity with the circulation ( Figure 5).
Compared to memory B cells, a higher frequency of plasma cell clones were mobile (paired ratio t-test P=0.007, Figure 5B). Among T cell subsets, the frequency of mobile Treg clones was less than other T cell subsets (P=0.006, ANOVA, Figure 5D). Treg cells were less easily traced to the circulation than dividing T cells and CD8 T cell clones (paired ratio t-test P=0.006, P=0.003 respectively) ( Figure 5D); CD4 T cells were also significantly less mobile compared to CD8 T cells (P=0.043) ( Figure 5D).
We next compared the gene expression pattern of mobile and resident cells in the EM lesion.

EM-associated unmutated IgM memory B cells are activated and part of diversified clones in the circulation
Given previous work showing that T-independent B cell production of antibodies can protect against Bb infection (20,21), we sought to better characterize the phenotype of the unmutated IgM memory B cell subset. We found that unmutated IgM memory B cells, defined as having <1% mutation frequency, could not be detected in uninvolved skin from any subject. To confirm that unmutated IgM memory B cells were not misclustered naïve B cells, we examined the expression of genes that may distinguish these subsets. We analyzed the entire gene expression profile of the unmutated IgM memory B cell subset (Supplementary data file unmut_igm.csv) compared with other memory B cells for genes associated with those reported previously for naïve B cells (16). We did not find any naïve B cell genes among the set of significantly differentially expressed genes. For example, among the marker genes CD27, CD23, and IL4R, only IL4R was observed to be more highly expressed among unmutated IgM memory B cells compared to other memory B cells but this was not significant after FDR correction (IL4R, paired t-test P=0.005, P=0.053 after FDR correction) ( Figure 6A TLR10 was also observed to be the most highly expressed TLR in EM B cells but not in myeloid or dendritic cells (Table S14, S15). Expression of TLR10 also correlated with expression of an interferon-induced tryptophan catabolism gene kyureninase (KYNU) (P<2.2e-16, Spearman correlation=0.49), which has been associated with immunosuppression and was previously identified as upregulated in a bulk transcriptome analysis of EM (10). Overall, the gene signatures of unmutated IgM memory B cells in the EM lesion suggested an activated phenotype and they displayed a skewed VH and JH gene usage suggesting they were selected by antigen. five of these cells (2.3% of EM IgM B cells) had clonal relatives in the circulation ( Figure S17).
Four of these five clones included IgG-switched relatives in the circulation, three of which were also unmutated. Interestingly, while the IgM memory B cells in the EM lesion were unmutated, diversifying mutations were found in clonal relatives from the circulation in three of the five clones. Thus, unmutated IgM memory B cells, which are found specifically in the EM but not uninvolved skin, appear to be activated cells that are related to expanded, class-switched and diversified B cells in the circulation.

Discussion
The EM lesion results from local skin infection with Ixodes tick-transmitted Bb spirochetes. The act of tick feeding and the associated immunomodulatory effects of tick saliva disrupt normal wound healing and creates a protective niche for Bb to establish infection in the bloodmeal host skin (22,23). This introduction of Bb spirochetes at the bite site and their subsequent expansion elicits the first clinical sign of Lyme disease, erythema migrans, that we sought to elucidate at single cell resolution.
The EM lesion has historically been characterized through histopathology and immunohistochemistry as comprising perivascular and interstitial infiltrates dominated by IFNg-producing lymphocytes, macrophages, DCs, and occasional plasma cells (3,8). Our results confirm and germline receptors that were associated with a TLR/MyD88 signature. These IgM+ B cells also belonged to clones with circulating relatives, but displayed a distinct VH gene repertoire compared to relatives in the blood, suggesting active selection for specific BCRs in situ. These results reveal that B cells recruited to the EM lesion may undergo expansion when in contact with antigens present in the skin microenvironment, which may include Bb or tick-associated antigens as well as those of the host microbiome. As Bb components are known to activate MyD88-dependent TLR signaling, these results support the hypothesis that specialized IgM memory B cells may act as early APCs, serving an important and previously uncharacterized role in the adaptive immune response against Bb that arises at its portal of entry in the skin.
The presence of resident plasma cells and CD20 + B cells in healthy skin has been observed previously although their functional roles in this tissue remain poorly understood given their extremely low frequency (14). While B cells were abundant in EM lesions in our study, we found a low frequency of B cells in uninvolved skin, consistent with these previous reports. and signatures of interferon and cytokine signaling. Memory B cells, particularly those expressing MHC class II and co-stimulatory molecules, have previously been shown to be more abundant in animal models of inflammatory disorders at barrier sites, where they exhibit APC function to T cells and mediate local antibody production (14,24,26,27). We also observed that B cells that could be traced to the circulation displayed signatures consistent with trafficking; some B cell clones could be found in both the skin and circulation, providing supporting evidence that some of these B cells are not tissue resident. Studies have also shown trafficking of memory B cells to the lung in the context of influenza pneumonia; these B cells can transition to produce IgG-switched antibodies that clear infection (28). Our group has observed the trafficking of B cells, including plasmablast and non-plasmablast subsets, in B cell responses to HSV in the female reproductive tract (29). By tracing B cells using their BCRs, this study provides rigorous human data that antigen-presenting memory B cells directly infiltrate the Lyme EM lesion. VH gene differences indicate a possible enrichment of B cells in the EM lesion with distinct specificities compared to those from the circulation. This suggests that selected B cells enter the EM lesion based on the specificities of their BCRs. Skewing away from VH3 family genes, as found in the study, has also been observed in the context of myasthenia gravis and specific tissue-like memory B cell subsets in multiple sclerosis by our group; recent studies have also observed this skewing in the context of SARS-CoV-2 infection (18,30,31). Other studies of autoimmune diseases, including those driven by interferon signaling such as systemic lupus erythematosus, have noted a similar shift in the global BCR repertoire (32)(33)(34). A recent study on circulating plasmablast responses in Lyme disease has provided parallel evidence that selection for Bb specificity can be detected at the repertoire level although the exact VH genes differentially used by these B cells was not reported (5). In this regard, a European study examining the B cell repertoire in peripheral blood of patients with EM found overlap between between IgM and IgG CDR3 sequences during acute and convalescent infection samples in comparison to healthy controls, suggesting that B cell immune responses associated with Bb infection may be detectable in the circulating repertoire (35).
Our finding that unmutated IgM-expressing memory B cells appear to be interacting with antigen suggests a potential source for T-independent IgM that may be important for early protective humoral immunity. The identity of these B cells is therefore of particular interest. There is evidence of a role for innate B cells in mice infected with Bb, but whether similar populations contribute to the human response is unknown (20). IgM B cells that are predominant in EM lesions lack the hallmark features of an innate B cell population. Perhaps the most important evidence concerning this possibility is the distinctly non-naïve gene expression profile of infiltrating unmutated IgM memory B cells and their expression of TLR gene programs. These B cells also use less JH6, which is consistent with a non-naïve origin, and display a signature of prior antigen contact suggesting an antigen-experienced phenotype. While some studies have suggested that innate-like B cell populations in humans possess a distinct non-naïve (antigenexperienced) phenotype (36,37), markers previously found to be associated with these subsets were neither detectable (CD43) nor upregulated (CD5 and NOTCH2) in the subset of unmutated IgM memory B cells in our analysis. These B cells may instead represent a potentially extranodal B cell response that may precede a more robust lymph node response, analogous to what has been reported in Bb infection in mice (38), and may be a source of IgM ASCs, a subset that has recently been observed in human skin (25). As discussed above, the observation of a TLR/MyD88 signature among this subset of B cells is consistent with literature suggesting the relevance of T-independent B cells (20,21) and the pattern recognition system, including on B cells, in optimal serological responses to Bb (39,40). Bb components that could drive Tindependent B cell responses include its abundant lipoproteins that contain the TLR2 stimulating tripalmitoyl-S-glyceryl-cysteine moiety (e.g. in vivo expressed outer surface protein C and VlsE) as well as Bb peptidoglycan (41,42).
We also showed high levels of TLR10 mRNA in B cells compared to other subsets, including plasmacytoid DCs on which it is known to be expressed at low levels (43). TLR1 and TLR10 were the most abundant TLR transcripts in B cells, whereas TLR2 and TLR4 transcripts were the most abundant in myeloid cells. TLR10 is functional in humans and possesses both activating and inhibitory signaling potential depending on the cell system examined (44,45). Influenza virus infection of primary human monocytes in vitro enhances inflammatory cytokine production whereas anti-TLR10 antibodies can suppress primary human B cell proliferation and production of cytokines and chemokines in response to BCR stimulation conditions mimicking Tindependent or T-dependent engagement (46). A modulatory role for TLR10 on B cell function in LD remains to be elucidated in the temporal evolution of the lesion, although evidence exists for its role in controlling immune responses against other pathogens (46,47). In this regard, we found a correlation between TLR10 expression and expression of KYNU, previously identified as upregulated in a bulk transcriptome analysis of EM and associated with immunosuppression (10).
Our findings related to T cells, particularly the identification of a Th1 signature and evidence of local IFNg production associated with CD8 T cell subsets clonally traced to the circulation, are consistent with previous studies implicating these subsets in the control of local infection including against Bb (3,4,39). However, we did not identify a notable role for Treg cells as they lacked a clear differential gene expression signature in EM compared to uninvolved skin. Treg T. pallidum differs from Bb in that it has a paucity of outer membrane proteins (52) and a vaccine directed at the rare outer membrane proteins has been elusive. Study of the syphilitic chancre using scRNA-seq and BCR sequencing, therefore, may provide insight into the role of B cells in the pathogenesis of syphilis at its earliest stage of infection and aid in vaccine design.
Conclusions from our study may be limited by the modest sample size (n=10) and the predominance of older female subjects. We show, however, that our transcriptomics findings are agnostic to the type of scRNA-seq performed, as two approaches (10X Genomics and Seq-Well) produced the same qualitative results in terms of differential gene expression for key cell subsets like B cells, CD4 T cells and CD8 T cells. We did not examine T cells bearing gamma delta TCR, as there was poor recovery of these receptors from repertoire sequencing using the emulsion-based platform. While comparison was made between autologous uninvolved skin biopsies with lesional skin in our studies, we cannot exclude a potential contribution of cells in the EM biopsies that may be trafficking to the site but still within the vasculature.

Subject selection
Ten adult subjects with EM and fulfilling the CDC case definition for LD (10) were recruited from study sites within New Haven County and Storrs, Connecticut (Connecticut, USA). These Cells were processed immediately for single cell library preparation as described below.
Bulk repertoire preparation, sequencing and pre-processing RNA was prepared from frozen PBMCs using the RNAEasy Mini kit (Qiagen) per the manufacturer's instructions. BCR and TCR RNA libraries were prepared using reagents from New England Biolabs as part of the NEBNext® Immune Sequencing Kit (a kind gift of Drs. Eileen Dimalanta and Chen Song, Ipswich, MA). These steps were performed as described previously with the inclusion of an additional set of TCRB-specific primers on the 3' end (53,54). Libraries were sequenced by 325 cycles for read 1 and 275 cycles for read 2 base-pair paired-end sequencing with a 20% PhiX spike on the Illumina MiSeq platform according to manufacturer's recommendations. Bulk repertoire data processing and analysis was carried out using tools from the Immcantation framework (http://immcantation.org) as described (18).
Preprocessing was carried out using pRESTO v0.5.11 and the IMGT human germline IGHV reference database (IMGT/GENE-DB v3.1.19; retrieved May 30, 2019) as described (53,54). BCR isotypes were assigned by local alignment of the 3' end of each sequence to a set of constant region sequences.

Emulsion-based single-cell library preparation and gene expression analysis
Single cell preparations were loaded into the Chromium Controller (10x Genomics) for emulsion generation and libraries were prepared using the Chromium Single-cell 5′ Reagent Kit for version 1 chemistry per the manufacturer's protocol. Libraries were sequenced on the NovaSeq 6000 with 100x100 or 150x150 paired end reads for gene expression and BCR/TCR libraries respectively. Base calls were converted to fastq sequences and demultiplexed using the cellranger v3.1.0 mkfastq function and aligned to the coding sequences of the GCRhg38 genome supplied by 10X Genomics. Sparse count matrices, barcode assignments and feature calls were made using the cellranger count subcommand.
An average of 128 million reads were sequenced and an average of around 7,000 cells were identified per sample (Table S1). The resulting output was loaded into Seurat v.3.1.5 for analysis (55). Cells with fewer than 400 or more than 4000 genes detected, or mitochondrial content above 15% of all transcripts were discarded (genes expressed in fewer than 5 cells in all samples were also excluded). A total of 65,363 cells were identified. Single-cell expression values were then scaled using log normalized count values (56). Highly variable genes were identified using FindVariableGenes. The IntegrateData function (using the first 30 dimensions, the default parameter) was used to correct for batch effects. The first 20 dimensions were then selected for clustering analysis based on the location of the inflection point from PCElbowPlot. Clusters from total cells were assigned using the FindClusters function based on a shared nearest neighbor algorithm using a resolution of 0.4 (value at which most major clusters could be resolved) (55).
Clusters were annotated based on the expression of a panel of marker genes ( Figure S3, Table   S2). This procedure was repeated for cells from the T cell and B cell clusters that were associated with a TCR or BCR sequence from single cell repertoire sequencing, respectively. We defined cells as associated with a TCR when repertoire sequencing identified an associated TCRB V(D)J sequence, and associated with a BCR when the repertoire sequencing identified both IGH and IGK/L sequences. We did not require a paired TCRA as only 60.3% of TCRB were paired with TCRA given low TCRA expression. In contrast, 96.5% of IGH were found to be paired with IGK/L. To identify clusters, a resolution of 0.7 was used for T cells, while a resolution of 0.1 was used for B cells. The Seurat RunUMAP function was run to visualize clusters.
To assign T and B cell cluster annotations, cell clusters were first annotated using a "basis set" of published marker genes called immunoStates (Table S5, S7) (16). The mean expression of each gene (raw transcript counts) was computed for each cluster; each cluster was then assigned to the immuno state subset with the highest Spearman correlation coefficient. Treg cells and dividing cell immunostates markers were absent from the immunoStates matrix. These annotations were identified manually using marker genes alone, that is, based on the fraction of each cluster expressing FOXP3 or CDK1 above a specified threshold ( Figure S6, Table S5). Assignments from the two clusters identified from B cell single cell data were assigned using the immunoStates matrix then confirmed based on the fractional expression of CD20 and BLIMP1 ( Figure S4, Table S7).  (57). Sequences were first grouped based on IGHV gene, IGHJ gene, and junction lengths. Within these groups, sequences differing by less than a length normalized Hamming distance of 0.18 were defined as clones by single-linkage clustering. This distance threshold was determined based on the average of the local minima between the two modes of within-sample bimodal distance-to-nearest histograms using a kernel density estimate as described previously (57). Germline sequences were reconstructed for each clonal cluster with D segment and N/P regions masked (with Ns) using the CreateGermlines.py function within Change-O. To identify TCR clones, clones were assigned if T cells shared the same full length aligned V(D)J sequence.

Bulk BCR and TCR repertoire V(D)J gene annotation, additional sequence filtering, clonal assignment, germline reconstruction and analysis of SHM, overlap and phylogenetics
To validate the identification of clones based on sharing between bulk and single cell repertoires, sharing between single cell repertoires was also quantified for a PBMC and EM sample from subject 5. This was done by repeating the same alignment, filtering and single linkage clustering approaches described above on these single cell only repertoires. In addition, the IGH sequences were then further grouped based on whether the cells shared any common combinations of IGKV with IGKJ and a specific junction length or IGLV with IGLJ and a specific junction length.
Mutations relative to the germline sequence were quantified using the observedMutations function from SHazaM v1.0.1 in R v3.6.1 (58). Clonal overlap was computed using a Bray-Curtis metric implemented using the scipy.spatial.distance.braycurtis function in scipy v1.1.0 (59). Clonal abundance was computed using the estimateAbundance function from Alakazam. To compute background overlap for IGH sequences to justify the presence of significant clonal overlap, clones were re-assigned across all subjects using the same clonal threshold of 0.18 described earlier or based on identical V(D)J sequences for TCRB sequences. Bray-Curtis overlap was computed for V(D)J sequences from single cell derived IGH repertoires and bulk repertoires from the same subject ("Intra-Subject) or different subjects. The average overlap for comparisons with different subjects was computed ("Inter-Subject"). A paired one-tailed t-test was performed to assess the significance of the log transformed clonal overlap relative to the background.

Differential gene expression analysis
Single-cell log normalized expression values were imputed using the ALRA algorithm to account for dropout during differential gene expression analysis (63). All differential gene expression analyses were performed using pseudobulk averages of gene expression across single cell transcriptomes; individual gene expression values were first Z-score normalized across samples from the same subject before pseudobulk averages were computed (for all differential gene expression analysis) (53,54). Student's t-tests with pairing were performed to assign pvalues for hypothesis testing. False detection rate (FDR) correction was performed using Storey's method implemented within the qvalue package v2.18.0 in R from Bioconductor after visualization of the p-value distribution (64). More than two cells must be present in each sample to calculate a pseudobulk average for that sample. A q-value (FDR corrected p-value) threshold of 0.1 was used for assigning significance for tests of significance from gene expression analysis.
EM compared to uninvolved skin B cell gene expression analysis differed from other calculations in that no paired significance testing was performed given that B cells were identified in only two uninvolved skin samples vs all six EM skin samples. Therefore, differential expression was computed without paired significance testing or normalization across samples. The minimum absolute differences between EM and uninvolved samples was computed and used to select the top and bottom genes (of note, unpaired significance testing (Student's ttest) did not yield significant genes after FDR calculation). A disproportionate fraction of the EM lesion B cell infiltrate in subject 6 was observed to be composed of a single expanded, predominantly plasma cell, IgG-switched clone causing the plasma cell fraction of this EM sample to be much higher than that of other samples; at 184 B cell members, this clone was substantially larger than the next largest clone of the dataset containing 8 B cell members, and therefore was an outlier in terms of size. This clone was excluded from gene expression analysis of this sample to maintain consistent memory B cell and plasma cell ratios across EM samples for gene expression analysis and reduce noise.
When no significant genes were identified (comparisons of Treg and CD8 T cells from uninvolved and EM skin), the top genes based on absolute difference were analyzed. Gene ontologies were assigned using the enrichr package v2.1 in R which implements a Wilcoxon test for significance using a p-value threshold of 0.05 for significance (65). A q value (corrected P value) threshold is reported in cases where this was also less than 0.05. The "Reactome_2016" gene set was used for all pathway enrichment analysis.

Code and Data Availability
Code used in this study will be made available in a publicly accessible repository at https://bitbucket.org/kleinstein/projects . Data for gene expression analysis is available in a publicly accessible repository at https://doi.org/10.5281/zenodo.4711465.

Statistics
R v3.6.1 was used for all statistical analysis. Dataframe handling and plotting was performed using functions from the tidyverse v1.3.0 in R and pandas v0.24.2 and scipy v1.1.0 in python v3.7.5. All parametric statistical testing (aside from differential gene expression analysis and gene set analysis, described above) was performed in R using the t.test functions for paired two-shown on plots with a single asterisk; double asterisks correspond to a p <0.01 and triple asterisks correspond to a p<0.001. All tests of the proportions of different cell subsets or frequencies of cells out of one were first subjected to a log transformation and are specified by text, i.e. a ratio t-test was performed. All other tests, including differences in SHM, isotype usage and gene expression, did not involve a log conversion and are specified by text.

Study Approval
This research was conducted under human research protocols approved by the Yale Institutional Review board. Written informed consent was received from all participants prior to inclusion in the study.     two B cell clusters reported as a frequency of total cells, with significance shown for a ratio t-test. One subject had a high plasma cell frequency (1.9%) largely due to the presence of an expanded clone; (C) UMAP projection of T cell subsets from either uninvolved or EM skin. Cells include those assigned to the T cell cluster with an associated TCRB receptor from repertoire sequencing. (D) Frequency of cells from the four T cell clusters as a frequency of total cells from each subject sample from either uninvolved or EM skin with significance shown as a ratio t-test. (E) Average distribution of isotypes among members of the two EM B cell clusters, expressed as a fraction of total cells from the subset. (F) Average SHM frequency of two EM B cell clusters based on isotype. (G) Overall frequency of unmutated VH gene segments among members of the two B cell cluster by isotype, expressed as a fraction of the subset with a given isotype. Horizontal bars show the mean frequency of each comparison and frequencies belonging to the same subject are connected with dashed lines. Data for the same n=5 subjects with paired Uninvolved and EM skin samples from Cohort 1 are shown for all panels, except (A) and (C) which shows cells from all subjects n=6. Statistical differences are shown only when significant for a paired (ratio) t-test (****P < 0.0001; ***P < 0.001; **P < 0.01; *P<0.05). Figure 5. B and T cell clones from the EM lesions can be traced to the circulation. (A) UMAP projection of the B cell subset overlaid with black dots denoting B cells clonally related to those present in the circulation identified by bulk IGH repertoire sequencing (unpaired). (B) The frequency of each B cell subset clonally related to those present in the circulation is designated as "Mobile." Samples with fewer than six B cells (paired with BCR receptors) were excluded from analysis. (C) UMAP projection of the T cell subset overlaid with black dots denoting T cells that are clonally related to those present in the circulation from bulk TCRB repertoire sequencing (unpaired). (D) Frequency of each T cell subset clonally related to those present in the circulation designated as "Mobile." Skin samples with fewer than six T cells were excluded from analysis. Values are quantified as a fraction of the clones from each subset. Horizontal bars show the mean frequency of each comparison and frequencies belonging to the same subject are connected with dashed lines. Data for the same n=5 subjects with paired Uninvolved and EM skin samples from Cohort 1 are shown for all panels, except (A) and (C) which shows cells from all subjects n=6. Statistical differences are shown only when significant for a paired t-test (****P < 0.0001; ***P < 0.001; **P < 0.01; *P<0.05). Horizontal bars show the mean frequency of each comparison and frequencies belonging to the same subject are connected with lines. Data for the same n=5 subjects with paired Uninvolved and EM skin samples from Cohort 1 are shown for all panels, except (A) which shows cells from all subjects n=6. Statistical differences are shown only when significant for a paired t-test (****P < 0.0001; ***P < 0.001; **P < 0.01; *P<0.0  Table 1. Demographic information at the time of initial sample collection, presence of single or multiple EM along with notation on whether or not a paired uninvolved skin sample was also collected. Cohorts 1 and 2 were recruited between the months of May and October in two different years. Subject #2 had a history of tick bite within the previous month. Subject #9 was the only subject testing negative on the C6 Lyme EIA.