Research ArticleInfectious diseaseVirology Open Access | 10.1172/jci.insight.176286
1Fred Hutchinson Cancer Center, Vaccine and Infectious Diseases Division, Seattle, Washington, USA.
2University of Washington, Department of Medicine, Seattle, Washington, USA.
Address correspondence to: Joshua T. Schiffer, Fred Hutchinson Cancer Center, 1100 Fairview Ave. N, Mail Stop E4-100, Seattle, Washington 98109, USA. Phone: 206.667.7359; Email: jschiffe@fredhutch.org.
Find articles by Owens, K. in: JCI | PubMed | Google Scholar
1Fred Hutchinson Cancer Center, Vaccine and Infectious Diseases Division, Seattle, Washington, USA.
2University of Washington, Department of Medicine, Seattle, Washington, USA.
Address correspondence to: Joshua T. Schiffer, Fred Hutchinson Cancer Center, 1100 Fairview Ave. N, Mail Stop E4-100, Seattle, Washington 98109, USA. Phone: 206.667.7359; Email: jschiffe@fredhutch.org.
Find articles by Esmaeili, S. in: JCI | PubMed | Google Scholar
1Fred Hutchinson Cancer Center, Vaccine and Infectious Diseases Division, Seattle, Washington, USA.
2University of Washington, Department of Medicine, Seattle, Washington, USA.
Address correspondence to: Joshua T. Schiffer, Fred Hutchinson Cancer Center, 1100 Fairview Ave. N, Mail Stop E4-100, Seattle, Washington 98109, USA. Phone: 206.667.7359; Email: jschiffe@fredhutch.org.
Find articles by Schiffer, J. in: JCI | PubMed | Google Scholar
Published April 4, 2024 - More info
The viral kinetics of documented SARS-CoV-2 infections exhibit a high degree of interindividual variability. We identified 6 distinct viral shedding patterns, which differed according to peak viral load, duration, expansion rate, and clearance rate, by clustering data from 768 infections in the National Basketball Association cohort. Omicron variant infections in previously vaccinated individuals generally led to lower cumulative shedding levels of SARS-CoV-2 than other scenarios. We then developed a mechanistic mathematical model that recapitulated 1,510 observed viral trajectories, including viral rebound and cases of reinfection. Lower peak viral loads were explained by a more rapid and sustained transition of susceptible cells to a refractory state during infection as well as by an earlier and more potent late, cytolytic immune response. Our results suggest that viral elimination occurs more rapidly during Omicron infection, following vaccination, and following reinfection due to enhanced innate and acquired immune responses. Because viral load has been linked with COVID-19 severity and transmission risk, our model provides a framework for understanding the wide range of observed SARS-CoV-2 infection outcomes.
COVID-19 public health emergency status has lapsed in the United States, but community levels of SARS-CoV-2 remain substantial (1). SARS-CoV-2 immunity in the population is now highly heterogeneous due to varying degrees of prior infection and vaccination (2). Also, successive circulating SARS-CoV-2 variants of concern (VOC) with different immune evasion and infectivity properties continue to emerge. This has resulted in a wider variability of viral shedding patterns than those observed during infection with the ancestral strain in the early months of 2020 (3, 4). Understanding the heterogeneous upper respiratory tract (URT) kinetics of SARS-CoV-2 enables informed design of health interventions such as testing, isolation, quarantine, and drug therapies.
Mathematical models are a vital tool for understanding mechanisms underlying observed patterns of viral expansion and clearance (5–10). To date, studies fitting SARS-CoV-2 dynamic models to viral load trajectories have estimated the timing of innate and acquired immune responses and predicted transmission parameters, including superspreader events (11–23). These models facilitated estimates of key quantities such as expected duration of the infectious period and the timing of peak viral load relative to symptom onset (21, 22, 24, 25). They also provided a theoretical means for testing treatment regimens and predicted that treatment within 5 days of symptom onset would likely be associated with higher efficacy (12, 23, 24, 26, 27), an outcome that has since been verified in multiple clinical trials (28–30). These models were also the first to suggest that viral rebound may occur in the context of early antiviral treatments (12).
However, early modeling studies only considered data from a small number of infected individuals (12, 20–27, 31–34) and often drew either entirely from previously uninfected and/or unvaccinated cohorts (14). Another consistent limitation was that most available data did not capture early time points during the presymptomatic phase of infection. Model results are, therefore, not easily generalized to current SARS-CoV-2 conditions.
The daily testing program of the National Basketball Association (NBA) occurred regardless of symptoms and identified 2,875 infections between June 2020 and January 2022, spanning the Alpha, Delta, and early Omicron VOC waves, as well as the roll-out of vaccines and boosters. Hay et al. used a statistical approach to quantify the effect of immune history and variant on SARS-CoV-2 viral kinetics and infection rebound in this data set (35). However, a more mechanistic modeling approach is required for an understanding of observed kinetic variability in this cohort.
Here, we identified 6 distinct shedding patterns in the NBA cohort data. We then compared how well candidate models that extend the classical target cell–limited model previously published by Goyal et al. (12, 23) and Ke at al. (21, 22) recapitulate the longitudinal URT viral load data from 1,510 sufficiently documented infections. After obtaining data-validated parameter estimates for each individual infection, we identified the factors underlying differing rates of viral expansion and clearance, peak viral loads, and duration of infection observed in the data. We used the model to identify differences between the timing and intensity of the immune response during initial and reinfections and identified a potential explanation for viral rebound observed in the cohort.
Viral shedding kinetics according to SARS-CoV-2 VOC. We first analyzed viral kinetics observed in the cohort according to VOC. For pre-VOC, Alpha, Delta, and Omicron variants, we observed variable kinetics among cohort participants. Median values differed between variants, with the Omicron variant having slightly lower peak viral loads and earlier clearance, while Delta had the highest peak viral loads and pre-VOC had the longest time to clearance (Figure 1, A–D). A high proportion of the infections caused by Omicron variants occurred in participants who had received either 2 or 3 vaccine doses, whereas pre-Delta infections mostly occurred in unvaccinated individuals (Figure 1E).
Viral kinetics by variant in the National Basketball Association cohort from June 2020 to January 2022. In total, 1,510 SARS-CoV-2 infections are documented from this cohort. (A–D) Time series are stratified by variant with individual viral loads plotted in color, the median viral load plotted with a solid black line, and the 25th and 75th percentiles plotted in dashed black lines for prevariant of concern viruses (A), Alpha (B), Delta (C), and Omicron infections (D). (E) Bubble plot showing the relationship between variant of infection and vaccination status of the individual. Both the color and the size of the circle indicate the number of infections in each category. (F) Additional information about infections includes age, presence of symptoms, reinfection status, and preinfection antibody titer following vaccination.
The age structure of the NBA cohort differs markedly from the general population. Of the cases documented, 46% occurred in individuals under the age of 30, 42% occurred in individuals between the ages of 30 and 50, and only 12% occurred in individuals over the age of 50 (Figure 1F). Symptom status was noted for 59% of infections, of which 71% were symptomatic (Figure 1F). The level of postvaccination, preinfection SARS-CoV-2 IgG was measured in 60% of infections. When stratifying patients into tertiles, Hay et al. (35) identified low antibody titers as less than 125 (AU/mL), midrange titers as greater than 125 AU but less than 250 AU, and high titers as greater than 250 AU, with the most infections occurring in the highest tertile (Figure 1F). In total, 17% of observed infections were reinfections of individuals followed longitudinally (Figure 1F).
Six distinct SARS-CoV-2 shedding patterns. We identified a subset of infections in the NBA cohort as “well documented” if they had at least 4 quantitative positive viral load measurements starting within 5 days of detection and if infection was documented for 3 weeks or viral elimination was confirmed with 2 sequential negative test results. This reduced the data set to 810 “well-documented” infections. To eliminate intraindividual variability from this data set, we retained 1 infection from individuals with multiple documented infections, further narrowing our focus to 768 infections. We then applied k-means clustering to the viral load data, clustering infections into 6 distinct viral shedding patterns (Figure 2, A–C) that differed according to time to viral elimination (Figure 2, C and D), area under the log10 viral load curve (log viral load AUC) (Figure 2, C and E), peak viral load (Figure 2, C and F), and time to peak (Figure 2, C and G).
Distinct viral dynamic profiles in the National Basketball Association cohort from June 2020 to January 2022. (A) Trajectories stratified by cluster assignment after k-means clustering with k = 6. Cluster centers are shown in black. (B) Heatmap of log viral load over time. Each row corresponds to an infection, and trajectories are ordered according to cluster. (C) Cluster centers plotted on the same axis demonstrate differing peak viral loads, time of viral peak, clearance rate, and time to clearance by cluster. (D) The proportion of infections cleared over time for each cluster with 95% CI shaded. (E–G) Box plots of the log10 viral load AUC (E), peak viral load for different dynamic groups (F), and days between detection and peak viral load (G). According to a Mann-Whitney U test with Bonferroni adjustment for multiple comparisons, distinctions in the mean for all possible pairs of groups are significant (P < 0.05) except for the pairs marked “ns.” (H–J) In the final row, stacked bar charts indicate the percentage of cases that fall into each dynamic group when cases are stratified by age group (H), symptom status (I), lineage of infecting variant (J), and vaccination status (K).
The first group had low peak viral loads and early median time to clearance (Figure 2, A–G). The second group had a slightly earlier and significantly higher peak than group 1 but a similarly short duration (Figure 2, A–G). The third group had a similar peak viral load compared with group 2 but with a longer time to peak viral load and later clearance (Figure 2, A–G). The fourth group had the fastest expansion rate, reaching a high, early peak viral load but maintaining similar median time to clearance as group 3 (Figure 2, A–G). The fifth group had the slowest expansion rate, taking the longest time to reach the second lowest peak viral load, and had the longest median time to clearance among the groups (Figure 2, A–G). In contrast with the prolonged low-level shedding of group 5, the sixth group had high, somewhat early peak and a long shedding duration (Figure 2, A–G).
The proportion of cases that fell into each dynamic group varied when we stratified by characteristics included in the data set. The dynamic groups with the highest AUC, groups 5 and 6, made up 39% of the infections in the 50-plus age group, whereas 21% of infections in the under-30 group were in the high AUC groups (Figure 2H). Among confirmed asymptomatic infections, 29% of cases fell into group 1, defined by low peak and early time to clearance, relative to only 14% of confirmed symptomatic cases; the slowly expanding group 3 cases were also less likely to be symptomatic, while high, early-peak group 4 cases were more often symptomatic (Figure 2I). High AUC shedding patterns were also more prevalent among infections with SARS-CoV-2 variants from earlier in the pandemic, making up 62% of pre-VOC infections, 27% of Delta infections, and only 8% of Omicron infections (Figure 2J). Among unvaccinated individuals, high AUC infection patterns were much more frequent — 63% of infections in unvaccinated individuals fell into groups 5 and 6, compared with 11% and 9% of infections for those whose most recent SARS-CoV-2 vaccine was their second dose or booster, respectively (Figure 2K).
Mathematical model fit to viral loads from 1,510 SARS-CoV-2 infections. To identify factors underlying the varied viral shedding patterns in the NBA cohort, we developed competing mechanistic mathematical models of viral and immune dynamics and selected the best model according to data-fitting criteria. The most complex model tested adapts previously published ordinary differential equation (ODE) models for within-host SARS-CoV-2 infections by combining elements introduced by Goyal et al. (12, 23) and Ke et al. (21, 22). For this model, we made mechanistic assumptions inherent to many preexisting viral dynamic models including a viral load–dependent infectivity, viral production by infected cells, a limited number of susceptible cells, and a preproduction eclipse phase for infected cells. The possible immune mechanisms included in the model were conversion of susceptible cells to an infection-refractory state dependent on the number of infected cells (presumably representing innate responses to infection), density-dependent death of infected cells as a proxy for an intensifying cytolytic innate response to a higher burden of infection, and a delayed cytolytic acquired immune response (Figure 3A).
Mechanistic mathematical model with fits to viral loads from each cluster. (A) Schematic of the ordinary differential equations model used to simulate SARS-CoV-2 infection with state variables indicated by capital letters, interactions indicated by arrows, and parameters indicated by symbols adjacent to arrows. The model contains an early and late cytolytic immune response. (B) Examples of data from individual infections and corresponding model simulations colored according to cluster identified via k-means clustering as in Figure 2, with group 1 in blue, group 2 in green, group 3 in yellow, group 4 in orange, group 5 in red, and group 6 in purple. The black examples were not included in cluster analysis. The model also captures instances of rebound or nonmonotonic clearance.
We used a nonlinear, mixed-effects framework to estimate model parameters for the 1,510 infections documented in 1,442 individuals in the NBA cohort that had at least 4 quantitative viral load measurements. We first used a representative subsample of these infections to compare model fits for the full model, illustrated in Figure 3A and written out in equation 1a–1f, and for reasonable simplifications, in which 1 or more immune mechanism was removed (Methods and Supplemental Table 1; supplemental material available online with this article; https://doi.org/10.1172/jci.insight.176286DS1). Under model selection criteria that balance simplicity with accuracy, the best model to explain the NBA data was the full model, except for the density-dependent death of infected cells. This model has been previously studied by Ke et al. (21). We then refit the best model to all infections. It is possible that models outside of the collection tested here could describe the data better; however, the fits that we achieve with this model were highly accurate for most members of the cohort from all 6 shedding clusters (Figure 3B and Supplemental Figure 6).
Differences in timing and intensity of immune response may explain heterogeneous shedding patterns. We next sought to explore possible virologic and immunologic explanations for different observed viral shedding patterns. For relevant model quantities, we calculated the mean within each dynamic group at each time point and a 95% CI, assuming normally distributed values. Mean viral loads projected by the model for each group (Figure 4A) resembled those from the actual data (Figure 2C). Quantitative kinetic features extracted from model simulation output including peak viral load (Supplemental Figure 1A), time to peak (Supplemental Figure 1B), viral AUC (Supplemental Figure 1C), and shedding kinetic group also agreed well with those extracted from the cohort data (Supplemental Figure 1, D and G). Projections for susceptible cells and infected cells suggest dynamics that track closely to viral load that differ accordingly among shedding subgroups (Figure 4, B and C).
Mechanistic differences between dynamic groups. (A–D) The mean viral load and cell populations in the mechanistic model for each group over time, with 95% CI shaded. The quantities are log viral load (A), number of susceptible cells (B), number of active infected cells (C), and the number of cells refractory to infection (D). Next, we plotted the mean ± SD of immune pressures over time for each dynamic group. (E) The infected cell clearance due to both constant cytolytic activity and delayed immune pressure. (F) The conversion of susceptible cells to a refractory state.
To delineate mechanistic drivers of shedding variability, we calculated the Pearson correlation coefficient between individual estimates for each model parameter and 4 viral kinetic quantities predicted by the mathematical model: log of peak viral load, time to peak viral load, shedding duration, and log viral load AUC (Supplemental Figure 2, A–D). Peak viral load correlated strongly with viral production rate, π, and had a strong inverse correlation with the rate of conversion of susceptible cells to a refractory state, ϕ (Supplemental Figure 2A). A linear model mapping log (π/ϕ) to log peak viral load predicted by the model explained a large amount of variability (Supplemental Figure 2E). The timing of peak viral load inversely correlated strongly with π, ϕ, and viral infectivity β (Supplemental Figure 2B). We fit an exponential model for time to peak viral load relative to infection as a function of log(βπ), which again explained a large amount of observed variability (Supplemental Figure 2F).
Shedding duration correlated most strongly with the time of onset of acquired immunity in the model, τ (Supplemental Figure 2C). Overall, the value of τ did not predict the time of clearance very well. This is because, for a large number of individuals particularly in groups 1–4, acquired immunity was established after the virus was already cleared (Supplemental Figure 2G). In groups 5 and 6, timing of acquired immunity onset was more predictive of shedding duration (R2 = 0.61) because acquired immunity was usually established before the virus was cleared. Numerous model parameters influenced viral AUC, though τ and ϕ were most important (Supplemental Figure 2D).
The viral shedding pattern for group 1 was notable for a low peak and early clearance of infection (Supplemental Figure 2H). These mild virologic outcomes occurred due to rapid generation of refractory cells. Early onset of acquired immune pressure was only occasionally necessary for viral clearance (Figure 4, D–F, and Supplemental Figure 2J). The higher viral peak in group 2 was driven by relatively higher values of viral production and viral infectivity and low conversion to a refractory state (Supplemental Figure 2, J–L). In group 2 infections, innate and acquired immunity both play a role in the clearance of infected cells (Figure 4E). Infections classified as group 3 were distinguished by a slower upslope, resulting from low average values of both viral production and infectivity (Supplemental Figure 2, K and L). Group 4 infections had a rapid, high peak viral load due to high viral production and infectivity, as well as relatively low average values for the conversion to refractory state. Higher values of viral production and viral infectivity, and low conversion to a refractory state, mean that these infections rapidly burn through susceptible cells and that target cell limitation slows the infection (Figure 4B and Supplemental Figure 2, J–L). Only when the viral load was already decreasing did acquired immunity typically initiate to help clear the infection (Figure 4E). Infections in group 5 had a late, low peak and a long duration. Similar to group 3, the late peak was due to low rates of viral production and low infectivity (Supplemental Figure 2, K and L). However, unlike group 3, these individuals also had low values of infected cell clearance, δ, and a very late onset of acquired immunity τ, allowing infection to persist (Supplemental Figure 2, N and O). Finally, group 6 consisted of long infections with a high peak viral load. These infections were distinguished by high viral production and infectivity (Supplemental Figure 2, K and L) and globally weak immune responses, including refractory cell conversion (Supplemental Figure 2J) and time-independent infected cell clearance rates (Supplemental Figure 2N). Thus, late-acting acquired immunity was often required to clear the infection (Figure 4E and Supplemental Figure 2, O and Q). Overall, these results suggest that a complex interplay of viral and immune features dictate how individual infections differ according to peak viral load, viral expansion rate, viral clearance rate, and duration of shedding.
We next examined correlations between estimated model parameters and found several significant patterns (Supplemental Figure 3). Viral infectivity had a strong positive correlation with viral production rate. The viral production rate also had a positive correlation with the intensity of time-independent cytolytic immune pressure. There was a strong negative correlation between the rate of reversion from refractory back to susceptible cells and time of onset of late cytolytic immune pressure. These results may suggest that viral fitness properties are related or that the durability of early innate responses is inversely correlated with the onset of delayed acquired immunity, but we cannot disentangle true biological correlations from potential identifiability challenges in the model structure.
More effective early immune responses and more rapid late responses may explain lower peak viral load and earlier clearance during reinfection with Omicron. The NBA cohort data set documented initial infection and reinfection in 67 individuals (Figure 5A and Supplemental Figure 4). Of the first infections, 52 were caused by a pre-Delta variant and 15 by Delta. For all individuals, the second infection was caused by an Omicron variant. The mean peak viral load documented in the URT for a reinfection was 0.5 log10 viral RNA copies/mL lower than the mean for first infections. Though there was a slight negative correlation between peak viral load during the first infection and that of the second infection (Figure 5B), the relationship was not statistically significant (r = – 0.18, P = 0.15). The median time to clearance for reinfections was 7.5 days after detection compared with 11 days after detection for first infections (Figure 5C). Using slightly different data from the NBA cohort, Kissler et al. found evidence that an individual’s relative clearance speed is roughly preserved across infections (36), prompting us to investigate whether this relationship, or others, appear in our model fits. We tested whether relative viral peak, time to peak, infection duration, or AUC were conserved by looking at the Pearson correlation and did not find any significant relationships (P < 0.05). We also checked whether estimated parameter values were conserved across sequential infections in the same individual and again found no significant correlations across infections in the same individual.
Mechanistic underpinning of more rapid clearance of SARS-CoV-2 during reinfection versus initial infection. Initial infection and reinfection were documented for 67 individuals in the NBA cohort. (A) Examples of data and model fits for infection and reinfection in the same individual. (B) As measured from the data, peak viral load of reinfection against peak viral load of first infection. In all cases, the variant causing the reinfection was Omicron, and the variant causing the first infection was either Delta or a pre-Delta variant. The mean peak viral load was around 0.5 log lower for second infection (t test statistic = 2.26, P = 0.0254). (C) Proportion of infections cleared for reinfection (blue) and first infections (gray) over time, as measured from the data. Median time to clearance is 7.5 versus 12 days since detection. (D) Box plots of estimated individual parameters for infection and reinfection that are significantly different between the 2 groups (P < 0.05 for Mann-Whitney U test with Bonferroni adjustment for multiple comparisons). During reinfection with Omicron, the rate that susceptible cells convert to a refractory state is higher and the onset of the late immune response occurs significantly earlier. (E–H) Mean viral load (E), number of refractory cells (F), number of susceptible cells (G), and late clearance rates over time (H) for the 2 groups as predicted by mechanistic model.
Two model parameter values were significantly different between first versus second infections (Figure 5D). Reinfections had higher values of ϕ indicating a faster conversion of susceptible cells to a refractory state. This more potent early immune response contributed to lower peak viral loads. The timing of the acquired immune response was also earlier during reinfection, suggesting more rapid activation of immune memory. We found that parameter values estimated during first infection were not predictive of parameter values estimated for a sequential infection in the same individual. Mean model projections recapitulated viral load patterns observed in the data (Figure 5E). We plotted cell populations from the model simulations for the 2 groups, and reinfection appeared to result in more refractory cells (Figure 5F) and a smaller decrease in susceptible cells (Figure 5G). The acquired immune response initiated sooner and at a higher magnitude during reinfection (Figure 5H).
Waning early immune response and strong initial clearance of infected cells as a cause of off-therapy viral rebound. Recent studies have shown that viral rebound during the natural course of untreated SARS-CoV-2 infection is relatively common, occurring in over 10% of cases by some estimates (37, 38), though rates vary according to definition. In their analysis of the NBA cohort, Hay et al. flagged 40 of 1,334 cases (3%) as rebound, defined by a nonmonotonic sequence of test results (35). As their most inclusive definition of rebound, they identified cases that achieved an initial clearance of at least 2 days with cycle threshold greater than or equal to 30, followed by at least 2 days with cycle threshold < 30.
We defined simulated infections as rebound if there were 2 or more peaks with height > 3 log10 RNA copies/mL and prominence > 0.5 log10 RNA copies/mL. We defined prominence as the height above the preceding local minima, as illustrated in Figure 6A. With these criteria, we identified 7.0% of the 1,510 cases as rebound. These cases are marked with an “R” and included first in Supplemental Figure 6. Note that we were unable to connect viral rebound to recrudescence of COVID-19 symptoms because we do not have daily reports of symptom status.
Model fitting to viral rebound in the NBA cohort. (A) We classified infections as examples of viral rebound if there were at least 2 peaks in the model simulation with height of 3 logs and prominence of 0.5 log. (B–F) Mean number of susceptible cells (B), number of active infected cells (C), number of target cells that are refractory (D), viral load (E), and rate of late clearance (F) as predicted by our mathematical model for rebound versus nonrebound cases in red and blue, respectively. The 95% CI is shaded. (G) Distribution of individual parameter estimates for the rebound versus nonrebound cases. Only those for which the mean differs significantly are displayed (P < 0.05 for Mann-Whitney U test with Bonferroni adjustment for multiple comparisons).
We observed several differences between rebound and nonrebound infection. In cases of rebound, susceptible cells were lost more rapidly initially but also replenished earlier from the refractory compartment (Figure 6B). On average, rebound trajectories had both an earlier peak in infected cells and an earlier, higher peak in refractory cells (Figure 6, C and D). The large number of refractory cells drives a more rapid replenishment of susceptible cells. The persistence of infected cells with reemergence of susceptible cells allowed for a second surge of viral production, which had been reduced by fast early clearance of infected cells (Figure 6E). The delayed onset of the late acquired immune response also allowed sufficient time for this to occur before the infection was ultimately cleared (Figure 6, E and F). Cases with rebound had higher viral production rates and higher viral infectivity, which combined to allow for growth of the viral population even with a reduced number of target cells. Rebound cases also had a higher clearance rate and a faster conversion of susceptible cells to a refractory state. Together, these forces preserve susceptible cells through the rapid initial clearance of infected cells and protection in a refractory state. Notably, rebound cases did not have a significantly higher reversion rate, to account for replenishment of susceptible cells after the first viral peak. Rather, the faster replenishment of susceptible cells occurs due to the high number of refractory cells. The viral rebound group also had a delayed onset of late immune killing, which allowed time for 2 peaks to occur before pressure from the acquired immune system cleared the infection (Figure 6G).
Viral kinetics are vital to understanding the pathogenesis of infection and, ultimately, to optimizing therapies. Here, we use a remarkable cohort from the NBA, which is unique both for its size and because it captures early presymptomatic time points during infection, to describe the increasing variability in viral load patterns observed in individuals infected with SARS-CoV-2. We observe that, with a general increase in population level immunity due to prior infection and vaccination, peak viral load is often lower and earlier with more rapid elimination of virus.
Our mathematical model identifies testable mechanistic hypotheses for these observed differences. We first predict that low peak viral loads are associated with lower viral production within infected cells and lower viral infectivity. Moreover, for viral loads that also peak early (observed in group 1), the model predicts a rapid conversion of susceptible cells to a refractory state. Both effects are compatible with data observed in animal models and in vitro models describing effects of IFN that limit the extent of viral replication and protect uninfected cells from viral entry (39–43). Appropriate followup experiments to validate this prediction would include local sampling of nasal cytokines and other mediators of local immunity during critical early time points of infection, as has been done in humans for other respiratory viral infections (44).
Different viral shedding patterns are also driven by varying balances between the magnitude of the early cytolytic immune response, which wanes as the number of infected cells and viral load declines following peak, and the late sustained immune response. In our model, we assumed that this response does not dissipate with decrease in virus, so we hypothesize that most of the late response is acquired and due to either expanding T cell or antibody levels. Prior work suggests that, during primary infection, plasma SARS-CoV-2 IgG levels rise too late to explain reduction in viral load (45). However, ref. 45 was performed in an immunologically naive cohort and needs to be reassessed in the current infection environment (46, 47). T cell–mediated killing of infected cells may also assist in elimination of infected cells during infection (46, 48). The results from our study suggest that, at the time of the NBA cohort, substantial differences in timing and intensity of acquired immune responses were still present and contributed to variability in viral kinetics.
Our results suggest that, upon reinfection, the early/innate response and the late acquired response are both more effective. The mechanisms underlying this observation are unclear. One possibility is that a higher density of tissue-resident NK cells, B cells, and T cells may exist after first infection and vaccination. In other viral infections, it has been observed that an increase in preinfection tissue-resident T cells predicts earlier initiation of a local innate and acquired response due to early antigen recognition (49, 50). These model predictions merit experimental follow-up.
Unfortunately, we are not able to link the heterogenous virologic patterns observed in the NBA cohort with severity of symptoms or future development of postacute sequelae of SARS-CoV-2 infection, as these data were not available. For multiple other viruses, viral loads have been identified as relevant correlates of disease (51–54), and late SARS-CoV-2 viral loads have been linked with severity of infection among hospitalized people (55, 56). During clinical trials, reductions in nasal viral load due to monoclonal antibodies, nirmatrelvir/ritonavir, and molnupiravir correlated with large reductions in the incidence of hospitalization and death (28, 29). However, early remdesivir, which had a large clinical benefit, was associated with no viral reduction in nasal passage several days after treatment (30), highlighting that key viral load surrogates may be in the lung rather than nasal passages or that early viral loads are more predictive of outcome (23). Because early and peak viral load measurements are so rarely obtained during COVID-19 infection, the clinical importance of these values remain unknown.
Several further limitations of this work are important to highlight. An issue that is universal to the field is that our model does not capture anatomic compartmentalization of viral shedding. Our previous model demonstrated in nonhuman primates that SARS-CoV-2 kinetics in the lung differ in subtle but important ways from those in the upper airways and that these differences are particularly important in the context of antiviral therapy (23). It is likely that our subgroups of shedding may cluster differently if we had access to serial whole lung viral loads. The reseeding of infection in the nose from the lungs or vice versa may also provide alternative explanations for the dynamics observed in this data set, particularly viral rebound. Unfortunately, such detailed studies are not available in any human cohort. Studies using saliva do suggest slightly different kinetics than those from nasal swabs (57), but it is doubtful that saliva captures total viral load in the lung.
Another issue shared by all mathematical models in the field is the lack of sufficiently granular, tissue-based immune data to precisely model the innate and acquired immune responses. Rather, our model uses several terms to capture the timing and intensity of what is likely to be a complex, multicomponent response. As with multiple other respiratory virus models and based on experimental data showing that IFN-α protects cells from infection, we assume that infection temporarily makes susceptible cells refractory to viral entry (21–23, 40, 43). Finally, we assume there is a late, sustained immune response that varies by intensity and timing, compatible with an acquired memory response.
A final limitation shared by all intrahost SARS-CoV-2 models in humans is that we are not able to measure potentially important initial conditions of infection, including viral inoculum and the number of immune cells within a relevant spatial microenvironment of infection. Thus, the model may overascribe observed differences in observed viral load trajectories to differences in immune responses rather than exposure viral load.
In summary, we identify distinct shedding patterns in adults with SARS-CoV-2 infection, with shorter, lower viral load infection more commonly observed in people with Omicron infection, prior vaccination, and recent prior infection. The mechanistic predictors of rapidly contained infection are more rapid conversion of susceptible cells to a refractory state, along with more rapid and intense late cytolytic immune responses.
Sex as a biological variable. Our study examined data from the NBA cohort that did not include the biological sex of infected individuals. Based on the demographics of the NBA and supporting staff, it is reasonable to assume that infections in males are markedly overrepresented in the data. Some studies have found that sex influences viral loads and shedding dynamics (58–60). However, others have found no influence on shedding dynamics (61, 62).
Study overview. We analyzed SARS-CoV-2 viral load data collected during untreated infections in the NBA cohort. We clustered these data into 6 dynamic groups, which were statistically different in terms of peak viral load, time to peak viral load, viral load AUC, and time to clearance. Drawing on previous models in the field, we developed a set of candidate ODE mathematical models. We then used model selection theory to determine which version the data support most strongly. With a validated model of SARS-CoV-2 infection, we examined which parameter values differ to explain the variable viral shedding patterns observed in the 6 dynamic groups. We also used this approach to explain the differing dynamics of first and second infections captured in the NBA cohort and to explain the mechanisms underlying viral rebound.
Data preprocessing. We used data from the NBA cohort previously published by Hay et al. (35). The group documented 2,875 individual SARS-CoV-2 infections in 2,678 people through frequent quantitative PCR (qPCR) testing. First, we filtered these data to include only infections with at least 4 positive quantitative samples to provide adequate viral dynamics data for model fitting. This yielded 1,510 infections in 1,442 individuals, of which 177 were caused by a pre-VOC variant, 46 by Alpha, 163 by Delta, and 1,124 by Omicron (Figure 1A). We further identified a “well-documented” subset of these infections by filtering for infections that had their first quantitative test within 5 days of detection and included test results through 20 days after detection or confirmed elimination of virus prior to day 20 (2 consecutive negative tests). This “well-documented” group consisted of 810 individual infections in 768 people. For clustering, we randomly chose 1 infection to retain from each individual with multiple documented infections, resulting in a group of 768 infections in 768 individuals. We also filtered the “well-documented” group for infections with a negative test result within 2 days prior to detection, yielding 266 cases with both early detection and 3 weeks of documentation. We refer to this subset as “fully documented.”
Quantitative features of viral dynamics. To convert cycle threshold (Ct) values to viral genome equivalents, we averaged Ct1 and Ct2 for each individual and applied equation S2 from Kissler et al. (63). Hence,
where the concentration of viral RNA is in copies/mL.
We calculated the peak viral load for a given infection as the maximum measured log10 viral load over all quantitative data points, and the time to peak viral load was the day of this measurement. We calculated the log viral load AUC from the date of detection through the last quantitative measure of viral load, linearly imputing missing values between data points. Note that this quantity is an underestimate for individuals without confirmed clearance. We calculated the median time to clearance by identifying when the cumulative incidence curve for clearance of the virus crossed 50%. The cumulative incidence curve is the inverse of the Kaplan-Meier curve for survival of the virus. The Kaplan-Meier curve (KM) and CI was computed using the Python package scikit-survival 0.21.0 (64). The cumulative incidence curve is, then, 1 – KM.
Data clustering. We clustered “well-documented” infections into 6 dynamic groups using k-means clustering as implemented in the Python package scikit-learn 1.2.2 (65). As input features, we used these 21 daily test results. These came from the day infection was detected through 20 days after detection. If any daily measurements were missing between recorded test values, we imputed the missing measurements linearly. If the last test date for an individual was prior to day 20, meaning there were missing daily measurements after the last test, we appended negative test values to reach 20 days (Supplemental Figure 5A). This occurred only for infections for which clearance was confirmed with 2 consecutive negative tests, since we clustered “well-documented” infections.
To select these hyperparameters for the k-means clustering, we tested values of k from 2 to 20 for 3 possible interpolation methods, linear, quadratic, or cubic spline, and 2 possible surveillance periods, 13 or 20 days after detection (2 or 3 weeks surveillance). Comparing these scenarios, linear interpolation up to 20 days after detection had the lowest within cluster sum of squares (Supplemental Figure 5B). Based on the location of the “elbow” in the plots, we chose to proceed with k = 6 clusters. Using k < 6 resulted in less distinctive behaviors between the groups, while using more clusters resulted in some noninterpretable cluster centers (Supplemental Figure 5, C–G).
Mathematical model of SARS-CoV-2 dynamics. We considered several ODEs models for SARS-CoV-2 infection dynamics. The full model tracks the number of target cells that are susceptible to infection (S), target cells that are refractory to infection (R), infected cells in an eclipse phase (IE), infected cells actively producing virus (IP), and SARS-CoV-2 virions (V). Susceptible cells are infected at rate βSV and become refractory at rate ϕIpS. Refractory cells revert to a susceptible state at rate ρR. When cells are first infected, they enter an eclipse phase, from which they transition to a state of producing virus at rate k. Productively infected cells are cleared at rate where the dependence on infected cells reflects an innate immune response with no memory. When the duration of infection surpasses time τ, the clearance rate of infected cells increases by mIP, capturing the delayed onset of a cytolytic acquired immune response with memory. Productively infected cells produce virus at rate π, and free virions are cleared at rate γV. Under these assumptions, the model has the form:
To ensure that the model did not predict spurious oscillations in viral dynamics, we enforced that viral production was zero when IP was less than 1. In the optimal model, the parameter h = 0 for all individuals, so the early per-cell clearance rate of infected cells is not density dependent.
As initial conditions, we set (S0, R0, IE0, IP0, V0) = (1 × 107, 0, 0, 0, V0). Previous models of SARS-CoV-2 infection in the nasal compartment have used an initial value of 1 × 107 to 1 × 108 susceptible cells, based on estimates that 2%–20% of epithelial cells in the URT display the angiotensin-converting enzyme 2 (ACE2) receptor (66, 67). We assumed that the initial number of refractory cells is zero because the early immune response is inactive prior to infection. We initiated simulations with zero infected cells, so IE0 = IP0 = 0, and with a small viral inoculum to reflect the tight bottleneck that transmission places on viral replication. The number of virions present at the outset of infection was assumed to be below the limit of detection, but the precise inoculum was initially allowed to vary for individuals. During model fitting, we estimated the onset of infection relative to detection (date of first positive test), noting this difference as t0. Among individuals in the NBA cohort for whom symptom onset was known, the mean time of symptom onset was the date of detection, so t0 is correlated with the incubation period of SARS-CoV-2. With this in mind, we restricted estimates of t0 to fall between 0 and 20 days based on a 2022 review by Wu et al., which reported that, across 142 studies of SARS-CoV-2 infection, the incubation period ranged from 1.80 to 18.87 days (68).
To maintain identifiability, we fixed 2 parameter values, setting the rate of viral production onset to be k = 4 in accordance with Ke at al. (22) and the rate of clearance of free virions to be γ = 15 in accordance with Goyal et al. (12)
Model fitting and selection. We fit the model in equation 1a–1f, as well as simpler versions that eliminate one or more immune components and/or the eclipse phase, to data from the NBA cohort using a nonlinear mixed effect approach (69). With this approach, a viral load measurement from an individual i at a time point k is modeled as
where fV represents the solution of the ODE model for the state variable describing the virus, θi is the parameter vector for individual i, and is the measurement error for the log10-transformed viral load data. Furthermore, in the population model, each individual’s parameters can be written as the sum of the average population value θpop and a random effect encompassing their deviation from the average, ηi; the parameters for individual i are given by θi = θpop + ηi. We fixed σ = 0.5 log10 viral RNA copies/mL when comparing model fits, so that any differences in likelihood of the full model occur due to a change in agreement between model simulations and data rather than a drastic increase in the estimated magnitude of the measurement error.
For model selection, we first worked with the 266 fully documented infections (early detection and at least 3 weeks of follow-up or confirmed clearance). In addition to the raw data, for individuals without confirmed elimination, we imputed 5 “assumed negative” test results at 2-day intervals starting at 40 days after detection. Out of the 1,510 infections considered in model fitting, 629 had regular measurements past 40 days and 99.5% of tests collected past day 40 were negative. For viral load observations below the lower limit of quantification or marked as “assumed negative,” we used the probabilistic model that Monolix software provides for left-censored data (70).
The candidate models that we considered are listed in Supplemental Table 1. For each candidate model, we used the Stochastic Approximation of the Expectation Maximization (SAEM) algorithm (71) embedded in the Monolix software (Monolix 2023R1, Lixoft SAS, a Simulations Plus company) to obtain the maximum likelihood estimation (MLE) of the vector of fixed effects, θpop, and the MLE of the vector of SDs of the random effects, σθ, for the model parameters β, π, ϕ, ρ, δ, h, τ, m; the delay between infection and date of detection, t0; and the initial viral inoculum V0. We assumed a log-normal distribution for parameter values and a logit-normal distribution for initial conditions. The delay between infection and detection, t0, was assumed to fall between 0 and 20 days. The viral inoculum was assumed to fall between 0 and 250 log10 viral RNA copies/mL.
We ran the SAEM algorithm 6 times for each model using randomly selected initial values for the estimated parameters. Using the parameter set with the highest likelihood, we computed the Akaike Information Criterion (AIC) for each model. Recall that AIC = −2 max(log L) + 2m, where L is the likelihood that the data were generated by this model with these parameter values and m is the number of model parameters. Smaller AIC scores indicate that a model is statistically more likely to explain the data. The model with the smallest AIC score in the initial model selection phase included an eclipse phase, a refractory cell compartment, and time-dependent clearance of infected cells but not density-dependent clearance. All AIC scores are recorded in Supplemental Table 1.
The best fit run for the optimal model estimated very little variation in the viral inoculum between individuals. The population average was V0_pop = 97.3, while the SD of the distribution of random effects was only ωV0 = 0.05, suggesting that fixing this parameter at the same value for all individuals may still allow for reasonable fits. We fixed V0 near the estimated population mean, V0 = 97 for all individuals and re-ran the SAEM algorithm. This yielded very similar fits to the best fit from Supplemental Table 1, with a slightly lower AIC score of 13,731 compared with 13,738. While we expect that the actual viral exposure initiating individual infections in the NBA cohort varied, this suggests that estimating V0, π, and t0 simulataneously for each individual does not lend additional flexibility. For further model fitting, we kept V0 fixed at 97.
To test whether variability in viral dynamics can be attributed to differences in prior exposure, age, or infecting lineage, we performed 1-way ANOVA for the random effects of each of the estimated model parameters against these covariates (implemented in monolix). In this case, the null-hypothesis is that the mean of the random effects (calculated from the individual parameters sampled from the conditional distribution) is the same for each category of the categorical covariate. Ranking all possible covariates by their P value, the most likely covariate was between the onset of acquired immunity and vaccination status. We tried adding this as a covariate to the model, which allows for a perturbation of the population mean τpop, by some value βτ _ j for each possible vaccination status j. Including this covariate improved the model fit according to AIC score, improving from 13,731 to 13,627. We next checked whether this was a meaningful addition to the model with the Wald test, which tests the null hypothesis that βτ _ j = 0 for each possible vaccination status j. Infections in unvaccinated individuals were significantly different from infections in individuals who had been boosted (βτ _0 doses = 0.73, P < 2.2 × 10–16) and the group that had no record (βτ _ no record = 0.44, P = 6.77 × 10–7). However, individuals who had received 1, 2, or 3+ vaccinations were not significantly different from each other (βτ _ 1 dose = –0.36, P = 3.6 × 10–1; βτ _ 2 dose = –0.076, P = 3.18 × 10–1). This prompted us to regroup vaccination status into a new categorical covariate, indicating unvaccinated, at least 1 dose, or no record. With this model, the onset of acquired immunity differed significantly for infections in unvaccinated individuals versus those who received at least 1 dose of the vaccine (βτ _ >1 dose = –0.8, P < 2.2 × 10–16), but the difference between infections in unvaccinated individuals and those with no record was not significant (βτ _ no record = –0.2, P = 2.03 × 10–2). Then we regrouped vaccination status into just 2 categories, the first being individuals who are unvaccinated or have no record and the second being individuals with a record of 1 or more SARS-CoV-2 vaccinations. We repeated this process of choosing 1 new covariate to add according to the lowest significant P value resulting from the 1-way ANOVA, testing its utility using the AIC and Wald test, and coarsening the categorization if indicated, until no further significant P values resulted from the 1-way ANOVA. This resulted in 3 covariates, unvaccinated/no record versus at least 1 recorded vaccination modified the onset of acquired immunity τ, and the infecting lineage being pre-VOC/Delta versus Omicron versus unknown modified both the onset of acquired immunity, τ, and the rate at which susceptible cells enter a refractory state, ϕ. Including these covariates reduced the AIC score by 149 to 13,589. The models tested along the way are reported in Supplemental Table 2.
For the best-fitting model, there were significant correlations between the random effects of model parameters β, π, and δ, as well as ρ and τ. We started with the best model from Supplemental Table 2 and allowed for linear correlations between these parameters in the final model. This further improved the AIC score by 111 points (Supplemental Table 3). The correlation structure of the final set of parameter estimates is shown in Supplemental Figure 3, and the correlations between the random effects can be found at github (https://github.com/lacyk3/SARS-CoV-2Kinetics; commit ID, 7bd6567).
Once the final model was selected, we further restricted the SD of the measurement error to σ = 0.4 log10 viral RNA copies/mL to capture examples of viral rebound in the data and ran the SAEM algorithm in Monolix to estimate parameters for all 1,510 infections. Population parameter values are included in Supplemental Table 4, and individual model fits are shown in Supplemental Figure 6. Estimated individual parameter values are accessible at https://github.com/lacyk3/SARS-CoV-2Kinetics
Statistics. When comparing quantitative features and parameter values across different groups, we used a 2-sided Mann-Whitney U test. To determine whether covariates such as age, vaccination status, or infecting variant should be considered when estimating parameters for the population model, we used 1-way ANOVA testing the null hypothesis that the mean of the random effects (calculated from the individual parameters sampled from the conditional distribution) is the same for each category of the categorical covariate. To confirm that any added covariates contributed to the model fit, we applied the Wald test to the deviation from the reference population mean estimated for each category of the covariate, β_j. The null hypothesis of the Wald test is that the value of theβ_j parameter estimated by SAEM is equal to zero; hence, the group is equal to the reference group. When comparing characteristic viral dynamic quantities across sequential infections, we used a two-tailed t test. When assessing significance of the results, we adjusted P values using the Bonferroni correction for the number of comparisons before comparing against a significance threshold of P > 0.05.
Study approval. All data analyzed in this work were deidentified prior to our use. It was previously published by Hay et al. (35) and made available on github at https://github.com/gradlab/SC2-kinetics-immune-history
Data availability. Values for all data points in graphs are reported in the Supporting Data Values file. The code for generating all analysis and figures included in this manuscript is available at https://github.com/lacyk3/SARS-CoV-2Kinetics
KO and JTS conceptualized the study and developed mathematical models. KO performed experiments and statistical analysis. KO, SE, and JTS interpreted results and wrote the manuscript.
We thank Yonatan Grad, Stephen Kissler, and the members of the NBA cohort for making these data available. We also thank Dan Reeves and Liz Duke for their helpful advice regarding figures. Research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases of the NIH grants R01AI169427 and R01AI121129 and the award T32AI118690. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Address correspondence to: Joshua T. Schiffer, Fred Hutchinson Cancer Center, 1100 Fairview Ave. N, Mail Stop E4-100, Seattle, Washington 98109, USA. Phone: 206.667.7359; Email: jschiffe@fredhutch.org.
Conflict of interest: The authors have declared that no conflict of interest exists.
Copyright: © 2024, Owens et al. This is an open access article published under the terms of the Creative Commons Attribution 4.0 International License.
Reference information: JCI Insight. 2024;9(9):e176286.https://doi.org/10.1172/jci.insight.176286.