Background
Infections caused by the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) emerged in December of 2019 and rapidly evolved into the global coronavirus disease 2019 (COVID-19) pandemic, which to date has led to over 600 million confirmed cases of COVID-19, including over 6.4 million deaths (covid19.who.int). Although most SARS-CoV-2 infections are mild, the infections can develop into life-threatening conditions associated with acute respiratory distress syndrome (ARDS).
COVID-19 was reported early on as a cytokine storm-mediated disease, and an aberrant host response has been implicated in severe cases [1]. This pathobiology resembles sepsis, which the most recent clinical criteria (Sepsis-3) define as life-threatening organ dysfunction caused by a dysregulated host response to infection [2]. The underlying responses in sepsis are complex and heterogeneous, involving both pro- and anti-inflammatory immune responses that may manifest as a state of hyperinflammation or immunosuppression [3, 4]. The sepsis-associated dysregulated host response is initiated by pathogen-associated molecular patterns and by damage-associated molecular patterns released by damaged host cells, resulting in direct activation of immune and endothelial cells [5]. This activation leads to the release of inflammatory mediators that affect not only the immune system but also the central nervous system, cardiovascular system, and vascular endothelium, resulting in acute respiratory distress syndrome, acute kidney injury, multiorgan failure, and septic shock [6].
Early reports of COVID-19 showed elevated levels of pro-inflammatory cytokines, e.g., IL1b, TNF, and IL6, especially among severe COVID-19 cases, implicating a cytokine-storm process [7]. In addition, comprehensive proteomics analyses of plasma samples collected from COVID-19 patients confirmed the elevation of these markers as well as many others, including factors of the immune, complement, and coagulation systems, which could be linked to severity and specific COVID-19 outcomes [8–13]. In all studies, an aberrant inflammatory response was evident, but the specific markers implicated as predictive classifiers for severity differed to a large extent. These differences likely reflect the heterogeneous nature of the COVID-19 patients with respect to comorbidities and severity of infection, as well as technical aspects such as the analytic assays used and time of sample collection.
Considering the apparent similarities between COVID-19 and sepsis regarding the role of a host-mediated pathophysiology, we set out to compare systemic host responses during the acute stages of COVID-19 and sepsis, to capture potential disease-, pathogen-, and organ-specific proteomic profiles. Using targeted proteomics (covering 276 proteins) on samples from well-defined patient cohorts, we compared plasma proteome signatures in COVID-19 and sepsis clinical endotypes. The COVID-19 cohort included patients enrolled during the first wave of the pandemic, in spring 2020, within the open resource Karolinska KI/K COVID-19 Immune Atlas effort [14]. This resource provided insight into T cell, B cell, natural killer cell, mucosal-associated invariant T cell, innate lymphoid cell, mononuclear phagocyte, and granulocyte immunotypes relevant for COVID-19 protective immunity and immunopathogenesis [15–21]. As comparator sepsis cohorts, we included patients with (i) community-acquired pneumonia (CAP) caused by influenza, (ii) CAP caused by bacterial species, (iii) non-pneumonia sepsis (NP sepsis), and (iv) septic shock. The results revealed a shared core host response to infection, as well as unique proteomic signatures related to specific microbiologic etiology and clinical endotypes. Although COVID-19 and sepsis shared a set of core proteins that were dysregulated during infection, the changes in most of these inflammatory proteins were more pronounced in sepsis than in COVID-19. The comprehensive immune atlas resource from the same patients also allowed for correlation analyses of biomarkers with specific immune cell subpopulations implicated in COVID-19 disease severity. In addition, we applied machine learning (ML) to identify potential biomarkers that could accurately discriminate COVID-19 from CAP-sepsis patients.
Methods
Patient cohorts
Plasma samples were collected for this study from SARS-CoV-2-infected patients with COVID-19 admitted to the intensive care or high-dependency unit (n = 17, severe COVID-19) or the infectious disease clinic (n = 10, moderate COVID-19) at Karolinska University Hospital, Stockholm. Paired convalescent plasma samples from the COVID-19 groups were collected from 17 patients (8 and 9 from the moderate and severe COVID-19 groups, respectively) approximately 4 months after hospital discharge (median = 136 days, range = 89–153 days). Plasma samples from sepsis patients identified and enrolled in the emergency department were also included. As controls, plasma samples from age- and sex-matched SARS-CoV-2 IgG seronegative healthy volunteers (n = 16) were collected on the same days as those of the acute COVID-19 patients. Inclusion and exclusion criteria for patient enrollment and the clinical information collected are provided below.
The COVID-19 patients were adult SARS-CoV-2 RNA-positive patients who were admitted with acute illness to the Karolinska University Hospital, Stockholm, Sweden in April and May 2020, and were treated at the Infectious Diseases Clinic or the Intensive Care Unit (ICU). Patients with oxygen saturation of 90–94% and/or receiving 0.5–3 L/min of oxygen admitted to the Infectious Diseases Clinic were included to represent moderately ill COVID-19 cases. Patients treated at the ICU or high-dependency unit were included as severely ill COVID-19 cases. Exclusion criteria were age ≥ 80 years, current malignancy, or immunomodulatory treatment prior to hospitalization, to minimize immunosuppression from other causes. Corticosteroid therapy had been given in hospital prior to sampling to 2 moderate and 12 severe COVID-19 patients. Plasma samples from the acute phase were collected on the day of study enrollment (i.e., 5–24 days after onset of illness and 1–8 days after hospital admission).
CAP, sepsis, and septic shock patients were identified, enrolled, and sampled for plasma within 2 h after arrival at the Emergency Department of Karolinska University Hospital Huddinge in 2017–2019, as they triggered the department's sepsis alert [22]. The sepsis alert was triggered in patients with clinical signs of infection combined with either (i) at least one of the following: oxygen saturation < 90% despite oxygen supplementation, respiratory rate > 30 per minute, heart rate > 130 per minute, systolic blood pressure < 90 mmHg, or Glasgow Coma Scale < 8; or (ii) blood lactate > 3.2 mmol/L combined with at least one of the following: oxygen saturation < 95% on room air, respiratory rate > 25 per minute, heart rate > 110 per minute, altered mental status, or temperature > 38.5 °C or < 35 °C.
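The two-branch alert rule above can be written as a small predicate. The following is a minimal sketch, not the department's actual implementation: the `TriageVitals` container and all field names are hypothetical, and treating a missing measurement as "criterion not met" is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriageVitals:
    """Hypothetical container for triage measurements (field names are illustrative)."""
    signs_of_infection: bool
    spo2_with_oxygen: Optional[float] = None  # % saturation despite oxygen supplementation
    spo2_room_air: Optional[float] = None     # % saturation on room air
    respiratory_rate: Optional[float] = None  # breaths per minute
    heart_rate: Optional[float] = None        # beats per minute
    systolic_bp: Optional[float] = None       # mmHg
    gcs: Optional[int] = None                 # Glasgow Coma Scale
    lactate: Optional[float] = None           # mmol/L
    altered_mental_status: bool = False
    temperature: Optional[float] = None       # degrees Celsius


def _below(value, limit) -> bool:
    # Assumption: a missing measurement does not satisfy a criterion.
    return value is not None and value < limit


def _above(value, limit) -> bool:
    return value is not None and value > limit


def sepsis_alert_triggered(v: TriageVitals) -> bool:
    """Apply the thresholds quoted in the text for branches (i) and (ii)."""
    if not v.signs_of_infection:
        return False
    # Branch (i): any single severely deranged vital sign.
    branch_i = (
        _below(v.spo2_with_oxygen, 90)
        or _above(v.respiratory_rate, 30)
        or _above(v.heart_rate, 130)
        or _below(v.systolic_bp, 90)
        or _below(v.gcs, 8)
    )
    # Branch (ii): elevated lactate plus at least one milder derangement.
    branch_ii = _above(v.lactate, 3.2) and (
        _below(v.spo2_room_air, 95)
        or _above(v.respiratory_rate, 25)
        or _above(v.heart_rate, 110)
        or v.altered_mental_status
        or _above(v.temperature, 38.5)
        or _below(v.temperature, 35)
    )
    return branch_i or branch_ii
```

For example, a patient with signs of infection, lactate of 4.0 mmol/L, and a temperature of 39.0 °C satisfies branch (ii), whereas elevated lactate alone does not trigger the alert.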
Patients with pulmonary infiltrates and Influenza virus RNA detected in respiratory tract samples, without any bacterial microorganism detected were selected as Influenza CAP patients (n = 11). Patients with pulmonary infiltrates without any virus detected, but with Streptococcus pneumoniae (n = 11), Haemophilus influenzae (n = 4), or Staphylococcus aureus (n = 2) detected in blood culture or lower respiratory tract culture, or S. pneumoniae detected in nasopharyngeal culture (n = 1), or Mycoplasma pneumoniae (n = 3) detected by specific PCR on respiratory secretions were selected as bacterial CAP patients.
Patients with sepsis with a sequential organ failure assessment (SOFA) score of ≥ 2 within 12 h from arrival at the hospital and an infectious focus not including the lungs were selected as non-pneumonia sepsis patients. In these patients, blood culture was positive for Escherichia coli in 10 patients, Staphylococcus aureus in 3 patients, Group A Streptococcus in 1 patient, Group B Streptococcus in 3 patients, and Group C Streptococcus in 1 patient.
Finally, patients with infection, total SOFA score of ≥ 2, lactate > 2, and who received vasopressors were selected as septic shock patients (n = 12). These patients had the following foci of infection: lungs (n = 3), urinary tract infection (n = 5), skin-joint infection (n = 2), abdomen (n = 1), and other (n = 1). Blood culture was positive for E. coli in 3 patients (one had multi-bacterial growth), Streptococcus pyogenes in 2 patients, and other gram-negative bacteria in 2 patients.
In the SOFA score calculation, registered values of creatinine, bilirubin, and platelet count from the period 7–90 days prior to admission were used as baseline values. In the total SOFA score at sampling, baseline SOFA points for pathologic values of creatinine, bilirubin, or platelet count were subtracted.
Plasma sampling
Plasma was collected in whole-blood tubes containing EDTA (ethylenediaminetetraacetic acid). The COVID-19 patients and healthy controls were sampled using ordinary EDTA tubes that were centrifuged within 2 h and plasma was aspirated and frozen in aliquots at − 80 °C. The sepsis patients were sampled using PPT plasma preparation tubes containing EDTA, that were centrifuged within 2 h and then frozen at − 80 °C. The tubes were thawed once for aspiration and refreezing into aliquots at − 80 °C.
Quantification of soluble factors
Proximity extension assays (PEA) (Olink AB, Uppsala, Sweden) were used for the quantification of selected soluble factors in plasma from all cohorts. All samples were measured using three biomarker panels: organ damage (v.3311), immune response (v.3203), and inflammation (v.3022), each targeting 92 protein analytes. Samples that had a quality control (QC) warning in all three panels were excluded (1 sepsis and 1 COVID-19 patient), and protein measurements with QC warnings were assigned as missing values. Four proteins were included in two of the panels and measured twice: IL6, CCL11, IL10, and IL5; we assigned the mean of the duplicate measurements as the NPX value for the corresponding proteins. For further analyses, only analytes with less than 33% of data under the lower limit of detection (left-censored) were considered, yielding a total of 193 proteins analyzed. Individual left-censored values from the analytes included in the study were imputed with the lower limit of detection value.
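The preprocessing steps described above (averaging duplicate measurements, dropping analytes with too many values below the limit of detection, and imputing the remaining left-censored values) can be sketched as follows. This is an illustrative sketch only: the function names and data layout are hypothetical, the fraction below the limit of detection is computed over non-missing values (an assumption), and a duplicate pair with one missing value falls back to the single available measurement (also an assumption).

```python
from typing import Dict, List, Optional

Npx = Optional[float]  # None marks a value set to missing after a QC warning


def mean_of_duplicates(first: List[Npx], second: List[Npx]) -> List[Npx]:
    """Average two panel measurements of the same protein, sample by sample."""
    merged: List[Npx] = []
    for a, b in zip(first, second):
        present = [x for x in (a, b) if x is not None]
        # Assumption: if only one measurement is available, use it as is.
        merged.append(sum(present) / len(present) if present else None)
    return merged


def filter_and_impute(
    npx: Dict[str, List[Npx]],
    lod: Dict[str, float],
    max_fraction_below: float = 0.33,
) -> Dict[str, List[Npx]]:
    """Keep analytes with < 33% of values under the LOD and impute those values with the LOD."""
    kept: Dict[str, List[Npx]] = {}
    for protein, values in npx.items():
        observed = [x for x in values if x is not None]
        if not observed:
            continue  # nothing measured for this analyte
        below = sum(1 for x in observed if x < lod[protein])
        if below / len(observed) >= max_fraction_below:
            continue  # too many left-censored values: drop the analyte
        # Left-censored values are raised to the LOD; missing values stay missing.
        kept[protein] = [None if x is None else max(x, lod[protein]) for x in values]
    return kept
```

With four samples and a limit of detection of 0.5, an analyte with one value below the limit (25%) is retained with that value replaced by 0.5, while an analyte with two of three observed values below the limit is dropped.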
Protein annotation for GO (gene ontology) biological process terms was performed on the STRING platform [23]; all terms related to innate immune, adaptive immune, and inflammatory responses were grouped together. The association with specific immune cells was determined with the Human Protein Atlas (HPA) [24] data sets for RNA single cell type, RNA blood cell specific, and RNA blood lineage specific.
Coagulation factors were measured in plasma using three different multiplex panels, including the 6-, 4-, and 3-plex human ProcartaPlex panels (Thermofisher). The samples were measured according to the manufacturer’s instructions and the results are shown in arbitrary units (AU) indicating the log2 of the concentration in the samples in relation to reference plasma provided with the kit. In addition, Thrombomodulin and D-dimer were measured using a Multiplex Luminex® assay (R&D Systems, UK). Assays were performed according to the manufacturer’s guidelines and samples were acquired on a Luminex MAGPIX instrument using xPonent 4.0 Software (Luminex).
Data analysis
For analysis of PEA data, a two-tailed Student's t-test was used for two-group comparisons; the test was paired in the acute-convalescent comparison. Statistical analysis of non-parametric data (clinical parameters and protein concentrations from multiplex assays) was performed with the Mann-Whitney U test for two-group comparisons and the Kruskal-Wallis test coupled with Dunn's test for comparisons of more than two groups. Statistical comparison of categorical variables was performed with Fisher's exact test. All p-values were adjusted for multiple comparisons with the false discovery rate (FDR).
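As an illustration of FDR adjustment, the sketch below implements the Benjamini-Hochberg step-up procedure. The text does not name the specific FDR method, so the choice of Benjamini-Hochberg is an assumption, and the function name is hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, scaling each by m / rank
    # and carrying the running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A protein would then be called significant when its adjusted p-value falls below the chosen FDR threshold.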
To account for the effect of confounders, we built a multivariate limma linear model [25] to adjust protein NPX values of COVID-19, CAP-Sepsis, and Other-Sepsis patients for age (in years), sex, Charlson comorbidity index, and corticosteroid use prior to sampling. The model included no intercept and contrasted COVID-19 with CAP-Sepsis and COVID-19 with Other-Sepsis separately, to derive estimated coefficients and standard errors for each comparison. Finally, moderated two-sided t statistics and F statistics were calculated for the comparisons based on the adjusted log2 fold change (FC) and empirical Bayes moderation of the standard errors, and the p-values were corrected for multiple testing with the FDR.
Based on the expression of all 193 proteins, we plotted the 122 samples on a principal component analysis (PCA) plot and further clustered the data with the partitioning around medoids (PAM) algorithm. For PAM, we tested different numbers of potential clusters k, ranging from k = 2 to the number of sample groups (k = 9), and eventually selected the largest k (k = 4) at which the samples were clearly separated and every cluster reached the minimum subgroup sample size (n = 8).
All analyses were performed with R (v.4.0.3) in RStudio (v.1.3.959). Correlation analyses were performed with the two-tailed non-parametric Spearman test applied on pairwise complete observations using the packages factoextra (v1.0.7), FactoMineR (v2.3), PerformanceAnalytics (v2.0.4), ggplot2 (v3.3.1), gplots (v3.0.4), pheatmap (v1.0.12), vegan (v2.5-6), corrplot (v0.84), lattice (v0.20-41), latticeExtra (v0.6-29), stats (v4.0.1), and ComplexHeatmap (v2.5.6). The plots were generated with ggplot2 (v3.3.6). The package ggpubr (v0.4.0.999) was used for adding statistical significance stars to the plots. The heatmaps with integrated dendrograms were made with pheatmap (v1.0.12).
To build ML models that would accurately discriminate COVID-19 from CAP-sepsis, we partitioned the dataset (COVID-19 n = 27, CAP-sepsis n = 32) 1,000 times, with random allocation of 75% of the samples to the training & validation dataset (TrnVD) and 25% to the testing dataset (TstD), maintaining the proportions of the two conditions in both TrnVD and TstD. On the 1,000 TrnVD, we first ran iterations of random forest (RF) models (R package caret, v6.0-90) for protein selection, using a leave-one-out cross-validation (LOOCV) strategy, with a minimum node size of 3, 1,000 trees, and Cohen's kappa as the metric for model training. On the same 1,000 TrnVD, we then ran iterations of logistic regression with lasso regularization (LR-lasso) models (R package glmnet, v4.1-2) for protein selection, using a LOOCV strategy with alpha = 1 and selecting the minimal lambda for the best model in each iteration. We opted for lasso regularization of LR models because it was the best performer of three modelling approaches, the other two being ridge regression (alpha = 0) and elastic net (EN) regularization (alpha = 0.5); EN had a comparable performance but selected more proteins in the models.
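The repeated, class-balanced 75%/25% partitioning can be sketched in a few lines. This is a minimal illustration rather than the authors' R code: the function names are hypothetical, and rounding the per-class training size to the nearest integer is an assumption (other tools may round differently).

```python
import random
from typing import Dict, List, Tuple


def stratified_split(
    labels: List[str], train_fraction: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Split sample indices so that each class keeps roughly the same proportion in both parts."""
    by_class: Dict[str, List[int]] = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Assumption: per-class training size is rounded to the nearest integer.
        n_train = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return sorted(train), sorted(test)


def repeated_splits(
    labels: List[str], n_repeats: int = 1000, train_fraction: float = 0.75, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Generate n_repeats independent stratified partitions from one seeded generator."""
    rng = random.Random(seed)
    return [stratified_split(labels, train_fraction, rng) for _ in range(n_repeats)]
```

With 27 COVID-19 and 32 CAP-sepsis samples, each partition under these assumptions places 20 + 24 = 44 samples in the training and validation set and 7 + 8 = 15 in the testing set.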
For each iteration of either model on both TrnVD and TstD, we calculated performance metrics: accuracy, F1 score, sensitivity, specificity, positive predictive value, negative predictive value, and Matthews correlation coefficient (MCC). MCC is a more robust estimate of accuracy for unbalanced datasets, and it ranges from −1 (total disagreement) to 1 (perfect agreement). To be comparable to accuracy estimates, we transformed it to a normalized MCC (nMCC), with a range from 0 to 100%, following the equation: nMCC = (MCC + 1)/2 × 100%. We used the performance metrics on TstD for the final comparison between the RF and LR-lasso models. The performance metrics were plotted on an accuracy radar plot where mean values of accuracy are presented, along with the range and 95% confidence intervals (CI) (± 1.9685 × SD).
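To make the nMCC transformation concrete, the sketch below computes the Matthews correlation coefficient from confusion-matrix counts and rescales it with the equation given above. The function names are hypothetical, and returning 0 when the denominator is zero is a common convention adopted here as an assumption.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumption: undefined MCC (an empty row or column) is reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


def normalized_mcc(value: float) -> float:
    """Rescale MCC from [-1, 1] to [0, 100] percent: nMCC = (MCC + 1) / 2 * 100."""
    return (value + 1) / 2 * 100
```

A classifier that labels every test sample correctly has MCC = 1 and nMCC = 100%, whereas one performing no better than chance has MCC = 0 and nMCC = 50%.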
Discussion
In this comparative targeted-proteomics study, we analyzed proteins related to immune-response dysregulation, inflammation, and organ damage in COVID-19 and different sepsis subgroups. As expected, plasma proteome disturbances in both diseases indicated a skewed immune response, hyperinflammation, and organ damage. We demonstrate that COVID-19 and sepsis share a core host response to infection, consisting of 42 plasma proteins that were differentially altered in all infected patient cohorts as compared to healthy controls. Although the response was shared, there was a striking difference in its magnitude between the cohorts, with sepsis patients displaying higher levels of most proteins. Among them, several of the classical inflammatory markers, e.g., IL6, IL8, IL10, IL12B, TNF, and IFNγ, had substantially higher levels in sepsis compared to COVID-19, leading to the conclusion that the inflammatory response is more pronounced in sepsis regardless of etiology or focus of infection. In addition, the plasma proteome alterations identified unique features associated with each disease, allowing the discovery of potential plasma biomarkers for differential diagnosis of COVID-19 and CAP-sepsis, among them TRIM21, PTN, and CASP8.
Very few proteins had higher levels in COVID-19 compared to sepsis in this study, among them PTN and KRT19. Higher levels of PTN were observed in both COVID-19 cohorts compared to healthy controls and other sepsis cohorts, excluding septic shock. PTN levels were not different between severe and moderate COVID-19, a finding recapitulated in our re-analysis of the data from Filbin et al. [8]. However, Filbin et al. reported no difference in PTN between COVID-19 patients and PCR-negative hospital controls, likely due to the heterogeneous selection of controls. Of note, PTN has been reported as a multifunctional cytokine with a potential role in inflammation, leukocyte recruitment, and tissue regeneration [37]. In accordance with our findings, Filbin et al. found higher levels of KRT19 in severe COVID-19 (grade II) as compared to moderate (grade IV) COVID-19. We further showed that KRT19 had higher levels in COVID-19 as compared to CAP-Bac and NP-Sepsis, but similar levels compared to CAP-Infl and septic shock. In line with a previous report suggesting that KRT19 is involved in ARDS-related lung epithelial damage [38], it is tempting to assume that elevated circulating levels of KRT19 are a result of virus-elicited lung tissue injury. It was interesting to observe that both severe and moderate COVID-19 patients had elevated KRT19 even four months after acute disease, suggesting a lingering release of the protein into the bloodstream after lung damage. Similarly, hepatocyte growth factor (HGF), a protein involved in tissue regeneration after damage [39–41], had higher levels in severe COVID-19 during the acute phase that persisted during convalescence, indicating its release during tissue repair. This is in line with a previous report implicating HGF in COVID-19 disease severity [42]. This dynamic of protein levels in the acute and convalescent phases was specific to KRT19 and HGF, as most proteins upregulated in severe COVID-19 during the acute phase normalized during convalescence. Thus, it could be relevant to prospectively monitor KRT19 and HGF in COVID-19, to evaluate their potential value in the prognosis of lung impairment post-COVID-19.
We identified a set of 47 proteins dysregulated in severe COVID-19 that were also associated with clinical parameters of disease severity. Among these proteins, KRT19, TOP2B, AREG, HGF, CKAP4, ITGB6, and NCF2 had higher levels (> twofold) in plasma samples of severe compared to moderate COVID-19 patients, whereas CLEC4C and LTA were among the few proteins that were present at lower levels in severe patients. We reproduced these findings in a reanalysis of a public COVID-19 PEA dataset [8]. CLEC4C is of interest as it is a factor linked to anti-viral responses; similarly, low CLEC4C expression has been linked to a particular COVID-19 severity-interaction expression quantitative trait locus [43]. LTA has also been reported to be linked to anti-viral responses, i.e., interferon-stimulated genes, and to COVID-19 severity [44]. Furthermore, we demonstrated that these proteins correlated with clinical severity scores, such as total SOFA, respiratory SOFA, and PaO2/FiO2 ratio, in COVID-19 patients, whereas no such correlation was observed in any of the sepsis cohorts, not even in CAP patients. Notably, we found a similar association between respiratory dysfunction and the coagulation response, in that several markers, i.e., vWF, Thrombomodulin, Factor XII, and Factor VIII, correlated with respiratory SOFA and PaO2/FiO2 ratio solely in COVID-19 patients. Taken together, these results underscore important differences in the molecular systemic responses driving the pathophysiology of COVID-19 and CAP-sepsis.
Further analysis of the 47 COVID-19 severity-associated proteins showed strong linkage to monocytes and granulocytes. The detailed immunophenotyping published on the same patient cohort [16, 17] allowed for correlation analyses between soluble markers and specific immune cell subpopulations. Several plasma proteins, including HGF, AREG, CKAP4, S100A12, NCF2, and ITGB6, correlated with low expression of CD86 and HLA-DR in all subpopulations of monocytes. In a report by Kvedaraite et al. [16], these cell populations expressed a myeloid-derived suppressor cell-like phenotype and were enriched in severe COVID-19 cases. Likewise, plasma proteins in severe COVID-19 were associated with reduced activation markers of different granulocyte subpopulations, including neutrophils, basophils, and most notably eosinophils, which have been reported by Lourda et al. [17] to be elevated in severe COVID-19 cases. Taken together, these results show an association between a distinct set of plasma proteins and immature myeloid cell subpopulations reported as elevated in severe COVID-19 disease [16, 17]. These findings are in line with reports pointing to myeloid cells, in particular their immature forms, as important contributors to the cytokine-rich environment in COVID-19 [45, 46]. Yet, it is difficult to infer whether these processes are simply conjoined or whether the dysregulation of plasma proteins precedes the impaired innate immune cell response.
Finally, in light of the observed differences in plasma proteins between COVID-19 and CAP-sepsis, we sought to identify biomarkers with potential use in clinical practice that differentiate the two conditions, which have similar presentations, to aid prompt diagnosis. Although clinical examination, radiological imaging, real-time polymerase chain reaction (RT-PCR), and bacterial cultures are helpful in differentiating COVID-19 from CAP-sepsis, the diagnosis can sometimes be challenging due to inconclusive clinical presentation and/or radiological exams [30], false-negative SARS-CoV-2 RT-PCR results [29], or false-negative bacterial cultures [3]. This poses a clinical differential-diagnosis problem. Using machine learning, we identified a set of diagnostic plasma biomarkers (e.g., TRIM21, CASP8, PTN, and CSF1) that had very high accuracy in differentiating COVID-19 from CAP and outperformed standard laboratory parameters used in clinical practice. Although some of the models showed perfect accuracy, this is likely an overestimate due to the small sample sizes of the two cohorts. One should consider that our findings might be confounded by the different stages of disease, differences in some patient characteristics, and different sampling time periods between the COVID-19 and the sepsis cohorts. The latter also introduces a potential confounder linked to treatment, such as corticosteroid use. Even though corticosteroids have been shown to alter blood protein levels in COVID-19 [47, 48], our findings of a more pronounced inflammatory response in CAP-sepsis versus COVID-19 patients were consistent even when adjusting for corticosteroid use and other confounders. Furthermore, the potential biomarkers for differential diagnosis identified in the ML models remained significant in the adjusted multivariate linear models.
One strength of our study design is the inclusion of a prospective sepsis cohort with detailed clinical information allowing for classification into specific clinical endotypes such as CAP caused by Influenza or bacterial causes. We opted for a sepsis cohort enrolled prior to pandemic onset, to ensure that these patients did not have COVID-19, with a limitation in using different sample collection tubes. Both tubes contained EDTA, but the procedure differed as the sepsis samples were collected in PPT tubes (see methods). Although we demonstrate that there are notable differences in plasma protein levels in COVID-19 compared to sepsis, and that they can serve as biomarkers with high accuracy, our findings must be validated in future studies, using larger cohorts. These studies should include cohorts balanced for different clinical characteristics and using samples collected in parallel in the clinical setting.
In this study, we demonstrate that the systemic inflammatory response is higher in sepsis patients than in COVID-19 patients. Similar observations have been reported previously using specific sepsis groups, such as bacterial sepsis ARDS or Influenza sepsis [12, 49–52]. Here we show that this difference is observable regardless of microbiologic etiology, site of infection, or septic shock development. In severe COVID-19, immunosuppressive therapy with corticosteroids, interleukin inhibitors, and Janus kinase inhibitors has been shown to improve survival [53–55]. However, corticosteroid therapy may be harmful in the subgroup of hospitalized COVID-19 patients who do not require oxygen therapy [53, 56]. Considering our finding that the inflammatory response was more prominent in sepsis, a greater therapeutic effect of anti-inflammatory agents in sepsis could be expected. However, previous clinical trials showed modest to no clinical efficacy of corticosteroids, interleukin inhibitors, and other anti-inflammatory drugs in sepsis [3, 57–59]. Recent insights into the high heterogeneity of sepsis cohorts, including the inter-individual differences in systemic biological host responses to infection, may explain the lack of effect at a group level, highlighting the need for personalized medicine. For example, in septic shock, corticosteroid therapy was recently found to decrease survival in a particular patient subgroup defined by a specific whole-blood transcriptomic signature [60]. In contrast, monocytic HLA-DR-guided immunostimulatory therapy with CSF2 (granulocyte–macrophage colony-stimulating factor) [61] or IFNγ [62] has shown promising results in patients with severe sepsis. Recently, IFNγ therapy was followed by clinical improvement in five critically ill COVID-19 patients with bacterial complications [63]. This is interesting in light of the low IFNγ found in COVID-19 patients in our study, particularly in the severe group, where six patients had secondary bacterial complications. Thus, identification of circulating biomarkers reflecting endotype-specific disease traits could enable tailored immunomodulatory therapy in sepsis and perhaps also in protracted severe COVID-19.