Introduction
Sepsis is a major public health concern that arises from an abnormal host response to infection and is associated with life-threatening organ dysfunction [1, 2]. Acute respiratory distress syndrome (ARDS), a common and often fatal complication of sepsis, is characterized by damage to the alveolar-capillary membrane that leads to lung edema and hypoxemia [3]. In a large international study, sepsis was the cause of ARDS in approximately 75% of patients [4]. According to a US report, more than 210,000 cases of sepsis-induced ARDS occur in the US annually [5]. In addition, septic patients with ARDS have higher overall disease severity, poorer recovery from lung injury and higher mortality than patients with non-sepsis-related ALI [6]. Despite a growing understanding of the mechanisms of sepsis-induced ARDS, it remains incompletely understood why only a fraction of septic patients develop ARDS. Furthermore, ARDS develops rapidly after the initial insult, and no consensus has yet been reached on biomarkers that can be used to directly diagnose ARDS and assess lung injury. It is therefore important to identify diagnostic biomarkers for ARDS.
Gene expression signatures have been an intense focus of research in recent years. Numerous studies have indicated that gene expression signatures have great predictive value for identifying septic patients with ARDS [7]. In one study, an 8-gene signature was found to be associated with acute lung injury (ALI) and could be used to distinguish ALI patients from septic patients [8]. In another, the expression of neutrophil-related genes was significantly higher in septic patients with ARDS than in patients with sepsis alone [9]. A recent study also found distinct gene expression profiles in monocytes between patients with sepsis and patients with sepsis and ARDS [10]. Thus, gene signatures derived from gene expression profiles might be novel and accurate biomarkers for distinguishing patients with ARDS. However, because a large number of genes are involved in the pathophysiological process, identifying those relevant to the diagnosis of ARDS can be computationally challenging.
Machine learning is an emerging field that offers powerful tools for dealing with large, complex and disparate data. It has progressively improved our ability to find relevant features in large, high-dimensional gene expression datasets [11]. Supervised machine learning has been used successfully to develop classifiers for disease diagnosis and to identify related biomarkers on the basis of the input features [12, 13]. However, research using machine learning to identify potential diagnostic biomarkers of sepsis-induced ALI is still lacking. Here, we hypothesized that by integrating multiple machine learning algorithms, we could identify gene expression signatures for sepsis-induced ALI that may serve as diagnostic tools. Moreover, functional analysis of the identified diagnostic genes will provide insight into the pathogenic mechanisms of ALI development and uncover druggable targets for its prevention. In this study, we systematically reviewed the available transcriptomic profiling datasets and identified gene biomarkers associated with the diagnosis of sepsis-induced ALI using a consensus of four different supervised machine learning feature selection techniques. We further explored the role of these biomarkers in the pathogenesis of sepsis-induced ALI and potential candidates for therapeutic intervention.
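As a rough illustration of what a consensus across several feature selection techniques can look like, the sketch below keeps only the genes that a minimum number of methods agree on. The voting rule, the method names and the gene labels are all assumptions made for this example; the text above does not specify how the four techniques were combined.

```python
from collections import Counter


def consensus_genes(selections, min_votes=None):
    """Return genes chosen by at least `min_votes` selection methods.

    selections: dict mapping a method name to the iterable of genes it selected.
    min_votes:  votes required; defaults to all methods (strict intersection).
    """
    if not selections:
        return []
    if min_votes is None:
        min_votes = len(selections)
    votes = Counter()
    for genes in selections.values():
        votes.update(set(genes))  # set(): each method votes once per gene
    return sorted(g for g, n in votes.items() if n >= min_votes)


# Hypothetical outputs of four feature selection methods (illustrative only).
example = {
    "method_1": ["G1", "G2", "G3", "G4", "G5", "G6"],
    "method_2": ["G1", "G2", "G3", "G4", "G5"],
    "method_3": ["G5", "G4", "G3", "G2", "G1", "G7"],
    "method_4": ["G1", "G2", "G3", "G4", "G5", "G6"],
}

print(consensus_genes(example))               # genes selected by all four methods
print(consensus_genes(example, min_votes=2))  # a looser majority-style rule
```

A strict intersection favours specificity over sensitivity; lowering `min_votes` trades that for a larger candidate list.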
Discussion
ALI is a lethal clinical syndrome that commonly occurs in septic patients, but its pathogenesis is still unknown. The limitations of the current ALI diagnostic system hamper the early provision of optimal clinical care to septic patients, because the clinical diagnosis of sepsis-induced ALI is determined primarily by PaO2/FiO2 and chest imaging, without regard to molecular biological characteristics [32, 33]. With the development of high-throughput sequencing technology and computational biology, numerous studies have proposed predictive gene expression signatures based on various machine learning approaches. However, two questions should be considered: why a particular method should be used, and which solution is the best. The choice of algorithm may reflect researchers' preferences and biases. Thus, in this study, we integrated gene expression profiles and applied a consensus machine learning approach to generate a consensus signature that identifies septic patients with ALI with high accuracy, as candidates for further investigation. We subsequently performed external validation to assess the feasibility of the diagnostic model in different centers, and the results suggested that the selected genes had great predictive value (AUC 0.725 and 0.833). These data indicate that genes selected by combining different methods can reveal diagnostic signatures and provide insights into regulators of disease.
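For readers less familiar with the AUC values quoted above, the following minimal sketch computes the AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The labels and scores are made up for illustration and are not data from this study; the function name `roc_auc` is likewise an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: 1 for the positive class (e.g. sepsis with ALI), 0 otherwise.
    scores: model outputs; higher means more likely positive. Ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties contribute half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores for a hypothetical external cohort.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

print(round(roc_auc(y_true, y_score), 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading values such as 0.725 and 0.833.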
This study identified a five-gene signature (ARHGDIB, ALDH1A1, TREM1, TACR3 and PI3) using several supervised machine learning algorithms (Additional file 1: Figure S8). ARHGDIB, a pivotal molecule in cellular signaling, is mainly expressed in hematopoietic tissues such as B- and T-lymphocyte cell lines and was initially found to act as an inhibitor of GDP dissociation from RhoA [34]. Previous studies have demonstrated that upregulated ARHGDIB can promote macrophage infiltration and increase the production of ROS by regulating the activity of NADPH oxidase in phagocytes [35, 36], indicating that upregulated expression of ARHGDIB might aggravate lung injury. Moreover, ARHGDIB can inhibit vascular endothelial cell migration and regulates vascular tone and other vascular functions [37]. Upregulated ARHGDIB can also inhibit the expression of vascular endothelial growth factor (VEGF), which might suppress the regeneration of endothelial cells [38]. In this study, we found that overexpression of ARHGDIB in sepsis-induced ALI increased the activity of immune cells, and that ARHGDIB had a significant negative correlation with the regeneration of vascular endothelial cells. This indicates that ARHGDIB promotes the development of ALI by affecting the immune response and regulating vascular activity, resulting in vascular endothelial cell damage and lung edema.
The key role of ALDH1A1 has been reported to be the oxidation of retinaldehyde to retinoic acid, which forms transcriptional regulators critical for normal cell growth and differentiation [39]. Furthermore, overexpression of ALDH1A1 is closely associated with systemic metabolism and inflammation. Studies have found that high expression of ALDH1A1 predicts a poor prognosis because of dysregulated metabolism and inflammatory response [40, 41]. Interestingly, ALDH1A1 expression was low in septic patients with ALI. With low ALDH1A1 expression in sepsis-induced ALI, immune tolerance was decreased and the activity of pathways related to intercellular connectivity was also decreased, indicating that low expression of ALDH1A1 might promote damage to the alveolar-endothelium barrier.
TREM1, a member of the immunoglobulin superfamily, is mainly expressed in neutrophils and monocytes/macrophages and, when bound to its ligand, stimulates the release of proinflammatory cytokines (e.g., TNF-α and IL-1β). TREM1 has been reported to be usable as a diagnostic and prognostic biomarker for sepsis, indicating its potential diagnostic value [42]. It is believed that upregulated expression of TREM1 in response to infection augments the inflammatory response, which not only removes pathogens but also aggravates organ damage [42-44]. In this study, we found decreased expression of TREM1 in septic patients with ALI, which might impair the clearance of pathogens. In addition, TREM1 is involved in mitochondrial metabolism and energy production [45, 46]. Downregulation of TREM1 leads to disordered mitochondrial metabolism and reduced energy production, which affect cell proliferation and repair.
Our study also found that downregulation of TACR3 was associated with decreased energy production and enhanced oxidative stress. We speculate that the redox imbalance and disturbed energy production were induced by downregulated expression of TACR3, leading to the development of ALI.
PI3 is a neutrophil serine proteinase inhibitor with a crucial role in preventing excessive tissue injury during inflammatory events. It has previously been identified as significantly downregulated in the acute stage of ARDS, in concordance with our findings [47]. Plasma PI3 levels could be used for the early diagnosis of ARDS, indicating that direct analysis of ARDS patient blood may provide valuable information [47]. Furthermore, expression of and polymorphisms in the PI3 gene were significantly associated with ARDS risk, so PI3 could be regarded as a prognostic marker [48, 49]. After injury, the epithelium is repaired through the secretion of extracellular matrix, which restores the epithelial barrier [50]. However, downregulation of PI3 affected the secretion of extracellular matrix proteins, which might delay tissue repair [47].
These results suggest that a dysregulated immune response and enhanced oxidative stress might be the crucial initial mechanisms that damage the alveolar-endothelium barrier, leading to increased permeability to liquid and protein across the lung endothelium, which in turn leads to edema in the lung interstitium. In addition, mitochondrial dysfunction and bioenergetic dysfunction also contribute substantially to the progression of sepsis-associated ALI. Thus, understanding the function of the diagnostic genes will help to clarify the pathogenesis of sepsis-induced ALI and to propose targeted therapy options.
Drug repurposing (the reorientation of drug function) is a novel strategy for disease treatment. As the mechanisms of ARDS continue to be revealed and treatment plans continue to be refined, a variety of drugs have been applied to treat ALI/ARDS. In COVID-19-associated ARDS, many drugs were explored for treating patients even though they had not previously been used to treat lung diseases [51, 52]. Following this strategy, we performed targeted drug screening against the diagnostic genes to propose a novel therapeutic approach for inhibiting the development of sepsis-associated ALI.
Estradiol, a small-molecule compound, could efficiently bind to ARHGDIB and decrease its expression. Estrogen receptors are expressed in all immune cells and can regulate cellular functions as transcription factors. Treatment with estradiol decreases the accumulation of immune cells (e.g., neutrophils and monocytes) and suppresses the production of proinflammatory cytokines, which could ameliorate lung inflammation [53, 54]. However, excessive intake of estrogen results in side effects such as vomiting, nausea and thrombosis [55].
Acetaminophen, one of the most popular analgesic and antipyretic agents, showed exceptional performance in increasing TACR3 expression. Previous studies have demonstrated that treating sepsis patients with acetaminophen reduces oxidative stress and inhibits the excessive innate immune response [56, 57], which benefits tissue repair. Its toxicity should be noted, however: an overdose of acetaminophen can lead to acute liver failure [58].
The herbal compound curcumin has been reported to have beneficial effects in treating inflammatory diseases, neurological diseases, cardiovascular diseases, pulmonary diseases, metabolic diseases, liver diseases and cancers [59]. In sepsis-induced ALI, intranasal curcumin could significantly reduce the expression of oxidative stress markers (e.g., nitric oxide (NO) and malondialdehyde (MDA)) and inflammatory cytokines (e.g., TNF-α). In addition, curcumin improves lung permeability and reduces capillary leakage [60]. Yuan et al. further demonstrated that curcumin exerts anti-inflammatory and antioxidant effects through regulation of TREM-1 gene activity, which is in line with our study [61].
Tretinoin (a vitamin A derivative) was one of the compounds that upregulated PI3 and exhibited a high-affinity docking binding energy. Tretinoin is a drug with anti-inflammatory and immunomodulatory properties in sepsis. Treatment with tretinoin in sepsis inhibits the activation of NF-κB and related target genes such as IL-6, MCP-1 and COX-2 [62]. Furthermore, tretinoin attenuated fibroblast degradation of extracellular matrix, suggesting that it could modify tissue injury and ameliorate lung fibrosis [63]. Therefore, the interaction between tretinoin and PI3 might ameliorate lung inflammation and fibrosis.
Dexamethasone is recognized as one of the most effective anti-inflammatory drugs and is used in various inflammatory diseases. Early administration of dexamethasone could reduce overall mortality in ARDS patients [64]. Paradoxically, no therapeutic benefit was found when these hormones were given to patients with sepsis and pneumonia [65, 66]. In our study, we found that dexamethasone could increase the expression of ALDH1A1 in septic patients with ALI, which might prevent lung inflammation and improve lung permeability. However, when administered systemically, dexamethasone can elicit severe side effects, such as hyperglycemia, hypertension, hydro-electrolytic disorders and peptic ulcers [67].
Thus, based on drug screening targeting the five diagnostic genes, our study proposes a novel targeted therapy strategy combining multiple drugs, which might prevent the development of sepsis-induced ALI driven by the five diagnostic genes and improve patient prognosis. However, the primary goal of the gene-targeted drugs selected in this study is to regulate the mRNA expression of the targeted genes. Further research is needed to explore novel biomaterials for delivering drugs to the targeted genes.
The novelty of this study lies in the integration of multiple machine learning algorithms to construct a consensus model for distinguishing septic patients with ALI from those without. We first used a correlation matrix to eliminate multicollinearity and then applied multiple supervised machine learning approaches to construct the diagnostic model. We then used external datasets to validate the accuracy of the diagnostic model. The further investigation of gene function and targeted drugs is also novel. However, this study has some limitations. First, although we performed batch correction across the datasets, an inherent batch effect still exists. Future integration studies could begin with the raw sequencing data to ensure consistency and accuracy. Second, many genes were excluded while merging datasets and eliminating multicollinearity, resulting in the loss of some important genes. However, to validate the model in independent datasets, we had to ensure that the genes used for model construction were available in the testing sets. Third, some clinical and molecular traits were not adequately provided in the public datasets, which limited our ability to further reveal potential associations between the diagnostic genes and these traits. Finally, while our study provides a framework for early diagnosis through the assessment of specific genes, the results remain at the analytical and speculative stage without experimental validation, and we recognize that assessing these diagnostic genes by microarray may be time-consuming. However, using real-time PCR to assess the expression of these five genes offers a quick and relatively straightforward method for early recognition of sepsis-associated ALI. Thus, the five genes measured by real-time PCR may represent a promising step towards meeting the urgent diagnostic needs of rapidly progressing conditions such as sepsis-associated ALI.
Future research may further refine this method and explore its integration into clinical practice to enhance its usability and effectiveness. In addition, the combined therapeutic value of the five targeted drugs at the cellular and animal levels needs further study. Based on the diagnostic model, we hope to establish a shared platform to aid the clinical diagnosis and treatment of sepsis-induced ALI.
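The correlation-matrix step used above to eliminate multicollinearity can be sketched as a greedy filter that drops any gene too strongly correlated with one already kept. This is only one plausible reading of that step: the Pearson statistic, the 0.9 cutoff, the keep-first ordering and the toy expression values are all assumptions for illustration, not the settings used in this study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def drop_correlated(expr, threshold=0.9):
    """Keep a gene only if |r| with every already-kept gene is below `threshold`.

    expr: dict mapping gene name to expression values in the same sample order.
    The 0.9 default is an assumed cutoff, not one taken from the study.
    """
    kept = []
    for gene, values in expr.items():
        if all(abs(pearson(values, expr[k])) < threshold for k in kept):
            kept.append(gene)
    return kept


# Toy expression values for three hypothetical genes across four samples.
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.1, 4.0, 6.2, 8.1],  # nearly proportional to G1, so redundant
    "G3": [4.0, 1.0, 3.0, 2.0],
}

print(drop_correlated(expr))
```

Because the filter is order-dependent, which member of a correlated pair survives depends on the input ordering; this is one way important genes can be lost at this stage, as noted in the limitations.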