Background
Colorectal cancer (CRC) ranks as the third most common malignancy around the world with high incidence and high mortality [
1]. The current treatments for CRC mainly include surgery, chemotherapy, radiation therapy, and other targeted therapies. However, the prognosis of CRC has never been satisfying, especially for patients with advanced stage [
2]. Genetic and epigenetic changes are closely related to the occurrence of tumors and the prognosis of CRC [
3,
4]. Therefore, understanding the molecular mechanism of CRC is essential for developing preventive and treatment strategies of CRC.
Accumulating evidence confirmed the importance of the tumor microenvironment (TME) in cancer development. The complex interaction between tumor cells and TME contributes to tumor progression and impacts the patient’s clinical outcome [
5‐
7]. In this context, tumor-infiltrating immune cells and stromal cells, which are the main components of TME, have received extensive attention. Previous studies have demonstrated that stromal cells promote tumor angiogenesis and extracellular matrix remodeling during tumor progression [
8]. Meanwhile, numerous studies revealed that immune cells in the TME play important roles in tumorigenesis and are attractive therapeutic targets[
9‐
11]. Furthermore, it has been reported that infiltrating immune cells are involved in metastasis [
12]. In these biological processes, immune and stromal genes may influence the prognosis of cancer patients by regulating the abundance and function of immune cells and stromal cells. However, effective biomarkers related to immune and stromal scores that can predict the prognosis of CRC patients are lacking. Therefore, identifying such biomarkers would aid cancer diagnosis and the exploration of new molecular targeted therapies.
With the emergence of high-throughput detection technology and the development of bioinformatics, transcriptome analysis has been widely used to assess the abundance of different types of cells in the TME. In this study, we applied the ESTIMATE scoring method to calculate the immune score and stromal score of each CRC sample from The Cancer Genome Atlas (TCGA) database. SPOCK1 and POSTN were identified as key prognostic genes associated with the immune score and stromal score. The relationships of their expression with the clinical characteristics and prognosis of colorectal cancer were validated by several analyses. The CIBERSORT algorithm and the Tumor Immune Estimation Resource (TIMER) were used to elucidate the association of SPOCK1 and POSTN expression with immune cell infiltration levels. High expression of SPOCK1 and POSTN, as well as positive expression of CD68 and CD206, was detected in several pairs of human CRC specimens by immunohistochemistry and immunofluorescence.
In conclusion, our findings indicated that high expression of SPOCK1 and POSTN predicted poor prognosis in CRC. Meanwhile, SPOCK1 and POSTN were enriched in a variety of immune-related pathways and were related to the regulation of tumor immune cells and the expression of multiple immune checkpoint genes. In addition, SPOCK1 and POSTN, as well as CD68 (co-expressed by M1 and M2 macrophages) and CD206 (M2-specific macrophage expression), were expressed at significantly higher levels in cancer tissues than in adjacent normal tissues, suggesting that they might be valuable prognostic biomarkers and targets for immunotherapy.
Materials and methods
Specimen collection
A total of 8 pairs of primary tumor tissues and corresponding normal tissues were collected from CRC patients who received surgical treatment at Zhongnan Hospital of Wuhan University (Wuhan, China). All patients were diagnosed by original histopathological detection and none of them received preoperative adjuvant chemotherapy or radiotherapy. The patients with non-curative resection, cancer recurrence, severe injury of vital organs, or a history of autoimmune diseases were excluded. The detailed patient characteristics are displayed in Additional file
4: Table S1. Samples of the collected tissues were preserved in liquid nitrogen. The collection of clinical specimens was approved by the clinical research institution review committee and ethics review committee of the Zhongnan Hospital (Number: 2020110). Verbal informed consent was obtained from the patients for their anonymized information to be published in this article.
Data collection and processing
RNA-seq data in FPKM format and corresponding clinical information of CRC were downloaded from The Cancer Genome Atlas (TCGA) database (
https://portal.gdc.cancer.gov/). There were 556 CRC samples with gene expression data in the TCGA. All samples were included in gene expression analysis. After excluding 45 patients with incomplete information, a total of 511 patients with complete clinical information such as survival time, survival status, age, gender, stage, T/M/N stage were included in survival analysis and clinical correlation analysis. The detailed patient characteristics of TCGA are displayed in Additional file
5: Table S2. Moreover, as a validation dataset, the normalized gene expression profile of GSE17536 was downloaded from Gene Expression Omnibus (GEO) database (
https://www.ncbi.nlm.nih.gov/geo/). This dataset contains 177 CRC samples with complete clinical information.
Differentially expressed genes (DEGs) screening
The immune and stromal scores of each CRC sample were calculated based on the ESTIMATE algorithm, which was performed by R package “estimate” [
13]. The samples were divided into high-score and low-score groups according to the median of the immune and stromal scores, respectively. Differential expression analysis was performed using the R package “limma”, and DEGs were screened from the comparison between the high-score group and the low-score group. |log2 fold change (FC)| ≥ 1.0 and false discovery rate (FDR) < 0.05 were set as the thresholds for filtering DEGs. The intersection of the up-regulated and down-regulated DEGs in the immune and stromal score groups was identified with the online Venn diagram tool (Venny 2.1,
https://bioinfogp.cnb.csic.es/tools/venny/).
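The screening logic described above (median split, threshold filter, intersection) can be illustrated with a minimal Python sketch. It is not the authors' R/limma workflow: the function names, the toy gene labels, and the handling of samples that fall exactly on the median (assigned to the low group here) are assumptions made for illustration only.

```python
from statistics import median

def split_by_median(scores):
    """Label each sample 'high' or 'low' relative to the median score.
    Assumption: samples exactly at the median go to the 'low' group."""
    cutoff = median(scores.values())
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}

def filter_degs(results, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """results maps gene -> (log2FC, FDR), e.g. exported from a limma table.
    Returns (up, down) gene sets passing |log2FC| >= cutoff and FDR < cutoff."""
    up = {g for g, (lfc, fdr) in results.items()
          if lfc >= lfc_cutoff and fdr < fdr_cutoff}
    down = {g for g, (lfc, fdr) in results.items()
            if lfc <= -lfc_cutoff and fdr < fdr_cutoff}
    return up, down

# Toy, made-up statistics for the immune-score and stromal-score comparisons.
immune = {"GENE_A": (2.1, 0.001), "GENE_B": (1.3, 0.2),
          "GENE_C": (-1.5, 0.01), "GENE_D": (1.0, 0.04)}
stromal = {"GENE_A": (1.8, 0.003), "GENE_B": (1.6, 0.01),
           "GENE_C": (-1.2, 0.02), "GENE_D": (0.7, 0.01)}

up_i, down_i = filter_degs(immune)
up_s, down_s = filter_degs(stromal)
shared_up = up_i & up_s        # the set intersection plays the role of the Venn step
shared_down = down_i & down_s
print(sorted(shared_up), sorted(shared_down))
```

The set intersection gives the same result as the Venn diagram tool; only the genes that pass both thresholds in both comparisons are carried forward.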
Functional enrichment analysis of the DEGs
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses (
www.kegg.jp/kegg/kegg1.html ) were performed for the intersecting DEGs using the “clusterProfiler”, “enrichplot”, and “ggplot2” packages in R. Terms with
p value < 0.05 were considered statistically significant.
Construction of weighted co-expression network and identification of clinically significant modules
Weighted gene co-expression network analysis (WGCNA) was performed to identify hub gene modules associated with clinical traits. The expression profile data and phenotype data matrix of 1318 DEGs were selected for WGCNA [
14]. The R package “WGCNA” was used, and the scale-free topology fit index and mean connectivity were calculated for several powers to choose the soft-thresholding power for network construction. The soft threshold was chosen as the power at which the scale independence value reached 0.9. The adjacency matrix was transformed into a topological overlap matrix (TOM), which measures the network connectivity of genes. Based on the TOM dissimilarity, with a minimum module size of 30 genes for the gene dendrogram, hierarchical clustering was carried out to classify similar genes into modules. The merge cut height was set to 0.25 to merge modules based on the dissimilarity of module eigengenes.
We calculated the correlation between co-expression modules and clinical features to identify clinically relevant modules. For intramodular analysis, Module membership (MM) and gene significance (GS) were calculated. Here we selected the significant module with a threshold of p value < 0.05.
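To make the adjacency-to-TOM step concrete, the following is a minimal pure-Python sketch of the standard unsigned WGCNA definitions: adjacency a_ij = |cor(x_i, x_j)|^β and topological overlap TOM_ij = (Σ_u a_iu·a_uj + a_ij) / (min(k_i, k_j) + 1 − a_ij). The source does not state the network type or the chosen power, so the unsigned network, β = 6, and the toy expression values are assumptions; the R package “WGCNA” should be used for real analyses.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def adjacency(expr, beta):
    """Unsigned soft-threshold adjacency |cor|^beta with a zero diagonal."""
    genes = list(expr)
    n = len(genes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                adj[i][j] = abs(pearson(expr[genes[i]], expr[genes[j]])) ** beta
    return genes, adj

def tom(adj):
    """Topological overlap matrix computed from an adjacency matrix."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity (diagonal is zero)
    t = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j]
                             for u in range(n) if u not in (i, j))
                t[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return t

# Toy expression profiles (hypothetical genes, four samples each).
expr = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, -1, 1, -1]}
genes, adj = adjacency(expr, beta=6)  # beta = 6 is an assumed power
overlap = tom(adj)
dissimilarity = [[1.0 - v for v in row] for row in overlap]  # clustering input
```

Hierarchical clustering is then run on the dissimilarity (1 − TOM); in this toy example g1 and g2 have a high overlap and would fall into the same module.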
Key genes screening and validation
First, genes with high MM and GS in the selected module were defined as hub genes, and the top 30 ranked genes were selected. Second, prognostic genes were obtained from the intersecting DEGs by univariate Cox regression analysis with
p < 0.01 as the cutoff value. Third, Venn analysis was conducted to select the intersection of genes in the above two steps, and the genes were identified as key prognostic genes for subsequent analysis. Survival analyses (overall survival (OS) and disease-free survival (DFS)) of key genes were carried out based on TCGA and the Genotype-Tissue Expression (GTEx) data in the Gene Expression Profiling Interactive Analysis (GEPIA) database (
http://gepia.cancer-pku.cn/). For further validation, Kaplan-Meier curves based on GSE17536 were generated using the R package “survival”.
p < 0.05 was considered significant.
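For readers unfamiliar with how a Kaplan-Meier curve is built, here is a minimal pure-Python sketch of the product-limit estimator, S(t) = Π (1 − d_i / n_i) over event times t_i ≤ t. It is an illustration under assumptions (toy follow-up times, function name chosen here), not the authors' analysis, which used the R package “survival”.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.
    times: follow-up time per patient; events: 1 = event observed, 0 = censored.
    Returns a list of (time, survival probability) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censored patients both leave the risk set
    return curve

# Toy data: five patients, two of them censored (at t = 2 and t = 5).
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

In practice one curve is computed per expression group (high versus low) and the two curves are compared with the log-rank test.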
ScRNA-seq data preparation and processing
Single-cell transcriptome files of GSE110009 and GSE120065 were downloaded from the GEO database. These two single-cell transcriptome profiles were integrated for downstream analysis. Cells with nFeature_RNA < 50 or a mitochondrial sequencing count > 5% were excluded. The batch effect was removed by regularized negative binomial regression with the “Seurat” package. t-distributed stochastic neighbor embedding (t-SNE) was performed for dimension reduction. We used the “Seurat” package to cluster the cells and find cluster biomarkers. Ultimately, the results indicated that SPOCK1 and POSTN are mainly expressed in cancer-associated fibroblasts (CAFs) in CRC (Additional file
1: Fig.S1A–D).
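The cell-level quality filter described above can be sketched in a few lines of Python. This is only an illustration of the two thresholds (nFeature_RNA ≥ 50 and mitochondrial fraction ≤ 5%), not the Seurat pipeline itself; the "MT-" gene-name prefix for mitochondrial genes and the function names are assumptions.

```python
def qc_metrics(counts):
    """counts maps gene -> read/UMI count for a single cell.
    Returns (number of detected genes, percent of counts from mitochondrial genes)."""
    total = sum(counts.values())
    n_features = sum(1 for c in counts.values() if c > 0)
    # Assumption: human mitochondrial genes are named with an "MT-" prefix.
    mito = sum(c for g, c in counts.items() if g.startswith("MT-"))
    pct_mt = 100.0 * mito / total if total else 0.0
    return n_features, pct_mt

def passes_qc(counts, min_features=50, max_pct_mt=5.0):
    """Keep a cell only if it passes both thresholds stated in the text."""
    n_features, pct_mt = qc_metrics(counts)
    return n_features >= min_features and pct_mt <= max_pct_mt

# Toy cells with hypothetical gene names.
good_cell = {f"GENE{i}": 1 for i in range(60)}
good_cell["MT-CO1"] = 2                       # about 3.2% mitochondrial
stressed_cell = {f"GENE{i}": 1 for i in range(60)}
stressed_cell["MT-CO1"] = 10                  # about 14.3% mitochondrial
sparse_cell = {f"GENE{i}": 1 for i in range(10)}

kept = [passes_qc(c) for c in (good_cell, stressed_cell, sparse_cell)]
print(kept)
```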
Gene set enrichment analysis (GSEA)
According to the median expression value of key genes, CRC patients in TCGA were divided into high and low expression groups. Gene Set Enrichment Analyses (GSEA) were performed to identify enriched KEGG pathways in the two groups using the JAVA program (
https://www.broadinstitute.org/gsea). The number of permutations was set at 1000 for each analysis. Nominal
p < 0.05, and a false discovery rate (FDR)
q < 0.25 were considered statistically significant. Multiple GSEA plots were visualized by “plyr”, “ggplot2”, and “grid” packages in R.
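The enrichment score at the core of GSEA is a weighted running-sum statistic: walking down the ranked gene list, the score increases at each gene in the set (in proportion to |r|^p) and decreases at each gene outside it, and the maximum deviation from zero is reported. The sketch below shows that statistic in pure Python with toy data; it is not the Broad Institute Java implementation, and the permutation-based p values and FDR q values are omitted. The function name, gene labels, and p = 1 weighting are assumptions for illustration.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """ranked: list of (gene, ranking score), sorted by descending score.
    gene_set: set of gene names. Returns the signed maximum deviation of the
    running sum from zero (the enrichment score)."""
    hits = [abs(s) ** p if g in gene_set else 0.0 for g, s in ranked]
    hit_total = sum(hits)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering all of it")
    running, es = 0.0, 0.0
    for (g, _), h in zip(ranked, hits):
        if g in gene_set:
            running += h / hit_total   # weighted step up at a hit
        else:
            running -= 1.0 / n_miss    # uniform step down at a miss
        if abs(running) > abs(es):
            es = running
    return es

# Toy ranked list, e.g. genes ordered by high-versus-low expression group signal.
ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0), ("F", -3.0)]
es_top = enrichment_score(ranked, {"A", "B"})      # set concentrated at the top
es_bottom = enrichment_score(ranked, {"E", "F"})   # set concentrated at the bottom
print(es_top, es_bottom)
```

A positive score indicates enrichment in the high expression group and a negative score enrichment in the low expression group; significance is then assessed against scores from permuted data.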
Correlation analysis between key genes and immune microenvironment
The CIBERSORT[
15] (
https://cibersort.stanford.edu/) algorithm was performed to evaluate the relative abundances of 22 immune cells in all CRC samples based on RNA-seq from the TCGA database. Samples with
p < 0.05 were considered qualified for further analysis. The 22 immune cells between high and low expression groups of key genes were visualized by using the “vioplot” package. The Tumor Immune Estimation Resource (TIMER) (
https://cistrome.shinyapps.io/timer/) is a web server for analyzing tumor-infiltrating immune cells [
16]. We downloaded the immune infiltration levels of TCGA-CRC patients. The relationship between key gene expression and six immune cell subtypes (B cells, CD4+ T cells, CD8+ T cells, dendritic cells, macrophages, and neutrophils) was determined.
Immunohistochemical staining and tissue immunofluorescence staining
Colon cancer specimens were fixed in 4% paraformaldehyde solution for 3 days and embedded in paraffin. After dewaxing, rehydration, and antigen retrieval with EDTA antigen retrieval buffer (pH 8.0), the sections were blocked with BSA before the primary antibody was added. The primary antibodies were anti-SPOCK1 (1:200 dilution, ab229935, Abcam), anti-POSTN (1:200, bs-4994R, Bioss), anti-CD68 (1:100, SC-20060, Santa), and anti-CD206 (1:100, 18704-AP, Proteintech). Immunohistochemistry was performed with a DAB staining kit (GeneTech Company, Ltd., Shanghai, China) to detect the protein expression of SPOCK1, POSTN, CD68, CD206, PD1, and TIM-3 in human colon cancer samples. For tissue immunofluorescence, the sections were incubated at 4 °C overnight. The next day, a secondary antibody matching the species of the primary antibody was added and incubated for 1 h at room temperature. The fluorescent secondary antibodies were Cy3-conjugated goat anti-rabbit and FITC-conjugated goat anti-mouse IgG (H + L) (1:50 dilution). Nuclei were counterstained with 4′,6-diamidino-2-phenylindole (DAPI), and the sections were sealed with an anti-fluorescence quenching mounting medium. All images were captured with a Nikon NIS Elements BR light microscope (Nikon, Tokyo, Japan). The fluorescence intensity of each sample in different regions was analyzed with ImageJ software to determine the average staining intensity.
Statistical analysis
Statistical analyses were performed using R version 3.6.3 and SPSS V25.0. The Kaplan–Meier method was used to plot survival curves, and the log-rank test was used to assess statistical significance. Differences among variables were compared using t-tests or nonparametric tests. The correlation of gene expression was evaluated by Spearman’s correlation. Unless otherwise specified above, p < 0.05 was considered statistically significant.
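As a brief illustration of the correlation measure named above, Spearman's coefficient is the Pearson correlation computed on ranks, with tied values receiving their average rank. The pure-Python sketch below uses toy expression values and function names chosen here; it is not the authors' R/SPSS code and omits the significance test.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)

# Toy expression values for two hypothetical genes across five samples.
gene_x = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_y = [2.0, 4.0, 8.0, 16.0, 32.0]   # monotone but non-linear in gene_x
rho = spearman(gene_x, gene_y)
print(rho)
```

Because only ranks are used, any monotone relationship gives a coefficient of ±1, which makes the measure robust to the skewed distributions typical of expression data.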
Discussion
In the present study, we screened differentially expressed genes (DEGs) based on the immune and stromal scores of CRC samples in TCGA. CAF-related SPOCK1 and POSTN were identified as key prognostic genes for CRC. In the validation phase, SPOCK1 and POSTN were highly expressed in tumor cells and CAFs in CRC, and their CAF-associated expression was related to poor prognosis and late clinical stage of CRC. In addition, we found that CAF-related SPOCK1 and POSTN are involved in immune activity, are co-expressed with M2-type macrophage markers, and are associated with immune cell infiltration and immune escape.
In the TME, immune and stromal cells play a critical role in tumorigenesis. A growing number of studies have shown that the composition of the TME can predict patient prognosis and serve as an important target for cancer therapy [
20‐
23]. It has been reported that the immune and stroma classification of CRC is relevant to precision immunotherapy [
24]. In recent years, immunotherapy has made great progress in the treatment of CRC, especially in patients with microsatellite instability [
25,
26]. Because the molecular mechanism of the TME in CRC is still unclear, and there is a lack of effective immune biomarkers, current immunotherapy has only achieved significant clinical effects in a subset of CRC patients [
27]. Therefore, it is necessary to gain an in-depth understanding of the composition of the TME and to investigate novel prognostic biomarkers for the immunotherapy of CRC.
Here, we used the ESTIMATE method to calculate the immune and stromal scores of CRC samples from TCGA. The CRC samples were divided into two groups based on the median value of the immune scores and stromal scores. DEGs between the two groups were selected. A total of 1314 upregulated DEGs were shared by the high immune and high stromal score groups, and 4 downregulated DEGs were shared by the low immune and low stromal score groups. Functional enrichment analysis showed that the biological processes and main pathways of these DEGs involved immune-related terms, such as lymphocyte-mediated immunity and cytokine-cytokine receptor interaction. These results indicate that these DEGs are highly correlated with the immune response and the tumor immune microenvironment. Based on the obtained 1318 DEGs, the WGCNA method was used to screen clinically relevant hub genes, while univariate Cox regression analysis was used to identify survival-related genes. The intersection yielded SPOCK1 and POSTN as the key prognostic genes, and scRNA-seq analysis of CRC showed that SPOCK1 and POSTN were mainly expressed in CAFs. Previous studies have reported that CAF-related expression of SPOCK1 and POSTN is associated with tumor prognosis. We validated that high expression of SPOCK1 and POSTN was associated with poor prognosis of CRC patients by OS and DFS analyses based on GEPIA and GSE17536. This is consistent with previous studies on POSTN [
28,
29], but the prognostic value of SPOCK1 in CRC has not been reported before. In addition, our tissue specimens also showed significant overexpression of SPOCK1 and POSTN in tumor cells and CAFs of colon cancer. Furthermore, high expression of SPOCK1 and POSTN was correlated with higher clinical stage, T stage, and N stage, but not with M stage, which is consistent with the WGCNA analysis. Accordingly, the results indicated the potential of SPOCK1 and POSTN as prognostic markers and therapeutic targets in the TME of CRC.
SPOCK1 encodes a Ca2+-binding matricellular glycoprotein, which belongs to the SPARC family. SPOCK1 has been demonstrated to function in cell proliferation, migration, and apoptosis in certain types of cancer, such as pancreatic ductal adenocarcinoma [
30], lung cancer [
31], colorectal cancer[
32], and hepatocellular carcinoma [
33], indicating that SPOCK1 plays an important role in oncogenesis. To elucidate the biological roles of SPOCK1, we conducted GSEA to explore the relevant pathways. It was shown that the SPOCK1 high expression group is mainly enriched in immune-related pathways, such as chemokine signaling pathway, cytokine-cytokine receptor interaction, JAK-STAT signaling pathway. In addition, tumorigenesis-related pathways, such as ECM receptor interaction and MAPK signaling pathway, were also enriched in the SPOCK1 high expression group. Previous studies have shown that SPOCK1 is a key regulator of ECM and can mediate EMT in cancer cells [
34]. Altogether, these results suggested that SPOCK1 promotes tumor progression by affecting these tumor-related and immune-related pathways, leading to a poor prognosis in CRC patients. POSTN, a small extracellular matrix protein, plays a vital role in the regulation of cell-matrix interaction and is considered to be associated with the TME and tumor progression [
35]. Accumulated evidence suggested that POSTN promotes tumor metastasis by regulating immune responses [
36,
37]. In our study, the GSEA results showed that high expression levels of POSTN were associated with many immune-related pathways, for instance, B cell receptor signaling pathway, chemokine signaling pathway, and cytokine-cytokine receptor interaction. It could be hypothesized that the regulation of immune-related signaling pathways may be involved in the regulatory role of POSTN in the clinical stage and prognosis in CRC.
Tumor-infiltrating immune cells (TIICs) are an important part of the complex TME that regulates tumorigenesis [
38]. Therefore, we adopted the CIBERSORT algorithm in the present study to further explore the relationships of SPOCK1 and POSTN with the 22 TIIC subsets. The results showed that the proportions of seven types of immune cells (plasma cells, CD4 memory resting T cells, monocytes, macrophages M0, macrophages M1, activated dendritic cells, and neutrophils) differed significantly according to the expression level of SPOCK1. Similarly, the expression level of POSTN was significantly correlated with the infiltration levels of plasma cells, CD4 memory resting T cells, monocytes, macrophages M0, macrophages M2, and neutrophils. For validation, we used TIMER to analyze the correlation between the expression of SPOCK1 and POSTN and immune cell infiltration. In addition, we verified significant positive expression of CD68 (macrophage M0) and CD206 (macrophage M2) in 8 cases of colorectal cancer using immunohistochemistry and immunofluorescence. Our results showed that SPOCK1 and POSTN were significantly associated with immune cell infiltration. Altogether, these results indicated that SPOCK1 and POSTN had the strongest and most consistent correlations with macrophages and neutrophils. It has been shown that increased intratumoral neutrophil infiltration in CRC is related to the acquisition of a malignant phenotype and is an independent factor for the poor prognosis of CRC patients [
39,
40]. The role of macrophages in CRC is controversial, but many studies have shown that tumor-infiltrating macrophages M2 promote the metastasis of CRC, leading to poor prognosis [
41‐
44]. Zhang Y et al. found that SPOCK1 is related to the recruitment and differentiation of macrophages [
45]. Recent evidence revealed that POSTN could recruit M2 tumor-related macrophages and promote malignant growth [
46]. These findings are consistent with our conclusion. Taken together, SPOCK1 and POSTN may regulate immune cell infiltration in the CRC microenvironment.
Immune checkpoint molecules are often associated with tumor cell immune evasion and tumor progression. Accordingly, immune checkpoint inhibitors have shown great success in the treatment of CRC [
25,
47]. Our results showed that SPOCK1 and POSTN expression were positively correlated with the expression of immune checkpoints, including PD-1, PD-L1, PD-L2, CTLA4, TIM-3, and B7-H3. PD-L1/2 suppresses T-cell function through the PD-1 receptor, causing tumor cells to escape from immune surveillance. Previous studies have shown that PD-1, PD-L1, PD-L2, and CTLA4 expressions were associated with a poor prognosis in CRC patients [
48‐
51]. TIM-3 and B7-H3 are considered novel promising targets for immunotherapy. Recent studies have revealed that TIM-3 and B7-H3 are also involved in the evasion of cancer immune surveillance and CRC progression [
52,
53]. These results suggest that SPOCK1 and POSTN may play a role in immune evasion, which partly explains their potential mechanisms for promoting tumor progression.
This study had some limitations. First, the roles of SPOCK1 and POSTN in CRC were analyzed based on TCGA and GEO data, so our results need to be verified with larger sample sizes. Second, the expression of SPOCK1 and POSTN was verified in only several cases of colon cancer due to limited resources. Therefore, subsequent in vivo and in vitro experiments are required to confirm the concrete relationship between SPOCK1, POSTN, and infiltrating immune cells.