Published in: Journal of Translational Medicine 1/2023

Open Access 01.12.2023 | Research

Development and validation of a clinic machine-learning nomogram for the prediction of risk stratifications of prostate cancer based on functional subsets of peripheral lymphocyte

Authors: Chunguang Yang, Zhenghao Liu, Yin Fang, Xinyu Cao, Guoping Xu, Zhihua Wang, Zhiquan Hu, Shaogang Wang, Xinglong Wu


Abstract

Background

Non-invasive risk stratification contributes to the precise treatment of prostate cancer (PCa). In previous studies, lymphocyte subsets were used to differentiate between low-/intermediate-risk and high-risk PCa, with limited clinical value and poor interpretability. Using functional subsets of peripheral lymphocytes in the largest sample to date, this study aims to construct an easy-to-use and robust nomogram to guide the tripartite risk stratification of PCa.

Methods

We retrospectively screened data from 2039 patients with PCa or benign prostate disease (BPD), covering 42 clinical characteristics including functional subsets of peripheral lymphocytes. After quality control and feature selection, the clinical data restricted to the optimal feature subset were used to train five machine learning (ML) models with 10-fold cross-validation to predict low-, intermediate- and high-risk PCa. A novel clinic-ML nomogram was then constructed from the probabilistic predictions of the trained ML models by combining multivariable ordinal logistic regression with the proposed feature mapping algorithm.

Results

A total of 197 patients (141 with PCa and 56 with BPD) were enrolled in the study, and an optimal subset of nine clinical features was selected. Compared with the best ML model and the clinic nomogram, the clinic-ML nomogram achieved superior performance on the test set, with a sensitivity of 0.713 (95% CI 0.573–0.853), specificity of 0.869 (95% CI 0.764–0.974), F1 of 0.699 (95% CI 0.557–0.841), and AUC of 0.864 (95% CI 0.794–0.935). Calibration curves and decision curve analysis (DCA) indicated that the clinic-ML nomogram improved predictive capacity and net benefit.

Conclusion

Combining the interpretability and simplicity of a nomogram with the efficacy and robustness of ML models, the proposed clinic-ML nomogram can serve as an insightful tool for the preoperative assessment of PCa risk stratification and could provide essential information for the individualized diagnosis and treatment of PCa patients.
Notes

Supplementary Information

The online version contains supplementary material available at https://doi.org/10.1186/s12967-023-04318-w.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Abbreviations
AUC: Area under the ROC curve
BPD: Benign prostate disease
CSPCa: Clinically significant PCa
DCA: Decision curve analysis
DT: Decision tree
FMA: Feature mapping algorithm
ML: Machine learning
mpMRI: Multiparametric MRI
OLR: Ordinal logistic regression
PCa: Prostate cancer
PSA: Prostate-specific antigen
ROC: Receiver operating characteristic
RF: Random forest
SVM: Support vector machine

Introduction

Prostate cancer (PCa) is one of the leading cancer types in estimated new cases and deaths among men worldwide [1]. Proper management of PCa requires an accurate assessment of both the presence and the severity of the disease, so that patients are not misclassified [2]. Prostate-specific antigen (PSA) is a commonly used clinical biomarker for PCa screening and diagnosis, but its high false-positive rate has called its diagnostic value into question [3]. In clinical practice, multiparametric MRI (mpMRI) is promising for the detection and characterization of PCa [4]. However, mpMRI is still limited by confounding benign findings and by substantial intra- and inter-reader variability. Systematic prostate biopsy is commonly performed for cancer detection but has relatively low sensitivity and specificity, which can lead to delayed diagnosis as well as overdiagnosis, with unnecessary discomfort and cost [5, 6]. Urologists are therefore seeking novel, non-invasive ways to improve the accuracy of PCa detection, staging, and risk stratification.
Minimally invasive blood- or urine-based approaches (“liquid biopsies”) are increasingly used for cancer detection, enabling precision oncology [7]. Tumor-derived information (e.g., circulating tumor cells, cell-free DNA and RNA) and markers of the immune response (e.g., immune cell subsets, cytokines and exosome expression profiles) are potential diagnostic, prognostic and therapeutic targets in PCa [8, 9]. Inflammation and the immune response contribute to tumorigenesis [10], and many peripheral blood markers of inflammation and immunity are diagnostic and prognostic indicators of PCa [11–13]. Lymphocyte subsets, including T cells, B cells, and innate lymphoid cells, can distinguish between benign prostate disease (BPD) and PCa and predict clinical risk (low-/intermediate-risk versus high-risk disease) in asymptomatic men [9, 13]. According to the EAU guidelines, clinically significant PCa (CSPCa) refers to intermediate- and high-risk PCa, which still requires treatment in clinical practice [14]. Therefore, in PCa screening, “indolent cancers” (low-risk PCa) are more appropriately grouped with BPD than with intermediate-risk PCa. Furthermore, treatment options for intermediate-risk patients range from focal therapy and radical prostatectomy to various radiotherapy approaches, whereas patients with high-risk PCa are candidates for systemic therapy, indicating that intermediate-risk and high-risk disease should also be distinguished [14, 15]. Unfortunately, few studies have examined the ability of lymphocyte subsets to distinguish among low-, intermediate-, and high-risk PCa [9, 13]. In addition, the diagnostic performance of the functional status of lymphocytes has rarely, if ever, been studied.
Automated methods that detect PCa and distinguish indolent from aggressive disease on the basis of clinical records can assist in early diagnosis and treatment planning. Machine learning (ML), which uses computational algorithms that extract informative features from data without explicit instructions, has been introduced as an advanced technique for aiding the detection and characterization of PCa [9, 16–20]. ML approaches based on peripheral blood lymphocyte subsets have distinguished BPD from PCa, or low-/intermediate-risk from high-risk PCa, in small hospital-based samples [9, 13]. Despite these successes, however, existing ML approaches do not meet the unmet clinical need, because of their poor interpretability and low generalizability.
To address these challenges, this study included patients with BPD and with low-, intermediate-, and high-risk PCa, whose clinical characteristics were collected at two campuses of Wuhan Tongji Hospital, forming the largest sample to date on functional subsets of peripheral lymphocytes for the diagnosis of PCa. We aimed to develop an easy-to-use and robust clinic-ML nomogram to aid in the non-invasive diagnosis and tripartite risk stratification of PCa.

Methods

Patient data collection

The study was approved by the Research Ethics Commission of Tongji Hospital, and the requirement for informed consent was waived by the Ethics Commission (IRB ID: TJ-IRB20211246). The study screened 2039 patients with PCa or BPD who were admitted to Wuhan Tongji Hospital (China) between August 1, 2020 and October 20, 2022. Patients with missing laboratory, radiological or pathological data, or with poor-quality MRI images, were excluded. Ultimately, 197 patients (141 with PCa and 56 with BPD) were enrolled (Fig. 1). To make full use of the collected data, both non-clinically significant PCa (nCSPCa) and BPD were grouped into the low-risk category. All enrolled patients had records of 42 clinical characteristics, including functional subsets of peripheral lymphocytes (Table 1). The subsets of peripheral lymphocytes were detected by flow cytometry, and serum interleukin concentrations were measured by electrochemiluminescence immunoassay (Cobas E602, Roche). The procedures used by the clinical laboratory of Wuhan Tongji Hospital for flow cytometry and interleukin detection have been described previously [21].

Procedures

The workflow of this study is depicted in Fig. 2. Figure 3 illustrates the construction pipeline of the clinic nomogram and the proposed clinic-machine learning nomogram.

Data preprocessing and feature selection

The clinical records of the patients were manually inspected for quality control to identify missing or abnormal values, and each clinical characteristic was visualized with boxplots during this inspection (Additional file 1: Fig. S1). To address uncertainty in the input data, a few recorded values were truncated. For example, PSA values exceeding 1000 were recorded as 1000. Similarly, for ALT, Interleukin-6, Interleukin-1β, and Interleukin-10, values below a certain threshold cannot be recorded accurately because of the limited measurement precision of the instrument. Consequently, all such values for ALT, Interleukin-6, and Interleukin-1β were uniformly set to 5, 1.5 and 5, respectively. Interleukin-10 was removed from the records because too many of its values were duplicates. As a result, a total of 41 clinical characteristics were used in the subsequent analysis.
After manual inspection, the clinical records were normalized using min-max normalization (Fig. 2A). The risk stratification of each patient was then manually assigned in accordance with the EAU guidelines [14], resulting in 59 low-risk (including the patients with BPD), 48 intermediate-risk, and 90 high-risk patients.
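The truncation and normalization steps described above can be sketched as follows. This is a minimal plain-Python illustration, not the authors' code: the column keys (`"PSA"`, `"ALT"`, `"IL-6"`, `"IL-1beta"`) are hypothetical, the caps and floors are the values quoted in the text, and because the text does not say whether the min-max bounds were computed on all records or on the training records only, fitting the bounds and applying them are kept as separate steps.

```python
from typing import Dict, List, Tuple

Record = Dict[str, float]

# Caps/floors quoted in the text; the column keys themselves are hypothetical.
UPPER_CAPS = {"PSA": 1000.0}                                 # PSA > 1000 recorded as 1000
LOWER_FLOORS = {"ALT": 5.0, "IL-6": 1.5, "IL-1beta": 5.0}   # below-threshold readings


def truncate(record: Record) -> Record:
    """Clip values to the caps/floors above, leaving other columns untouched."""
    out = dict(record)
    for name, cap in UPPER_CAPS.items():
        if name in out:
            out[name] = min(out[name], cap)
    for name, floor in LOWER_FLOORS.items():
        if name in out:
            out[name] = max(out[name], floor)
    return out


def fit_min_max(rows: List[Record], columns: List[str]) -> Dict[str, Tuple[float, float]]:
    """Compute the (min, max) of each column over the given rows."""
    return {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in columns}


def apply_min_max(rows: List[Record], bounds: Dict[str, Tuple[float, float]]) -> List[Record]:
    """Rescale each column to [0, 1]; a constant column maps to 0.0."""
    scaled = []
    for r in rows:
        scaled.append({
            c: (r[c] - lo) / (hi - lo) if hi > lo else 0.0
            for c, (lo, hi) in bounds.items()
        })
    return scaled


# Illustrative values only, not patient data.
rows = [
    {"PSA": 4.2, "ALT": 3.0, "IL-6": 8.0, "IL-1beta": 12.0},
    {"PSA": 2500.0, "ALT": 40.0, "IL-6": 0.9, "IL-1beta": 4.0},
    {"PSA": 15.0, "ALT": 22.0, "IL-6": 3.0, "IL-1beta": 6.0},
]
cleaned = [truncate(r) for r in rows]
bounds = fit_min_max(cleaned, ["PSA", "ALT", "IL-6", "IL-1beta"])
scaled = apply_min_max(cleaned, bounds)
```

Fitting the bounds on the training records only and reusing them for the test records would keep test information out of the normalization; whether that was done here is not stated.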
The preprocessed clinical records and the corresponding risk stratification labels were fed into a Lasso regression algorithm, which selected the most significant features and produced the dataset used in the subsequent analysis (Fig. 2B). The Lasso-selected records were randomly split into a training set and a test set at a 4:1 ratio. Consequently, 157 records were used to train the machine learning (ML) models and construct the nomograms, and 40 records were reserved for performance evaluation.
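A minimal sketch of Lasso-based feature selection followed by the 4:1 random split is given below. It uses plain-Python coordinate descent on the objective (1/2n)||y − Xw||² + α||w||₁ after centering, with the risk group coded 0/1/2 as a numeric target; that coding, the value of α, and the random seed are assumptions, since the text reports only that Lasso regression (via scikit-learn) was used. Features with a non-zero coefficient are treated as selected. The split holds out ceil(0.2 n) records, which for n = 197 gives the reported 157/40 division.

```python
import math
import random
from typing import List, Sequence, Tuple


def soft_threshold(rho: float, alpha: float) -> float:
    """Lasso shrinkage operator S(rho, alpha)."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X: Sequence[Sequence[float]], y: Sequence[float],
                             alpha: float, n_iter: int = 1000,
                             tol: float = 1e-10) -> List[float]:
    """Minimise (1/2n)||y - Xw||^2 + alpha*||w||_1 on centred X and y (no intercept)."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]  # y - Xw with w = 0
    w = [0.0] * p
    z = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant column: coefficient stays 0
            # rho_j = (1/n) * sum_i x_ij * (residual with feature j's contribution added back)
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / z[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                w[j] = new_wj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return w


def train_test_split_4_1(n: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and hold out ceil(0.2 * n) of them as the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(0.2 * n)
    return idx[n_test:], idx[:n_test]


# Toy check: y depends on the first feature only, so the second should be dropped.
X = [[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
y = [-2.0, 2.0, -2.0, 2.0]
w = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, coef in enumerate(w) if coef != 0.0]
train_idx, test_idx = train_test_split_4_1(197, seed=42)
```

In this sketch the split is applied after selection, matching the order described in the text; running the selection on the training records only would avoid using test labels during feature selection.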

Machine learning models

Five commonly used ML algorithms were employed in this study for the task of predicting the risk stratification of PCa, including Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), XGBoost and AdaBoost. These ML models were trained using a 10-fold cross-validation approach on the training set (Fig. 2B). The optimal ML model was then selected based on its performance evaluated in the test set (Additional file 1: Table S1) and served as the performance baseline for comparison with nomograms.
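The 10-fold cross-validation can be illustrated with the plain-Python sketch below. It mirrors the usual contiguous-fold scheme (as in scikit-learn's `KFold` without shuffling, where the first n mod k folds get one extra sample) and, purely as a stand-in for the five real models, uses a hypothetical majority-class baseline. The class counts are the training-set counts from Table 2; the label coding and shuffling seed are assumptions.

```python
import random
from collections import Counter
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for k contiguous folds.

    The first n % k folds get one extra sample, so fold sizes differ by at most 1.
    Indices are not shuffled here; shuffle the data beforehand if they are ordered.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def cross_validate(y: Sequence[int],
                   fit_predict: Callable[[List[int], List[int]], List[int]],
                   k: int = 10) -> float:
    """Mean validation accuracy of fit_predict(train_idx, val_idx) over k folds."""
    scores = []
    for train, val in k_fold_indices(len(y), k):
        preds = fit_predict(train, val)
        correct = sum(1 for i, p in zip(val, preds) if y[i] == p)
        scores.append(correct / len(val))
    return sum(scores) / len(scores)


# Training-set class counts from Table 2 (0 = low, 1 = intermediate, 2 = high),
# shuffled with an arbitrary seed so that folds are not ordered by class.
labels = [0] * 47 + [1] * 38 + [2] * 72
random.Random(0).shuffle(labels)


def majority_baseline(train: List[int], val: List[int]) -> List[int]:
    """Hypothetical placeholder model: predict the training fold's majority class."""
    majority = Counter(labels[i] for i in train).most_common(1)[0][0]
    return [majority] * len(val)


fold_sizes = [len(val) for _, val in k_fold_indices(len(labels), k=10)]
mean_accuracy = cross_validate(labels, majority_baseline, k=10)
```

Any of the five classifiers can replace `majority_baseline`, provided it is fitted only on the `train` indices of each fold.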

Development and validation of the clinic-machine learning nomogram

First, a clinic nomogram was created by applying multivariable ordinal logistic regression (OLR) to the clinical data of the training set (Fig. 2C). Second, an ML nomogram was built by applying multivariable OLR to the probabilistic predictions of the five trained ML models. Third, to fully leverage the interpretability of the nomogram, a feature mapping algorithm (FMA) was developed to convert the ML nomogram into a clinic-ML nomogram whose variables are clinic features (Fig. 3). Finally, the performance of the clinic nomogram and the proposed clinic-ML nomogram was evaluated on the test set using the area under the receiver operating characteristic (ROC) curve (AUC) and calibration curves, and their clinical utility was measured with decision curve analysis (DCA) (Fig. 2D).
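For readers less familiar with ordinal logistic regression, the sketch below shows how a proportional-odds model, the model family fitted by the `rms` package, turns a linear predictor into probabilities for the three ordered risk groups. The parameterization P(Y ≥ j | x) = σ(α_j + x·β) is one common convention, and the coefficients are hypothetical, not the fitted values behind either nomogram.

```python
import math
from typing import List, Sequence


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def ordinal_probabilities(x: Sequence[float], beta: Sequence[float],
                          alphas: Sequence[float]) -> List[float]:
    """Class probabilities under a proportional-odds model.

    P(Y >= j | x) = sigmoid(alphas[j-1] + x . beta) for j = 1..K-1, with the
    alphas strictly decreasing so that the cumulative probabilities are ordered.
    """
    if any(a <= b for a, b in zip(alphas, alphas[1:])):
        raise ValueError("alphas must be strictly decreasing")
    eta = sum(xi * bi for xi, bi in zip(x, beta))  # shared linear predictor
    cum = [1.0] + [sigmoid(a + eta) for a in alphas] + [0.0]  # P(Y >= j), j = 0..K
    return [cum[j] - cum[j + 1] for j in range(len(cum) - 1)]


# Hypothetical coefficients for two inputs (e.g. two ML models' predicted scores).
beta = [1.2, 0.8]
alphas = [1.0, -1.0]  # K = 3 classes: low (0), intermediate (1), high (2)
p_mid = ordinal_probabilities([0.0, 0.0], beta, alphas)
p_high = ordinal_probabilities([2.0, 1.5], beta, alphas)
predicted_class = max(range(3), key=lambda k: p_high[k])
```

A single set of slopes β is shared across the cut-points; this proportional-odds assumption is what lets one nomogram axis per variable serve all three risk groups.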
For the clinic-ML nomogram, the FMA computes the value of each clinic feature (CF) as
$${CF}_{i} = \sum_{j=1}^{N} {FI}_{i,j} \times {MV}_{j} \tag{1}$$
where ${FI}_{i,j}$ is the feature importance of the $i$th clinic feature in the $j$th trained ML model, ${MV}_{j}$ is the value of the $j$th ML model in the ML nomogram, $i \in \{1, \dots, M\}$ and $j \in \{1, \dots, N\}$, and $M$ and $N$ are the numbers of clinic features and ML models, respectively. With the FMA, the ML nomogram can be conveniently converted into a new clinic-ML nomogram whose variables are clinic features. This conversion enhances interpretability while retaining the efficiency and predictive power of the ML models.
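Equation (1) is a matrix-vector product, so the FMA can be sketched in a few lines. The importance matrix in this example is made up; in practice FI would presumably come from the trained models (scikit-learn tree ensembles and XGBoost expose a `feature_importances_` attribute), but the text does not say how the importances were extracted or normalized.

```python
from typing import List, Sequence


def feature_mapping(fi: Sequence[Sequence[float]], mv: Sequence[float]) -> List[float]:
    """Equation (1): CF_i = sum_j FI[i][j] * MV[j].

    fi: M x N matrix, fi[i][j] = importance of clinic feature i in ML model j.
    mv: length-N vector, mv[j] = value on the axis of ML model j in the ML nomogram.
    """
    n_models = len(mv)
    if any(len(row) != n_models for row in fi):
        raise ValueError("each row of fi must have one entry per ML model")
    return [sum(f * v for f, v in zip(row, mv)) for row in fi]


# Hypothetical example with M = 3 clinic features and N = 2 models; each model's
# importances (one column) sum to 1 over the features.
fi = [
    [0.5, 0.1],  # e.g. PSA
    [0.3, 0.6],  # e.g. Age
    [0.2, 0.3],  # e.g. Th/Ts
]
mv = [10.0, 20.0]
cf = feature_mapping(fi, mv)  # approximately [7.0, 15.0, 8.0]
```

When each model's importances sum to 1, the clinic feature values add up to the total of the ML-model values (30 here), so the mapping redistributes each model's contribution across clinic features rather than adding weight. This follows from the algebra of Eq. (1) and is not a claim made in the text.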

Statistical analysis

The t-test or the Mann-Whitney U-test was used for continuous variables, depending on whether they conformed to a normal distribution with homogeneity of variance. The Kruskal-Wallis H-test was used for the other continuous and categorical variables. ML algorithms, Lasso regression and ROC analysis were implemented with the Scikit-learn package in Python 3.6. All other statistical analyses were performed with R statistical software, version 3.4.1. The “rms” package was used for the univariate, multivariate, and ordinal logistic regression analyses, and the calibration plots and DCA were produced with the “rms” and “dca” packages, respectively. Differences between the AUCs of two ROC curves were tested with the DeLong test. A two-sided p value of less than 0.05 was considered statistically significant.
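Because the outcome has three ordered classes, the way the reported AUCs were averaged is not stated. The sketch below assumes an unweighted one-vs-rest average (the `multi_class="ovr"`, `average="macro"` option of scikit-learn's `roc_auc_score`) and computes each binary AUC with the rank-based Mann-Whitney formulation, the same nonparametric AUC estimate that the DeLong test compares. The toy scores are invented.

```python
from typing import Sequence


def binary_auc(scores: Sequence[float], positives: Sequence[bool]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as 1/2 (the Mann-Whitney U formulation); O(P*N) for clarity.
    """
    pos = [s for s, is_pos in zip(scores, positives) if is_pos]
    neg = [s for s, is_pos in zip(scores, positives) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Unweighted mean of one-vs-rest AUCs over classes 0..K-1."""
    k = len(probs[0])
    aucs = [binary_auc([row[c] for row in probs], [y == c for y in labels])
            for c in range(k)]
    return sum(aucs) / k


auc_example = binary_auc([0.1, 0.4, 0.35, 0.8], [False, False, True, True])  # 0.75
three_class = macro_ovr_auc(
    [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7], [0.7, 0.2, 0.1]],
    [0, 1, 2, 0],
)
```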

Results

Characteristics of patients

Most clinical features did not differ significantly between patients in the training and test sets (Table 1). However, twelve clinical features differed significantly among low-, intermediate- and high-risk patients in the training set (p < 0.05), namely Age, PSA, Neutrophil percentage, Neutrophils, Hemoglobin, Alkaline phosphatase, Lactate dehydrogenase, Th/Ts, Activated Ts cells, Interleukin-1β, Interleukin-2R, and Interleukin-6 (Table 2).
Table 1
Clinical characteristics of patients
| Features (mean ± SD) | Training set, 157 (80%) | Test set, 40 (20%) | p-value |
|---|---|---|---|
| Age (years) | 66.24 ± 7.85 | 64.65 ± 7.22 | 0.2506 |
| PSA | 96.88 ± 230.79 | 100.68 ± 220.24 | 0.9257 |
| Neutrophil percentage (%) | 61.70 ± 10.42 | 58.53 ± 10.81 | 0.0914 |
| Neutrophils (×10^9/L) | 3.94 ± 1.81 | 3.70 ± 1.41 | 0.3583 |
| Lymphocyte percentage (%) | 26.31 ± 8.86 | 28.82 ± 9.01 | 0.1130 |
| Lymphocytes (×10^9/L) | 1.55 ± 0.53 | 1.73 ± 0.65 | 0.0770 |
| Hemoglobin (g/L) | 132.43 ± 15.32 | 136.15 ± 14.17 | 0.1673 |
| ALT (U/L) | 19.77 ± 13.50 | 21.77 ± 15.55 | 0.4202 |
| Alkaline phosphatase (U/L) | 143.60 ± 382.69 | 117.65 ± 234.60 | 0.5937 |
| Lactate dehydrogenase | 180.68 ± 93.13 | 164.40 ± 26.21 | 0.0587 |
| Serum creatinine (μmol/L) | 87.45 ± 45.94 | 100.85 ± 85.84 | 0.1834 |
| T cells (CD3+CD19−) (%) | 67.70 ± 9.47 | 69.24 ± 8.08 | 0.3470 |
| T cells (CD3+CD19−) (/μl) | 1040.21 ± 314.91 | 1173.60 ± 462.04 | 0.0334* |
| B cells (CD3−CD19+) (%) | 12.14 ± 5.54 | 13.07 ± 6.36 | 0.3641 |
| B cells (CD3−CD19+) (/μl) | 190.44 ± 120.22 | 240.05 ± 193.13 | 0.0451* |
| Th cells (CD3+CD4+) (%) | 43.41 ± 8.56 | 46.49 ± 7.72 | 0.0406* |
| Th cells (CD3+CD4+) (/μl) | 666.55 ± 222.74 | 801.12 ± 364.37 | 0.0038* |
| Ts cells (CD3+CD8+) (%) | 20.70 ± 6.59 | 19.48 ± 6.53 | 0.3002 |
| Ts cells (CD3+CD8+) (/μl) | 318.72 ± 134.72 | 315.95 ± 124.61 | 0.9068 |
| NK cells (CD3−/CD16+CD56+) (%) | 19.37 ± 9.65 | 16.86 ± 7.90 | 0.1318 |
| NK cells (CD3−/CD16+CD56+) (/μl) | 307.17 ± 204.40 | 267.38 ± 132.43 | 0.1409 |
| T cells + B cells + NK cells (%) | 99.22 ± 0.64 | 99.18 ± 0.98 | 0.7586 |
| T cells + B cells + NK cells (/μl) | 1537.82 ± 463.60 | 1681.03 ± 629.11 | 0.1104 |
| Th/Ts | 2.36 ± 1.02 | 2.71 ± 1.14 | 0.0612 |
| CD28+ Th cells (CD3+CD4+CD28+) (/Th) | 94.53 ± 7.10 | 93.30 ± 9.02 | 0.3579 |
| CD28+ Ts cells (CD3+CD8+CD28+) (/Ts) | 58.58 ± 17.65 | 59.85 ± 16.75 | 0.6843 |
| Activated T cells (CD3+HLA−DR+) (/μl) | 17.90 ± 6.33 | 17.31 ± 7.30 | 0.6113 |
| Activated Ts cells (CD3+CD8+HLA−DR+)/Ts (%) | 44.46 ± 13.06 | 41.33 ± 10.24 | 0.1109 |
| Naïve Th cells (CD3+CD4+CD45RA+)/Th (%) | 32.45 ± 13.49 | 35.65 ± 14.50 | 0.1910 |
| Memory Th cells (CD3+CD4+CD45RO+)/Th (%) | 67.61 ± 13.57 | 64.35 ± 14.50 | 0.1843 |
| Regulatory T cells (CD3+CD4+CD25+CD127low+) (/μl) | 3.82 ± 1.22 | 4.00 ± 1.34 | 0.4085 |
| Naïve regulatory T cells (CD45RA+CD3+CD4+CD25+CD127low+) (/μl) | 0.76 ± 0.47 | 0.79 ± 0.50 | 0.6688 |
| Induced regulatory T cells (CD45RO+CD3+CD4+CD25+CD127low+) (/μl) | 3.06 ± 0.93 | 3.21 ± 1.10 | 0.3944 |
| IFN-γ+CD4+T cells/Th (%) | 21.47 ± 8.21 | 19.48 ± 6.75 | 0.1610 |
| IFN-γ+CD8+T cells/Ts (%) | 62.31 ± 15.37 | 59.91 ± 13.51 | 0.3690 |
| IFN-γ+NK cells/NK (%) | 74.80 ± 14.78 | 73.58 ± 13.25 | 0.6383 |
| Interleukin-1β (pg/mL) | 7.38 ± 6.18 | 6.63 ± 4.87 | 0.4237 |
| Interleukin-2R (U/mL) | 498.19 ± 320.20 | 533.62 ± 509.28 | 0.5878 |
| Interleukin-6 (pg/mL) | 6.87 ± 12.45 | 6.78 ± 9.36 | 0.9612 |
| Interleukin-8 (pg/mL) | 27.15 ± 36.89 | 36.77 ± 55.40 | 0.1927 |
| Tumor necrosis factor-α (pg/mL) | 19.58 ± 31.55 | 23.62 ± 28.55 | 0.4645 |
*p < 0.05
Table 2
Clinical characteristics of the training and test sets by PCa risk stratification
| Features (mean ± SD) | Training: Low, 47 (29.94%) | Training: Intermediate, 38 (24.20%) | Training: High, 72 (45.86%) | Training p-value | Test: Low, 12 (30.00%) | Test: Intermediate, 10 (25.00%) | Test: High, 18 (45.00%) | Test p-value |
|---|---|---|---|---|---|---|---|---|
| Age (years) | 63.23 ± 8.18 | 65.53 ± 7.10 | 68.57 ± 7.42 | 0.0002* | 62.67 ± 7.35 | 65.80 ± 6.53 | 65.33 ± 7.81 | 0.3669 |
| PSA | 7.56 ± 7.24 | 12.67 ± 9.50 | 199.62 ± 312.92 | 0.0000* | 7.07 ± 4.85 | 9.60 ± 4.90 | 213.68 ± 299.19 | 0.0069* |
| Neutrophil percentage (%) | 65.96 ± 11.34 | 58.69 ± 10.44 | 60.51 ± 9.04 | 0.0103* | 60.68 ± 9.56 | 52.95 ± 11.26 | 60.20 ± 11.14 | 0.9428 |
| Neutrophils (×10^9/L) | 4.60 ± 2.05 | 3.55 ± 1.80 | 3.72 ± 1.55 | 0.0164* | 4.03 ± 1.65 | 2.89 ± 0.90 | 3.93 ± 1.40 | 0.9830 |
| Lymphocyte percentage (%) | 23.33 ± 9.44 | 29.32 ± 9.25 | 26.65 ± 7.77 | 0.0841 | 27.87 ± 8.85 | 34.57 ± 7.91 | 26.27 ± 8.96 | 0.4912 |
| Lymphocytes (×10^9/L) | 1.52 ± 0.65 | 1.64 ± 0.57 | 1.53 ± 0.42 | 0.9605 | 1.79 ± 0.84 | 1.88 ± 0.72 | 1.62 ± 0.50 | 0.4476 |
| Hemoglobin (g/L) | 134.13 ± 16.14 | 137.16 ± 11.84 | 128.82 ± 15.81 | 0.0389* | 141.75 ± 8.43 | 136.30 ± 5.79 | 132.33 ± 19.26 | 0.0789 |
| ALT (U/L) | 21.72 ± 13.18 | 19.13 ± 12.07 | 18.83 ± 14.52 | 0.2742 | 18.25 ± 9.33 | 16.10 ± 7.65 | 27.28 ± 20.53 | 0.0965 |
| Alkaline phosphatase (U/L) | 73.62 ± 29.75 | 67.87 ± 15.89 | 229.25 ± 556.22 | 0.0212* | 78.08 ± 23.07 | 64.40 ± 14.37 | 173.61 ± 350.66 | 0.2527 |
| Lactate dehydrogenase | 168.89 ± 37.15 | 157.05 ± 34.14 | 200.83 ± 129.90 | 0.0450* | 157.58 ± 19.69 | 158.10 ± 23.03 | 172.44 ± 30.99 | 0.1147 |
| Serum creatinine (μmol/L) | 91.04 ± 64.35 | 83.82 ± 12.77 | 87.03 ± 43.40 | 0.6881 | 78.92 ± 10.77 | 85.10 ± 14.89 | 124.22 ± 126.82 | 0.1470 |
| T cells (CD3+CD19−) (%) | 67.57 ± 8.16 | 67.44 ± 9.43 | 67.92 ± 10.43 | 0.8305 | 69.01 ± 9.44 | 69.69 ± 5.06 | 69.16 ± 9.06 | 0.9763 |
| T cells (CD3+CD19−) (/μl) | 1059.34 ± 370.24 | 1100.29 ± 325.67 | 996.01 ± 266.54 | 0.2295 | 1225.58 ± 563.22 | 1361.70 ± 501.53 | 1034.44 ± 347.35 | 0.2196 |
| B cells (CD3−CD19+) (%) | 12.74 ± 5.69 | 12.55 ± 6.07 | 11.54 ± 5.20 | 0.2335 | 12.74 ± 5.44 | 11.66 ± 7.49 | 14.07 ± 6.63 | 0.5322 |
| B cells (CD3−CD19+) (/μl) | 211.09 ± 152.39 | 208.21 ± 136.04 | 167.58 ± 79.30 | 0.0428 | 252.58 ± 208.95 | 265.40 ± 285.55 | 217.61 ± 123.77 | 0.6058 |
| Th cells (CD3+CD4+) (%) | 42.65 ± 7.65 | 42.38 ± 8.95 | 44.44 ± 8.96 | 0.2327 | 45.93 ± 8.34 | 48.17 ± 8.43 | 45.93 ± 7.43 | 0.9376 |
| Th cells (CD3+CD4+) (/μl) | 677.06 ± 276.78 | 686.00 ± 213.15 | 649.42 ± 189.27 | 0.4727 | 840.33 ± 470.67 | 948.60 ± 394.25 | 693.06 ± 245.99 | 0.2295 |
| Ts cells (CD3+CD8+) (%) | 22.14 ± 7.31 | 20.36 ± 5.49 | 19.94 ± 6.61 | 0.0845 | 20.41 ± 8.38 | 17.79 ± 6.34 | 19.81 ± 5.57 | 0.8858 |
| Ts cells (CD3+CD8+) (/μl) | 339.81 ± 132.66 | 336.53 ± 151.08 | 295.56 ± 125.80 | 0.0664 | 342.50 ± 146.68 | 330.10 ± 111.75 | 290.39 ± 121.23 | 0.2579 |
| NK cells (CD3−/CD16+CD56+) (%) | 18.86 ± 8.17 | 19.27 ± 9.26 | 19.76 ± 10.84 | 0.6167 | 17.12 ± 9.10 | 17.93 ± 8.35 | 16.10 ± 7.41 | 0.7029 |
| NK cells (CD3−/CD16+CD56+) (/μl) | 297.21 ± 165.47 | 317.00 ± 186.65 | 308.47 ± 237.42 | 0.7993 | 281.17 ± 142.86 | 313.80 ± 138.71 | 232.39 ± 123.17 | 0.2773 |
| T cells + B cells + NK cells (%) | 99.17 ± 0.74 | 99.26 ± 0.58 | 99.23 ± 0.61 | 0.6742 | 98.86 ± 1.71 | 99.28 ± 0.47 | 99.33 ± 0.36 | 0.2191 |
| T cells + B cells + NK cells (/μl) | 1567.64 ± 557.32 | 1625.50 ± 441.12 | 1472.07 ± 404.37 | 0.2198 | 1759.33 ± 757.08 | 1940.90 ± 713.72 | 1484.44 ± 453.55 | 0.1964 |
| Th/Ts | 2.17 ± 0.88 | 2.24 ± 0.79 | 2.55 ± 1.19 | 0.0362* | 2.65 ± 1.15 | 3.09 ± 1.44 | 2.55 ± 0.99 | 0.7307 |
| CD28+ Th cells (CD3+CD4+CD28+) (/Th) | 94.44 ± 7.14 | 93.92 ± 7.51 | 94.91 ± 6.98 | 0.6804 | 92.40 ± 11.83 | 91.95 ± 11.49 | 94.65 ± 5.22 | 0.4833 |
| CD28+ Ts cells (CD3+CD8+CD28+) (/Ts) | 59.28 ± 20.46 | 56.62 ± 16.45 | 59.17 ± 16.55 | 0.9609 | 63.03 ± 23.60 | 54.84 ± 17.78 | 60.51 ± 10.48 | 0.7789 |
| Activated T cells (CD3+HLA−DR+) (/μl) | 17.16 ± 6.34 | 18.68 ± 6.24 | 17.98 ± 6.45 | 0.5603 | 17.71 ± 10.66 | 17.14 ± 6.55 | 17.14 ± 5.38 | 0.8456 |
| Activated Ts cells (CD3+CD8+HLA−DR+)/Ts (%) | 40.53 ± 13.36 | 46.01 ± 11.88 | 46.21 ± 13.16 | 0.0265* | 42.12 ± 12.23 | 42.45 ± 12.10 | 40.17 ± 8.35 | 0.5956 |
| Naïve Th cells (CD3+CD4+CD45RA+)/Th (%) | 32.64 ± 13.23 | 32.58 ± 10.86 | 32.27 ± 15.08 | 0.8808 | 32.98 ± 16.39 | 43.66 ± 17.85 | 32.99 ± 10.12 | 0.8436 |
| Memory Th cells (CD3+CD4+CD45RO+)/Th (%) | 67.36 ± 13.23 | 67.42 ± 10.86 | 67.87 ± 15.23 | 0.8358 | 67.02 ± 16.39 | 56.35 ± 17.85 | 67.01 ± 10.12 | 0.8435 |
| Regulatory T cells (CD3+CD4+CD25+CD127low+) (/μl) | 3.52 ± 1.15 | 4.17 ± 1.47 | 3.83 ± 1.08 | 0.2530 | 3.26 ± 1.30 | 4.10 ± 1.00 | 4.44 ± 1.41 | 0.0193* |
| Naïve regulatory T cells (CD45RA+CD3+CD4+CD25+CD127low+) (/μl) | 0.69 ± 0.39 | 0.87 ± 0.66 | 0.74 ± 0.38 | 0.6436 | 0.57 ± 0.32 | 1.01 ± 0.62 | 0.82 ± 0.49 | 0.2496 |
| Induced regulatory T cells (CD45RO+CD3+CD4+CD25+CD127low+) (/μl) | 2.83 ± 0.90 | 3.30 ± 1.03 | 3.09 ± 0.88 | 0.2073 | 2.69 ± 1.03 | 3.09 ± 0.75 | 3.62 ± 1.22 | 0.0208* |
| IFN-γ+CD4+T cells/Th (%) | 21.89 ± 7.82 | 22.33 ± 8.92 | 20.73 ± 8.17 | 0.4090 | 21.62 ± 8.81 | 17.81 ± 6.96 | 18.98 ± 5.12 | 0.3543 |
| IFN-γ+CD8+T cells/Ts (%) | 60.66 ± 17.63 | 61.97 ± 13.45 | 63.57 ± 14.93 | 0.3101 | 61.64 ± 14.76 | 61.32 ± 13.10 | 57.97 ± 13.79 | 0.4574 |
| IFN-γ+NK cells/NK (%) | 76.37 ± 14.31 | 73.49 ± 14.85 | 74.46 ± 15.25 | 0.5405 | 71.08 ± 12.17 | 77.28 ± 8.53 | 73.20 ± 16.31 | 0.7531 |
| Interleukin-1β (pg/mL) | 6.55 ± 4.06 | 5.92 ± 3.05 | 8.69 ± 8.10 | 0.0446* | 6.48 ± 3.46 | 8.29 ± 9.12 | 5.82 ± 1.33 | 0.6391 |
| Interleukin-2R (U/mL) | 427.45 ± 189.32 | 444.39 ± 149.37 | 572.76 ± 425.87 | 0.0112* | 381.67 ± 94.99 | 403.60 ± 88.02 | 707.17 ± 736.35 | 0.0750 |
| Interleukin-6 (pg/mL) | 3.79 ± 4.86 | 4.99 ± 13.06 | 9.87 ± 14.87 | 0.0067* | 4.33 ± 5.37 | 2.37 ± 1.38 | 10.87 ± 12.37 | 0.0423* |
| Interleukin-8 (pg/mL) | 18.28 ± 21.49 | 31.45 ± 41.75 | 30.67 ± 41.55 | 0.0912 | 37.36 ± 63.98 | 15.81 ± 12.54 | 48.02 ± 64.01 | 0.5254 |
| Tumor necrosis factor-α (pg/mL) | 16.88 ± 20.70 | 19.12 ± 28.35 | 21.59 ± 38.68 | 0.4259 | 22.05 ± 26.87 | 18.59 ± 26.37 | 27.47 ± 32.40 | 0.5790 |
*p < 0.05, indicating a significant difference among the low-, intermediate- and high-risk groups

Selection of clinic features for ML models and the clinic nomogram

Lasso regression was applied to determine the optimal subset of clinic features (Fig. 4), yielding nine features: Age, Alkaline phosphatase, B cells (CD3−CD19+), Interleukin-1β, Interleukin-2R, Lactate dehydrogenase, Neutrophil percentage, PSA and Th/Ts. These nine features were then used to construct both the ML models and the clinic nomogram.

Performance assessment of ML algorithms

The data restricted to the nine Lasso-selected features were used to train five ML algorithms with 10-fold cross-validation. All algorithms showed competitive performance in discriminating among the risk stratifications (Fig. 5). XGBoost performed best, showing favorable predictive efficacy in both the training and test sets, with AUC values of 0.989 and 0.842, sensitivity of 0.930 and 0.700, and specificity of 0.965 and 0.850, respectively (Table 5).
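The text reports single sensitivity, specificity and F1 values for a three-class task without naming the averaging scheme. A common choice, assumed here, is to compute each metric one-vs-rest for every risk group and take the unweighted (macro) mean, as in the sketch below; the toy labels are invented.

```python
from typing import Dict, Sequence


def macro_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                  n_classes: int = 3) -> Dict[str, float]:
    """Macro-averaged one-vs-rest sensitivity, specificity and F1."""
    sens, spec, f1 = [], [], []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        tn = len(y_true) - tp - fn - fp
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        f1.append(2 * tp / denom if denom else 0.0)
    return {
        "sensitivity": sum(sens) / n_classes,
        "specificity": sum(spec) / n_classes,
        "f1": sum(f1) / n_classes,
    }


# Invented labels: 0 = low, 1 = intermediate, 2 = high risk.
m = macro_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
```

With macro averaging, the reported sensitivity equals the balanced accuracy, so each risk group counts equally regardless of its size.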

Development and performance assessment of the clinic-ML nomogram

Univariate and multivariate logistic regression analyses (Table 3) suggested that the predictions of four ML models, namely AdaBoost, Decision Tree, Random Forest, and XGBoost, were independent predictors of PCa risk stratification. Therefore, multivariate OLR on the probabilistic predictions of these four models was used to construct the ML nomogram, which was then converted into a clinic-ML nomogram through the proposed FMA (Fig. 6B). The variance inflation factors (VIFs) of the variables in the ML nomogram were 5.13, 1.92, 5.08, and 2.39, respectively, which were considered within acceptable limits.
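VIFs like those quoted above can be computed from the correlations among the predictors: for each predictor, VIF_j = 1/(1 − R_j²), which equals the j-th diagonal entry of the inverse correlation matrix. The plain-Python sketch below uses that identity; the toy columns are invented and stand in for the models' probabilistic predictions.

```python
import math
from typing import List, Sequence


def correlation_matrix(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pearson correlation matrix for k predictors given as k equal-length columns."""
    n = len(columns[0])
    dev = []
    for col in columns:
        mean = sum(col) / n
        dev.append([v - mean for v in col])

    def dot(a: List[float], b: List[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    return [[dot(a, b) / math.sqrt(dot(a, a) * dot(b, b)) for b in dev] for a in dev]


def invert(matrix: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns: Sequence[Sequence[float]]) -> List[float]:
    """VIF_j = [R^-1]_jj = 1 / (1 - R_j^2), with R the predictors' correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Invented columns with correlation 0.8, standing in for two models' predictions.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
vifs = vif([x1, x2])  # both equal 1 / (1 - 0.8 ** 2), about 2.78
```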
The predictive scores of the clinic-ML nomogram were strongly correlated with PCa risk stratification in both the training and test sets (Fig. 7A). Using cutoff values of 2.24 and 6.00 for the clinic-ML nomogram scores, the patients were classified into three risk groups, and the distribution of PCa patients differed substantially among the low-, intermediate- and high-risk groups (Fig. 7B). For instance, in the test set, the probability of PCa patients was found to be significantly higher in the low-risk group than in the intermediate- and high-risk groups (p < 0.05).
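Applying the two cutoffs is a simple thresholding step, sketched below. Which group receives a score exactly equal to a cutoff is not specified, so the half-open intervals used here are an assumption, as are the example scores.

```python
from typing import Dict, Sequence

LOW_CUTOFF = 2.24   # cutoff values reported in the text
HIGH_CUTOFF = 6.00


def risk_group(score: float) -> str:
    """Map a clinic-ML nomogram score to a risk group.

    Assumed convention: below 2.24 is low risk, 2.24 up to (but not including)
    6.00 is intermediate risk, and 6.00 or above is high risk.
    """
    if score < LOW_CUTOFF:
        return "low"
    if score < HIGH_CUTOFF:
        return "intermediate"
    return "high"


def group_counts(scores: Sequence[float]) -> Dict[str, int]:
    """Number of patients assigned to each risk group."""
    counts = {"low": 0, "intermediate": 0, "high": 0}
    for s in scores:
        counts[risk_group(s)] += 1
    return counts


example = group_counts([0.7, 2.24, 3.9, 5.99, 6.0, 8.4])
```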
For comparison, the Lasso-selected clinic features were also used to construct the clinic nomogram (Fig. 3). Univariate and multivariate logistic regression analyses showed that five clinic variables, namely Age, B cells (CD3−CD19+), Neutrophil percentage, PSA and Th/Ts, were independent predictors of risk stratification (Table 4). The corresponding clinic nomogram was then constructed (Fig. 6A).
ROC analysis showed that the clinic-ML nomogram outperformed the clinic nomogram, with AUCs of 0.998 vs. 0.897 in the training set and 0.864 vs. 0.837 in the test set (Fig. 8; Table 5). The DeLong test indicated a significant difference between the AUCs of the two nomograms in both the training and test sets (p < 0.05). The clinic-ML nomogram also outperformed the optimal ML model, XGBoost (Table 5). The calibration curves demonstrated improved prediction performance of the clinic-ML nomogram compared with the other models (Fig. 9), which was further supported by DCA showing higher net benefit for the clinic-ML nomogram than for both XGBoost and the clinic nomogram in both the training and test sets (Fig. 10).
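DCA compares a model's net benefit with the treat-all and treat-none strategies across threshold probabilities. The sketch below computes the standard net benefit, NB = TP/n − (FP/n) · p_t/(1 − p_t), for a binary outcome. How the three-class predictions were reduced to a binary outcome for DCA (for example, high-risk versus not) is not stated and is an assumption here, as are the toy data and the choice to count a predicted risk equal to the threshold as positive.

```python
from typing import List, Sequence, Tuple


def net_benefit(probs: Sequence[float], outcomes: Sequence[int], threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)
    return tp / n - fp / n * odds


def treat_all_net_benefit(outcomes: Sequence[int], threshold: float) -> float:
    """Reference strategy: treat everyone (treat-none always has NB = 0)."""
    prevalence = sum(outcomes) / len(outcomes)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


def decision_curve(probs: Sequence[float], outcomes: Sequence[int],
                   thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """(threshold, model NB, treat-all NB) for each threshold."""
    return [(t, net_benefit(probs, outcomes, t), treat_all_net_benefit(outcomes, t))
            for t in thresholds]


# Invented binary outcomes (1 = event) and predicted risks.
y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.7, 0.1]
nb = net_benefit(p, y, 0.5)              # 2/4 - 1/4 * 1 = 0.25
nb_all = treat_all_net_benefit(y, 0.5)   # 0.5 - 0.5 * 1 = 0.0
curve = decision_curve(p, y, [0.1, 0.2, 0.3, 0.4, 0.5])
```

A model adds clinical value at a given threshold when its curve lies above both the treat-all curve and zero.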
Table 3
Logistic regression for predicting risk stratifications of PCa based on predictions of five ML algorithms
| ML model | Univariate OR (95% CI) | Univariate p-value | Multivariate OR (95% CI) | Multivariate p-value |
|---|---|---|---|---|
| AdaBoost | 2.535 (2.358–2.726) | 0.000* | 1.154 (1.090–1.222) | 0.000* |
| Decision Tree | 2.667 (2.563–2.774) | 0.000* | 1.554 (1.438–1.680) | 0.000* |
| Random Forest | 2.449 (2.286–2.622) | 0.000* | 1.150 (1.088–1.214) | 0.000* |
| SVM | 1.906 (1.681–2.162) | 0.000* | 1.014 (0.980–1.050) | 0.419 |
| XGBoost | 2.577 (2.462–2.696) | 0.000* | 1.354 (1.260–1.455) | 0.000* |
*p < 0.05. Models with p < 0.05 in the multivariate logistic regression are independent predictors.
Table 4
Logistic regression for predicting risk stratifications of PCa based on clinic features
| Clinic feature | Univariate OR (95% CI) | Univariate p-value | Multivariate OR (95% CI) | Multivariate p-value |
|---|---|---|---|---|
| Age | 1.286 (1.129–1.465) | 0.000* | 1.137 (1.008–1.284) | 0.037* |
| Alkaline phosphatase | 1.171 (1.024–1.339) | 0.021* | 1.121 (0.998–1.260) | 0.054 |
| B cells (CD3−CD19+) | 0.870 (0.761–0.995) | 0.043* | 0.844 (0.746–0.956) | 0.008* |
| Interleukin-1β | 1.148 (1.003–1.313) | 0.045* | 1.095 (0.972–1.234) | 0.133 |
| Interleukin-2R | 1.189 (1.041–1.359) | 0.011* | 1.120 (0.996–1.260) | 0.059 |
| Lactate dehydrogenase | 1.147 (1.003–1.313) | 0.045* | 1.110 (0.987–1.248) | 0.082 |
| Neutrophil percentage | 0.839 (0.734–0.959) | 0.010* | 0.799 (0.708–0.902) | 0.000* |
| PSA | 1.379 (1.215–1.564) | 0.000* | 1.228 (1.084–1.391) | 0.001* |
| Th/Ts | 1.155 (1.009–1.320) | 0.036* | 1.200 (1.069–1.346) | 0.002* |
*p < 0.05. Features with p < 0.05 in the multivariate logistic regression are independent predictors.
Table 5
Performance evaluation of XGBoost, the clinic nomogram and the clinic-ML nomogram in the training and test sets
| Model | Set | Sensitivity (95% CI) | Specificity (95% CI) | F1 (95% CI) | AUC (95% CI) |
|---|---|---|---|---|---|
| XGBoost | Training | 0.924 (0.883–0.965) | 0.963 (0.933–0.993) | 0.927 (0.886–0.968) | 0.989 (0.980–0.998) |
| XGBoost | Test | 0.680 (0.535–0.825) | 0.853 (0.743–0.963) | 0.664 (0.518–0.810) | 0.842 (0.764–0.919) |
| Clinic nomogram | Training | 0.704 (0.633–0.775) | 0.870 (0.817–0.923) | 0.700 (0.628–0.772) | 0.897 (0.867–0.926) |
| Clinic nomogram | Test | 0.609 (0.458–0.760) | 0.822 (0.703–0.941) | 0.585 (0.432–0.738) | 0.837 (0.764–0.910) |
| Clinic-ML nomogram | Training | 0.983 (0.963–1.000) | 0.994 (0.982–1.000) | 0.985 (0.966–1.000) | 0.998 (0.996–1.000) |
| Clinic-ML nomogram | Test | 0.713 (0.573–0.853) | 0.869 (0.764–0.974) | 0.699 (0.557–0.841) | 0.864 (0.794–0.935) |
The clinic-ML nomogram achieved the best value for every metric in both sets.

Discussion

This retrospective study aimed to develop a clinic-ML nomogram for predicting the risk stratification of PCa patients based on functional subsets of peripheral lymphocytes. A total of 197 patients were included and 41 clinical characteristics were analyzed, forming the largest sample used in a study of its kind. Lasso regression selected an optimal subset of nine clinic features (Age, Alkaline phosphatase, B cells (CD3−CD19+), Interleukin-1β, Interleukin-2R, Lactate dehydrogenase, Neutrophil percentage, PSA and Th/Ts), which was used to evaluate the predictive validity of the proposed clinic-ML nomogram against a conventional clinic nomogram and various ML models, both constructed directly from clinical characteristics. The results showed that the clinic-ML nomogram fully leveraged the predictive capability of the ML algorithms and outperformed both the conventional nomogram and the best ML model in terms of accuracy and clinical utility. In addition, the clinic-ML nomogram separated the three risk groups more clearly and was easier to use than the clinic nomogram (Fig. 6), and it could provide strong guidance on active surveillance for low-risk PCa patients (Fig. 7). Thus, by combining the interpretability and simplicity of a nomogram with the efficacy and robustness of ML algorithms, the clinic-ML nomogram can serve as an insightful tool for the preoperative assessment of PCa risk stratification.
This study divided PCa patients into three risk groups, which corresponds more closely to clinical treatment decisions. However, few studies have used lymphocyte subsets with a nomogram to predict a three-level risk stratification of PCa. Our study combined the nomogram with ML models to further improve diagnostic efficiency. Other studies have combined imaging data (such as PSMA PET/CT, MRI, and TRUS) with other clinical indicators to build nomograms for predicting PCa risk stratification [22–25]. Despite the performance gained from the imaging modality, those studies achieved results comparable to, if not slightly inferior to, those of the present study (Additional file 1: Table S2). In addition, using “scores” calculated by sophisticated algorithms as nomogram variables may help improve prediction accuracy, but may also increase the complexity of the nomogram and make it more difficult to interpret [17, 26]. The approach taken in this study, which used the most significant examination features as variables in the clinic-ML nomogram, may provide a more direct and simple method for assessing patient risk stratification.
This study has several limitations. First, all data were collected from a single medical center with two campuses in the same city. The generalizability of the proposed clinic-ML nomogram to other populations and settings therefore remains unknown and requires evaluation in other cohorts; a multi-center study is planned to assess its external validity and robustness. Second, the number of ML algorithms used to develop the clinic-ML nomogram was limited, and future studies may benefit from including additional ML algorithms to enhance its performance. Third, imaging data play a crucial role in the diagnosis and staging of PCa, and integrating them into the clinic-ML nomogram could further improve its diagnostic efficiency and predictive power.
The application of nomograms in clinical diagnosis has gained popularity in recent years because of their simplicity, intuitiveness, and interpretability [27]. Integrating nomograms with powerful ML algorithms to improve performance while maintaining interpretability is an active research topic [28–30]. The proposed clinic-ML nomogram is an easy-to-use and powerful tool for accurately predicting the risk stratification of PCa patients, and it could provide essential information for the individualized diagnosis and treatment of PCa.

Acknowledgements

The authors are grateful to the doctors in the Department of Urology, Wuhan Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, for their support in collecting the clinical data.

Declarations

Ethics approval and consent to participate

The study was approved by the Research Ethics Commission of Wuhan Tongji Hospital, and the requirement for informed consent was waived by the Ethics Commission (IRB ID: TJ- IRB20211246).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
4. Grey ADR, Scott R, Shah B, Acher P, Liyanage S, Pavlou M, Omar R, Chinegwundoh F, Patki P, Shah TT, Hamid S, Ghei M, Gilbert K, Campbell D, Brew-Graves C, Arumainayagam N, Chapman A, McLeavy L, Karatziou A, Alsaadi Z, Ahmed HU. Multiparametric ultrasound versus multiparametric MRI to diagnose prostate cancer (CADMUS): a prospective, multicentre, paired-cohort, confirmatory study. Lancet Oncol. 2022;23(3):428–38. https://doi.org/10.1016/S1470-2045(22)00016-X.
11. Bauckneht M, Rebuzzi SE, Signori A, Frantellizzi V, Murianni V, Lodi Rizzini E, Mascia M, Lavelli V, Donegani MI, Ponzano M, Gaudiano A, Stazza ML, Licari M, Cavallini L, Laghi V, Cindolo L, Maggi M, Sciarra A, Mammucci P, Sambuceti G, Fornarini G. The prognostic power of inflammatory indices and clinical factors in metastatic castration-resistant prostate cancer patients treated with radium-223 (BIO-Ra study). Eur J Nucl Med Mol Imaging. 2022;49(3):1063–74. https://doi.org/10.1007/s00259-021-05550-6.
14. Mottet N, van den Bergh RCN, Briers E, Van den Broeck T, Cumberbatch MG, De Santis M, Fanti S, Fossati N, Gandaglia G, Gillessen S, Grivas N, Grummet J, Henry AM, van der Kwast TH, Lam TB, Lardas M, Liew M, Mason MD, Moris L, Oprea-Lager DE, Cornford P. EAU-EANM-ESTRO-ESUR-SIOG guidelines on prostate Cancer-2020 update. Part 1: screening, diagnosis, and local treatment with curative intent. Eur Urol. 2021;79(2):243–62. https://doi.org/10.1016/j.eururo.2020.09.042.
17. Hiremath A, Shiradkar R, Fu P, Mahran A, Rastinehad AR, Tewari A, Tirumani SH, Purysko A, Ponsky L, Madabhushi A. An integrated nomogram combining deep learning, Prostate Imaging-Reporting and Data System (PI-RADS) scoring, and clinical variables for identification of clinically significant prostate cancer on biparametric MRI: a retrospective multicentre study. Lancet Digit Health. 2021;3(7):e445–54. https://doi.org/10.1016/S2589-7500(21)00082-0.
23. Xiang M, Ma TM, Savjani R, Pollom EL, Karnes RJ, Grogan T, Wong JK, Motterle G, Tosoian JJ, Trock BJ, Klein EA, Stish BJ, Dess RT, Spratt DE, Pilar A, Reddy C, Levin-Epstein R, Wedde TB, Lilleby WA, Fiano R, et al. Performance of a prostate-specific membrane antigen positron emission tomography/computed tomography-derived risk-stratification tool for high-risk and very high-risk prostate cancer. JAMA Netw Open. 2021;4(12):e2138550. https://doi.org/10.1001/jamanetworkopen.2021.38550.
25. Montoya Perez I, Jambor I, Pahikkala T, Airola A, Merisaari H, Saunavaara J, Alinezhad S, Väänänen RM, Tallgrén T, Verho J, Kiviniemi A, Ettala O, Knaapila J, Syvänen KT, Kallajoki M, Vainio P, Aronen HJ, Pettersson K, Boström PJ, Taimen P. Prostate cancer risk stratification in men with a clinical suspicion of prostate cancer using a unique biparametric MRI and expression of 11 genes in apparently benign tissue: evaluation using machine-learning techniques. J Magn Reson Imaging. 2020;51(5):1540–53. https://doi.org/10.1002/jmri.26945.
27. Dong D, Fang MJ, Tang L, Shan XH, Gao JB, Giganti F, Wang RP, Chen X, Wang XX, Palumbo D, Fu J, Li WC, Li J, Zhong LZ, De Cobelli F, Ji JF, Liu ZY, Tian J. Deep learning radiomic nomogram can predict the number of lymph node metastasis in locally advanced gastric cancer: an international multicenter study. Ann Oncol. 2020;31(7):912–20. https://doi.org/10.1016/j.annonc.2020.04.003.
Metadata
Title
Development and validation of a clinic machine-learning nomogram for the prediction of risk stratifications of prostate cancer based on functional subsets of peripheral lymphocyte
Authors
Chunguang Yang
Zhenghao Liu
Yin Fang
Xinyu Cao
Guoping Xu
Zhihua Wang
Zhiquan Hu
Shaogang Wang
Xinglong Wu
Publication date
01.12.2023
Publisher
BioMed Central
Published in
Journal of Translational Medicine / Issue 1/2023
Electronic ISSN: 1479-5876
DOI
https://doi.org/10.1186/s12967-023-04318-w
