Introduction
Non-lactating mastitis (NLM) [1] is a benign breast disease of unknown etiology that is clinically characterized by frequent recurrence and difficulty of cure. Worldwide, NLM accounts for 0.3–1.9% of all breast diseases. In China, its incidence is markedly higher than in other regions, accounting for 2–5% of all breast diseases [2]. NLM can occur at any age, and its clinical incidence has been rising in recent years [3].
The lesions may involve all quadrants of one breast or both breasts. The damage to breast appearance is frequently severe, and some degree of disability may occur in severe cases. The disease greatly reduces patients' quality of life, increases their financial burden, and has a considerable psychological impact. NLM has therefore become a clinical problem that urgently needs to be solved.
Because the etiology of NLM is intricate and the outcome classes are imbalanced, traditional statistical models (e.g., logistic regression and the Cox regression model) may be substantially biased when used to study the risk factors of this disease and to predict recurrence probability. We therefore introduce machine learning (ML), which, compared with traditional statistical methods, can cover more features and assess a wider range of risks [4, 5], and can thus provide important reference information with clinical significance and application value for medical decision-makers. The term “machine learning” was coined in 1959 by the computer scientist Arthur Samuel; ML methods can handle large amounts of complex data [6, 7]. ML was first used in a medical project at Stanford University in 1972. Decision trees, random forest (RF), and eXtreme Gradient Boosting (XGBoost) are commonly used ML algorithms.
Studies of NLM recurrence prediction models with long-term follow-up have rarely been reported. In this study, we aim to build ML models using the RF and XGBoost algorithms to predict the probability of postoperative NLM recurrence. The SHapley Additive exPlanations (SHAP) method was used to interpret the models, providing a reference for clinicians when making diagnostic and treatment decisions, developing treatment plans, preventing disease recurrence, and preventing disease before it occurs.
Materials and methods
Patient selection
This study was conducted on inpatients admitted to the Mammary Department of Shuguang Hospital affiliated to Shanghai University of Traditional Chinese Medicine between July 2019 and December 2021. Follow-up was completed in December 2022, with a median follow-up duration of 21.20 months. All patients with NLM in this study received comprehensive treatment including surgical, herbal, and other therapies.
The ethics committee of Shuguang Hospital affiliated to Shanghai University of TCM approved this study (2019-746-101). This was a retrospective study and all patients signed an informed consent form agreeing to the use of case data for scientific research. No biological specimens were used in this study.
The inclusion criteria were as follows: (1) a diagnosis of NLM confirmed by fine-needle aspiration or core-needle biopsy, with breast-mass pathology supporting a non-specific inflammatory lesion, which may show acute or chronic inflammatory cells or plasma cells, ductal dilatation, and granuloma formation; (2) complete clinical data. The exclusion criteria were as follows: (1) severe primary diseases of the cardiovascular, cerebrovascular, hepatic, renal, or other systems; (2) schizophrenia, depression, or other psychiatric disorders requiring long-term oral drug therapy; (3) use of immunosuppressive drugs; (4) incomplete clinical information or loss to follow-up that would affect the statistical analysis of the data.
The introduction of ML models
In this study, we used two ML approaches (RF and XGBoost) to build models that predict the risk of NLM recurrence in female patients.
RF is an ensemble algorithm of the bagging type: it combines multiple weak learners (decision trees), each trained on a bootstrap sample of the data, and obtains the final result by voting on or averaging the trees' outputs. RF models typically achieve high accuracy and good generalization. RF can partly compensate for errors caused by class imbalance and can maintain reasonable accuracy when a considerable amount of data is missing; moreover, it often performs well even without hyperparameter tuning. RF therefore has advantages for classification and prediction in many fields and is well suited to medical classification problems such as predicting disease recurrence.
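To make the bagging idea concrete, the following is a minimal sketch, not the study's code, that trains decision trees on bootstrap resamples and combines them by majority vote. It assumes synthetic data from scikit-learn's `make_classification` with a hypothetical 80/20 class ratio; the helper names `fit_bagged_trees` and `predict_majority` are illustrative. Note that scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities rather than counting hard votes; hard voting is used here only because it is easier to read.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_trees=25, seed=0):
    """Fit decision trees on bootstrap resamples of (X, y) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap sample: n draws with replacement
        # max_features="sqrt" adds the per-split feature subsampling used by random forests
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=int(rng.integers(1_000_000)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_majority(trees, X):
    """Combine the trees' 0/1 predictions by majority vote (odd n_trees avoids ties)."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)


# Synthetic stand-in for a 258-patient dataset with ten features (assumption)
X, y = make_classification(n_samples=258, n_features=10, weights=[0.8, 0.2], random_state=0)
trees = fit_bagged_trees(X, y)
pred = predict_majority(trees, X)
print("training accuracy:", (pred == y).mean())
```

Each tree sees a different resample, so the individual trees' errors are partly decorrelated and the vote is more stable than any single tree.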
XGBoost is an open-source library that provides a gradient-boosting framework for several programming languages and runs on a wide range of operating systems. It aims to provide a scalable, portable, and distributed gradient boosting library. In recent years, XGBoost has gained popularity owing to its excellent performance in many machine-learning competitions, and it is now widely used in ML and data mining [8].
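The core idea of gradient boosting, adding trees one at a time so that each new tree corrects the errors of the current ensemble, can be sketched as below. This is a simplified first-order version for binary log-loss written with scikit-learn regression trees; it is an assumption-laden illustration and not XGBoost's actual algorithm, which additionally uses second-order (Hessian) information and explicit regularization. The data are synthetic and the helper names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(y, p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    """Simplified first-order gradient boosting for binary log-loss (hypothetical helper)."""
    p0 = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    f0 = np.log(p0 / (1 - p0))  # initial score: log-odds of the positive-class rate
    scores = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residual = y - sigmoid(scores)  # negative gradient of log-loss w.r.t. the score
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)  # the new tree models the current errors
        scores += learning_rate * tree.predict(X)  # shrunken additive update
        trees.append(tree)
    return f0, learning_rate, trees


def predict_proba(model, X):
    f0, learning_rate, trees = model
    scores = f0 + learning_rate * sum(tree.predict(X) for tree in trees)
    return sigmoid(scores)


# Synthetic data (assumption), not the study's patients
X, y = make_classification(n_samples=258, n_features=10, weights=[0.8, 0.2], random_state=0)
model = fit_gradient_boosting(X, y)
p = predict_proba(model, X)
print("constant-prediction log-loss:", log_loss(y, np.full(len(y), y.mean())))
print("boosted log-loss:", log_loss(y, p))
```

Unlike bagging, where trees are trained independently, each boosting round depends on all previous rounds, and the learning rate controls how much each tree is allowed to change the ensemble.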
While preserving the predictive performance of the model, we introduced SHAP to enhance its interpretability. SHAP brings the Shapley value from cooperative game theory into the interpretation of ML models: it not only reflects the influence of each feature in each sample, but also shows whether each feature pushes the prediction up (positive influence) or down (negative influence). Its interpretability has been verified in many models [9, 10]. The trained model was subjected to tenfold cross-validation to test its performance, to reduce problems such as overfitting and selection bias, and to estimate how well it generalizes to an independent dataset. Compared with existing recurrence prediction nomograms or regression equations, SHAP values allow many high-quality local explanations to be combined to represent global structure while retaining local faithfulness to the original model [11, 12]. Although a nomogram can intuitively show the effects of independent variables on the prediction results, it does not provide a numerical attribution of each feature's contribution to an individual prediction. Therefore, in some cases, it may be necessary to combine it with other statistical methods to interpret the predictions further.
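The Shapley value behind SHAP can be computed exactly for a model with only a few features by enumerating every coalition of features. The sketch below does this from scratch, assuming that a feature "absent" from a coalition is replaced by a single baseline value (one common choice; the shap library offers other background-data treatments and much faster algorithms, such as TreeExplainer for tree ensembles). It is an illustration of the definition, not the procedure used in the study, and its cost grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict() at x; absent features take baseline values."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)

    def value(coalition):
        z = baseline.copy()
        idx = np.array(coalition, dtype=int)
        z[idx] = x[idx]  # features in the coalition take their observed values
        return predict(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


# Toy linear model: each Shapley value equals weight * (x_i - baseline_i)
linear = lambda z: 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]
phi = shapley_values(linear, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print(phi)  # signs show whether each feature pushes the prediction up or down
```

The values are additive: they always sum to the difference between the prediction for the sample and the prediction for the baseline, which is what lets per-patient explanations be read as a decomposition of the predicted risk.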
Description of ML model training
The 258 patients were randomly divided into a training set (75%) and a test set (25%). To keep the proportions of the classes (0: no recurrence, 1: recurrence) in the training and test sets consistent with the original data, which is useful when fitting ML models, we used the built-in function train_test_split() of the scikit-learn library with the parameter “stratify = y”, where y denotes the class labels in the original data.
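A stratified 75%/25% split with `train_test_split()` can be sketched as follows. The feature matrix and labels are random placeholders, and the roughly 20% recurrence rate is a hypothetical value chosen for illustration, not the rate observed in this study.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_patients = 258
X = rng.normal(size=(n_patients, 10))  # placeholder for the ten clinical features
y = (rng.random(n_patients) < 0.2).astype(int)  # hypothetical ~20% recurrence rate

# stratify=y keeps the 0/1 class proportions (approximately) equal in both partitions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(len(y_train), len(y_test))  # 193 65
print(y.mean(), y_train.mean(), y_test.mean())
```

With 258 samples and `test_size=0.25`, scikit-learn rounds the test partition up, giving 193 training and 65 test patients.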
In our research, the following procedure was carried out for both random forest and XGBoost: (a) using the grid search function GridSearchCV() of the scikit-learn library with 5-fold cross-validation, the parameter values giving the best AUC score were selected for each method. A wide range of parameter values was explored; for each of the two methods, these values are shown in Table 1. (b) The best set of parameters from the grid search was used to train the corresponding ensemble on the whole training partition. (c) Finally, tenfold cross-validation was used to evaluate model performance.
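Steps (a) to (c) can be sketched with scikit-learn as below. To keep the example fast, it searches only a reduced subset of the random-forest grid in Table 1, and it uses synthetic data. The source does not state which partition the tenfold cross-validation was run on; applying it to the training partition here is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=258, n_features=10, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

# Reduced subset of the Table 1 random-forest grid
param_grid = {
    "n_estimators": [40, 80],
    "min_samples_leaf": [1, 11],
    "max_features": ["sqrt", 0.25],
}

# (a) grid search with 5-fold cross-validation, selecting the setting with the best AUC
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, scoring="roc_auc", cv=5)
# (b) with refit=True (the default), the best setting is retrained on the whole training partition
search.fit(X_train, y_train)

# (c) tenfold stratified cross-validation of the tuned model
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc_scores = cross_val_score(search.best_estimator_, X_train, y_train, scoring="roc_auc", cv=cv)
print(search.best_params_)
print(f"10-fold AUC: {auc_scores.mean():.3f} +/- {auc_scores.std():.3f}")
```

Stratified folds are used in step (c) so that every fold contains recurrence cases, which AUC requires.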
Table 1
Parameter values of different machine learning models
Random forest |
Parameter | Grid search values | Optimal parameter value in our model |
n_estimators | 20, 40, 60, 80, 100, 120, 140, 160, 180, 200 | 80 |
min_samples_leaf | 1, 3, 5, 7, 9, 11, 13, 15 | 11 |
max_features | 0.1, log2, 0.25, sqrt, 1.0 | sqrt |
The remaining parameters are default values in the scikit-learn library |
XGBoost |
Parameter | Grid search values | Optimal parameter value in our model |
n_estimators | 20, 40, 60, 80, 100, 120, 140, 160, 180, 200 | 100 |
learning_rate | 0.01, 0.02, 0.05, 0.1 | 0.1 |
gamma | 0, 0.1, 0.2, 0.5, 1.0 | 1.0 |
reg_lambda | 0, 1.0, 10.0 | 10 |
max_depth | 3, 4, 5, 6 | 4 |
colsample_bytree | 0.45, 0.5, 0.6, 0.7 | 0.45 |
subsample | 0.7, 0.8, 0.9 | 0.8 |
scale_pos_weight | 3, 4, 5 | 4 |
The remaining parameters are default values in the xgboost library |
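To clarify what several XGBoost hyperparameters in Table 1 control, the regularized objective described by the XGBoost authors is summarized below. Here $g_i$ and $h_i$ are the first and second derivatives of the loss with respect to the previous prediction (for logistic loss, $g_i = p_i - y_i$ and $h_i = p_i(1 - p_i)$), $T$ is the number of leaves, $w_j$ is the weight of leaf $j$, and $I_j$ is the set of samples in leaf $j$. In the xgboost library, `gamma` corresponds to $\gamma$, `reg_lambda` to $\lambda$, and `learning_rate` to $\eta$; `max_depth` limits tree depth, `subsample` and `colsample_bytree` set the row and column sampling ratios per tree, and `scale_pos_weight` up-weights positive (recurrence) instances in the loss.

```latex
% Objective at boosting round t, with a complexity penalty on the new tree f_t
\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t),
\qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2

% Optimal weight of leaf j, with G_j = \sum_{i \in I_j} g_i and H_j = \sum_{i \in I_j} h_i
w_j^{*} = -\frac{G_j}{H_j + \lambda}

% Gain of splitting a node into left (L) and right (R) children
\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda}
  - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma

% Shrunken additive update with learning rate \eta
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta\, f_t(x_i)
```

Read this way, the selected values in Table 1 (for example $\lambda = 10$ and $\gamma = 1.0$) shrink leaf weights toward zero and discard splits whose gain is small, both of which act against overfitting on a small dataset.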
Feature selection and data preprocessing
When selecting features for modelling, we carefully evaluated data availability and feasibility for clinical application. Prior studies have shown that body mass index (BMI) [13], age [14], and mass extent [15] are critical indicators of NLM recurrence. Our previous studies [16, 17] found that elevated BMI, abortion, intraoperative discharge, and the presence of an inverted nipple are high-risk factors for the onset of NLM. Furthermore, markers such as white blood cell count (WBC), the neutrophil-to-lymphocyte ratio (NLR), and the albumin-to-globulin ratio (AGR) were included in the model because of their utility in predicting inflammatory diseases. Because obesity and hyperlipidemia often coexist, lipid-related features, including the triglyceride (TG) level, were also incorporated into the model. Additionally, we excluded features with more than 10% missing values in the dataset.
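The two ratio markers and the 10% missing-value rule can be sketched with pandas as below. The column names (`neutrophils`, `lymphocytes`, `albumin`, `globulin`, `hypothetical_lab`) and the toy values are hypothetical, and the source does not say whether NLR and AGR were computed from raw counts or taken directly from laboratory reports.

```python
import numpy as np
import pandas as pd


def derive_ratios(df):
    """Add NLR and AGR columns from raw laboratory values (hypothetical column names)."""
    out = df.copy()
    out["NLR"] = out["neutrophils"] / out["lymphocytes"]  # neutrophil-to-lymphocyte ratio
    out["AGR"] = out["albumin"] / out["globulin"]  # albumin-to-globulin ratio
    return out


def drop_sparse_features(df, max_missing=0.10):
    """Keep only columns whose fraction of missing values does not exceed max_missing."""
    missing_frac = df.isna().mean()
    keep = missing_frac[missing_frac <= max_missing].index
    return df[keep]


# Toy data for 20 patients: hypothetical_lab is 15% missing, BMI is 5% missing
df = pd.DataFrame({
    "neutrophils": [4.0] * 20,
    "lymphocytes": [2.0] * 20,
    "albumin": [40.0] * 20,
    "globulin": [25.0] * 20,
    "BMI": [24.0] * 19 + [np.nan],
    "hypothetical_lab": [1.5] * 17 + [np.nan] * 3,
})
features = drop_sparse_features(derive_ratios(df))
print(list(features.columns))  # hypothetical_lab is dropped, BMI is kept
```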
In summary, based on discussions with clinical experts and our treatment experience, ten features were selected to build the ML models. They are divided into preoperative clinical case information, preoperative laboratory tests, and intraoperative findings: (a) age, BMI, number of abortions, presence of inverted nipples, and extent of breast mass; (b) WBC, NLR, AGR, and TG; (c) presence of intraoperative discharge.
Definition of outcome indicators
Recurrence was defined as the reappearance of redness, swelling, heat, pain, pus formation, ulceration, or ultrasound-visible lesions in the original lesion within six months of follow-up after comprehensive treatment, or the appearance of new lesions in the ipsilateral breast (outside the area of the original lesion) or in the contralateral breast during the follow-up period.
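Because the definition uses two different time windows, it may help to see one literal reading of it encoded as a labeling rule. The sketch below is hypothetical: the `FollowUpEvent` record and its fields are illustrative, and the study's actual case-report coding is not described in the source.

```python
from dataclasses import dataclass


@dataclass
class FollowUpEvent:
    months_after_treatment: float
    in_original_lesion: bool  # True if the finding is in the original lesion area


def is_recurrence(events, follow_up_months):
    """Return 1 if any event meets the recurrence definition, else 0 (hypothetical encoding)."""
    for event in events:
        if event.months_after_treatment > follow_up_months:
            continue  # outside this patient's follow-up period
        if event.in_original_lesion and event.months_after_treatment <= 6:
            return 1  # signs reappear in the original lesion within six months
        if not event.in_original_lesion:
            return 1  # new lesion (ipsilateral outside the original area, or contralateral)
    return 0


print(is_recurrence([FollowUpEvent(3.0, True)], follow_up_months=21.2))   # 1
print(is_recurrence([FollowUpEvent(12.0, False)], follow_up_months=21.2))  # 1
print(is_recurrence([FollowUpEvent(9.0, True)], follow_up_months=21.2))   # 0
```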
The evaluation of ML models
We evaluated the performance of the ML models on the training set and the test set. Model performance was assessed using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC) [18]. The closer the AUC is to 1, the better the model performance.
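The five metrics can be illustrated on a toy set of ten predictions. The labels, predicted probabilities, and the 0.5 decision threshold below are all assumptions for illustration; the hand-computed values are checked against scikit-learn's implementations.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score

y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])  # 1 = recurrence (toy labels)
y_score = np.array([0.10, 0.20, 0.15, 0.30, 0.05, 0.60, 0.90, 0.80, 0.70, 0.40])  # toy probabilities
y_pred = (y_score >= 0.5).astype(int)  # 0.5 decision threshold (assumption)

tp = int(np.sum((y_pred == 1) & (y_true == 1)))
tn = int(np.sum((y_pred == 0) & (y_true == 0)))
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
fn = int(np.sum((y_pred == 0) & (y_true == 1)))

accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct predictions / all predictions
precision = tp / (tp + fp)  # true positives / predicted positives
recall = tp / (tp + fn)  # true positives / actual positives (sensitivity)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_score)  # chance a random positive is scored above a random negative

# The hand-computed values agree with scikit-learn
assert np.isclose(accuracy, accuracy_score(y_true, y_pred))
assert np.isclose(precision, precision_score(y_true, y_pred))
assert np.isclose(recall, recall_score(y_true, y_pred))
assert np.isclose(f1, f1_score(y_true, y_pred))

print(tp, tn, fp, fn)  # 3 5 1 1
print(accuracy, precision, recall, f1, auc)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds; here 23 of the 24 positive/negative pairs are ranked correctly, so AUC = 23/24.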
Discussion
In this study, we explored ten features, all of which are routinely assessed through preoperative clinical case information, preoperative laboratory tests, and intraoperative findings. Because these data are easy to obtain clinically, we chose these features for our prediction model. The problems faced by clinical medicine are often very complex, and ethical constraints make it difficult for many clinical trials to disentangle disease-predisposing factors. For these reasons, several clinical features were considered when selecting the model parameters.
BMI (body mass index) is widely used to measure and diagnose obesity. First, obesity-related changes in local inflammatory factors in the breast may be associated with the development of NLM, and obesity is an independent risk factor for NLM [20]. In addition, a Mendelian randomization study of inflammatory breast disease suggests that elevated BMI increases the incidence of inflammatory breast disease [21]. Second, regarding disease recurrence, the recurrence probability of patients with higher BMI is 1.8 times higher than that of patients with normal BMI [13]. According to the feature importance of our models, BMI is an important feature for NLM recurrence. The SHAP values show that BMI has a positive influence on the predicted probability, and this trend is consistent with previous clinical studies. Therefore, it is reasonable to include BMI in our models.
Intraoperative discharge usually appears as lipoid or milky secretions. Our previous studies found a positive correlation between intraoperative discharge and NLM morbidity [16]. In addition, the lipid signature of breast tissue changes significantly in NLM [22]. Another promising finding from the feature importance of both models is that intraoperative discharge may influence NLM recurrence, and its SHAP values show a positive influence on the prediction results. Although previous studies have not examined the relationship between intraoperative discharge and disease recurrence, our results suggest that lipid abnormalities may be important drivers of NLM pathogenesis and recurrence. Intraoperative discharge is therefore a suitable feature for our models.
There are also several important biomarkers: WBC, NLR, and AGR. WBC, the number of white blood cells per unit volume of blood, is among the most easily obtained and monitored laboratory tests, and it responds sensitively to the body's inflammatory state. In this study, WBC was positively correlated with recurrence, suggesting that the inflammatory state is active when the disease recurs. Higher WBC may be associated with a larger breast mass and more severe disease. Both this clinical experience and the trend of the WBC SHAP values in our models support WBC as a good feature for clinical observation and for inclusion in our recurrence prediction model.
The NLR is an emerging disease marker and an indicator of immune system homeostasis. It plays an important role in the inflammatory response in various autoimmune diseases (Hashimoto’s thyroiditis, systemic lupus erythematosus, rheumatoid arthritis, etc.) [23–25] and correlates with activity in some conditions (obesity, arteritis) [26, 27]. It has also been used as a prognostic indicator in cancer [28, 29]. In our previous study, transcriptomic sequencing comparing inflamed NLM breast tissue with normal breast tissue revealed that NLM is associated with neutrophil chemotaxis and the formation of neutrophil extracellular traps. Interestingly, there is a special subtype of NLM called cystic neutrophilic granulomatous mastitis (CNGM) [30], in which the vesicle is surrounded by a ring of neutrophils, followed by histiocytes, multinucleated giant cells, and lymphocytes. In other words, inflammation causes large numbers of neutrophils to migrate into the lesion. Meanwhile, lymphocytes play a crucial role in the body's specific immunity, and lymphopenia suggests reduced immune function. This may explain why a higher NLR often suggests possible recurrence of the disease.
Likewise, AGR reflects both the inflammatory and the nutritional status of the body. A previous study demonstrated that applying a specific AGR cut-off value can significantly improve the predictive accuracy for disease recurrence [31]. Because no other studies have examined this relationship, the predictive value of AGR for NLM recurrence warrants further investigation.
Other features worth exploring are the number of abortions, inverted nipples, age, and mass extent. During childbirth or abortion, hormone levels change, with elevated estrogen, progesterone, and prolactin, and the secretory function of the glands is transformed, which forms the basis for disease development. A sudden interruption of pregnancy leads to a sharp drop in progesterone and a sharp rise in prolactin, which can lead to the onset or even recurrence of NLM. Our analysis found evidence that repeated abortions increase the risk of onset and recurrence of the disease.
A common explanation is that persistent nipple inversion makes the disease recurrent and prolonged: the squamous epithelium at the duct opening and duct sinus extends deeper, its keratinized scales and irritating lipid-like discharge block the duct, and rupture of the duct triggers an areolar abscess connected to the large duct. In this study, inverted nipples were corrected during surgery, which significantly decreased the recurrence rate.
For NLM patients with abscess formation, the recurrence rate increases with age [26]. Meanwhile, the mass extent represents the extent of inflammation in the breast [26]. Previous studies have likewise confirmed that mass extent or the number of breast lumps is an independent influencing factor for recurrence: the recurrence rate of patients with a mass area ≥ 12.13 cm² is 1.414 times higher than that of other patients [27].
We selected several parameters (number of abortions, inverted nipples, age, mass extent) for the following reasons. First, we have observed these features in clinical practice, and including them may help to establish the model. Second, our results showed that including these features does not affect the most important features of the model, such as BMI, WBC, and intraoperative discharge. Third, the trends of the SHAP values of all features in our models are consistent with the clinical presentation of these features. Therefore, including these features (number of abortions, inverted nipples, age, mass extent) is justified for building practical recurrence prediction models.
To sum up, our model aims to identify the risk of NLM recurrence and explore the factors behind recurrence. Our results suggest that improving lifestyle (losing weight or following a low-fat diet) and avoiding abortion can decrease the recurrence rate. For clinicians, the model can guide treatment plans, including prolonging the duration of medication and correcting inverted nipples during surgery. Our study also demonstrates a practical way to build prediction models. On the one hand, the impact of these features on NLM in our models is consistent with much of the clinical research literature and with our previous studies. On the other hand, both the RF model and the XGBoost model show relatively good predictive performance. We therefore believe that this is an effective way to build a practical prediction model.
Strengths and limitations
Our study built ML models for postoperative NLM recurrence, which can help identify patients at risk of recurrence and guide clinical treatment planning.
Because of the low prevalence of NLM (2–5%) in benign breast disease, only 258 patients were included in this study. The limitations of this research are its modest accuracy and limited sample size; working with small and imbalanced datasets poses significant challenges. However, because ML models generally improve as more training data become available, collecting and following up a substantially larger cohort over a longer period should improve model performance in future iterations. Today, ML algorithms are mostly designed for big data analysis, and predictive models based on them often lack interpretability. For this reason, we need to be deliberate in how we use the model: we treat the predicted recurrence probability as a reference and combine it with the SHAP force plot of an actual case, so that we can advise patients and intervene in advance.
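How a force plot relates to the predicted probability can be shown with a small calculation. The sketch assumes that SHAP values are expressed in log-odds units, as is typical when an XGBoost classifier is explained with shap's TreeExplainer in its default raw-output mode; for a model explained directly in probability space, the contributions would add to the probability instead. All numbers below are hypothetical and are not results from this study.

```python
import math


def probability_from_shap(base_value, contributions):
    """Sum a log-odds base value and per-feature SHAP contributions, then map to a probability."""
    log_odds = base_value + sum(contributions.values())
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical values for one patient (illustrative only)
base_value = -1.2
contributions = {"BMI": 0.6, "intraoperative discharge": 0.5, "WBC": 0.3, "age": -0.2}

p = probability_from_shap(base_value, contributions)
# List features from largest to smallest absolute contribution, as a force plot would emphasize
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    direction = "raises" if value > 0 else "lowers"
    print(f"{name}: {value:+.2f} ({direction} the log-odds)")
print(f"predicted recurrence probability: {p:.2f}")
```

Reading a case this way shows which modifiable factors (for example, BMI) contribute most to that patient's predicted risk, which is the kind of information the force plot is meant to convey alongside the probability.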
Moreover, incorporating multimodal data, such as multiparametric magnetic resonance imaging (MRI), whole slide H&E images (WSIs), and gene sequencing data, is another way to improve predictive accuracy. Recent studies [32–34] in this emerging field have shown further improvements in prediction accuracy. Future research should prioritize the incorporation of additional modalities to establish a comprehensive multimodal representation approach.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.