Selection of the population
Of the 2045 women who underwent clinical examination, 230 were excluded because of missing information on metabolic syndrome components (based on inclusion criteria for a previous study [23]), 67 because of unknown menopausal status, and 624 because they were postmenopausal at the time of their mammogram.
Menopausal status was defined as follows: women were considered premenopausal if they had menstruated at least once in the 12 months before recruitment, and postmenopausal if they had (1) no menstruation in the 12 months before the clinical examination or (2) surgical menopause (reported bilateral oophorectomy or reported unknown surgery) and were over 48 years old (mean age at menopause in Mexico [24]).
Further selection was based on commonly used [25] breast density categories (< 10%, 10 to < 25%, 25 to < 50%, ≥ 50%): among non-users of oral contraceptives at blood donation, women were randomly selected from each breast density group in proportion to the size of the group. A total of 35 women were selected for the first group, 158 for the second, 247 for the third, and 160 for the last group. Among the 600 women selected for targeted metabolomics analysis, 1 was excluded because she had no biological sample, 7 because they were older than 55 years but reported being premenopausal, and 19 because measured BMI was not available at the time of clinical evaluation. The final population included 573 women.
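The participant counts above can be checked with a few lines of arithmetic. The sketch below only restates numbers given in the text; the variable names are illustrative and not part of the study's code.

```python
# Counts copied from the text; names are illustrative only.
selected_by_density = {
    "<10%": 35,
    "10 to <25%": 158,
    "25 to <50%": 247,
    ">=50%": 160,
}
exclusions = {
    "no biological sample": 1,
    "older than 55 but reported premenopausal": 7,
    "measured BMI not available": 19,
}

n_selected = sum(selected_by_density.values())
# Women with a sample available for metabolite measurement.
n_with_sample = n_selected - exclusions["no biological sample"]
# Women remaining after all three exclusions.
n_final = n_selected - sum(exclusions.values())

print(n_selected, n_with_sample, n_final)  # 600 599 573
```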
Metabolites were analyzed in samples from 599 participants. Values below the lower limit of quantification (LLOQ), below the batch-specific limit of detection (LOD) (for compounds measured with a semi-quantitative method: acylcarnitines, glycerophospholipids, sphingolipids), or above the upper limit of quantification (ULOQ) were considered outside the measurable range. Metabolites were excluded from the statistical analyses if more than 20% of observations were outside the measurable range (n = 11; 9 below the LOD or LLOQ; 2 above the ULOQ). A total of 132 metabolites (12 acylcarnitines, 21 amino acids, 7 biogenic amines, 77 glycerophospholipids, 14 sphingolipids, and hexoses) were retained for statistical analyses. Of these 132 metabolites, 2 had values above the ULOQ (arginine (1.8%) and taurine (17.3%)), which were imputed with the ULOQ, and 9 had values below the LLOQ or LOD (≤ 9.0%), which were imputed with half the LLOQ or half the batch-specific LOD, respectively. For the remaining 121 metabolites, all values were within the measurable range.
The percentage of samples outside the measurable range and coefficients of variation for metabolites included in the analysis (median = 6.0%, interquartile range = 2.1%) are shown in Supplementary Table 1.
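The exclusion and imputation rules for out-of-range metabolite values described above can be sketched as follows. This is a minimal illustration, not the study's code: the function name and arguments are hypothetical, and a single lower limit per metabolite is assumed, whereas the text uses batch-specific LODs for semi-quantitative compounds.

```python
def handle_out_of_range(values, lower_limit, uloq, max_fraction_out=0.20):
    """Apply the range rules to one metabolite.

    lower_limit stands for the LLOQ or LOD (assumed constant here).
    Returns None if the metabolite would be excluded, otherwise the
    list of values after imputation.
    """
    n_out = sum(1 for v in values if v < lower_limit or v > uloq)
    if n_out / len(values) > max_fraction_out:
        return None  # more than 20% outside the measurable range
    imputed = []
    for v in values:
        if v < lower_limit:
            imputed.append(lower_limit / 2)  # half the LLOQ or LOD
        elif v > uloq:
            imputed.append(uloq)  # capped at the ULOQ
        else:
            imputed.append(v)
    return imputed


kept = handle_out_of_range(
    [0.2, 2.0, 3.0, 4.0, 12.0, 6.0, 7.0, 8.0, 9.0, 5.0, 5.0, 5.0],
    lower_limit=1.0,
    uloq=10.0,
)
excluded = handle_out_of_range([0.2, 0.3, 0.1, 4.0], lower_limit=1.0, uloq=10.0)
```

In the first call 2 of 12 values are out of range, so the metabolite is kept and the two values are imputed; in the second call 3 of 4 values are out of range, so the metabolite is dropped.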
Covariate data
Data on dietary habits were collected through a 139-item food frequency questionnaire (details previously published [28]). Information on frequency of consumption and portion size was used to calculate nutrient and energy intakes using the United States Department of Agriculture food-composition database and the Mexican National Health and Nutrition Survey database. Three dietary patterns were identified by principal component analysis (“Fruits and Vegetables,” “Western,” “Modern Mexican”) [28]. Intakes and frequency were also used to estimate the Healthy Eating Index (HEI) 2015 total score [29].
Insulin-like growth factor 1 (IGF-1), IGF binding protein 3 (IGFBP-3), C-peptide, C-reactive protein (CRP), leptin, and adiponectin analyses were performed in the laboratory of the Biomarkers Group at IARC [30]. Serum IGF-1, IGFBP-3, and C-peptide concentrations were measured by immunoradiometric assays from Beckman Coulter (Marseille, France) [13]. Leptin was measured by a radioimmunoassay from Linco (Millipore, Billerica, MA, USA), while adiponectin and CRP were measured using an enzyme-linked immunoassay from R&D (R&D Systems, Europe, Lille, France) [15].
Triglycerides, total and HDL cholesterol, and glucose were measured in fasting plasma samples at the Endocrinology and Metabolism Laboratory at the National Institute of Nutrition and Medical Sciences using standard assays. Glucose was measured via the automated glucose oxidase method; triglycerides and HDL cholesterol were measured using enzymatic hydrolysis in an automatic analyzer with a tungsten lamp (Prestige 24i, Tokyo Boeki Medical System LTD). The number of metabolic syndrome components was defined according to the harmonized definition [31] (waist circumference ≥ 88 cm, triglyceride levels ≥ 150 mg/dL, HDL cholesterol levels < 50 mg/dL, systolic blood pressure > 130 mmHg or diastolic blood pressure > 85 mmHg, and glucose levels ≥ 100 mg/dL) (details previously published [23]).
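The count of metabolic syndrome components can be written as a small function using the thresholds listed above. This is only a sketch: the function and argument names are hypothetical, and missing measurements are not handled.

```python
def count_metabolic_syndrome_components(
    waist_cm,
    triglycerides_mg_dl,
    hdl_mg_dl,
    systolic_mmhg,
    diastolic_mmhg,
    glucose_mg_dl,
):
    """Count how many of the five components are met (thresholds from the text)."""
    criteria = [
        waist_cm >= 88,
        triglycerides_mg_dl >= 150,
        hdl_mg_dl < 50,
        systolic_mmhg > 130 or diastolic_mmhg > 85,  # blood pressure counts once
        glucose_mg_dl >= 100,
    ]
    return sum(criteria)


# Waist, triglycerides, and HDL criteria met; blood pressure and glucose not met.
example = count_metabolic_syndrome_components(90, 160, 45, 120, 80, 95)
```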
Statistical analyses
Descriptive analyses were performed for selected characteristics of the population using mean and standard deviation (continuous variables) or frequency (categorical variables). Partial Pearson’s correlation coefficients, adjusted for age (where appropriate), state, and batch, were computed for metabolites, measures of mammographic density, age, and BMI. Percent mammographic density was the primary outcome of this analysis, while dense area and non-dense area were examined as secondary outcomes after log-transformation to better approximate normality and homoscedasticity of the residuals. To account for analytical batch, residuals of log-transformed and standardized metabolite concentrations were obtained from linear models with a random effect for analytical batch. These residuals were used as dependent variables in multiple linear regression models testing associations with the different outcomes.
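The idea behind the batch adjustment can be illustrated with a simplified stand-in. The sketch below log-transforms and standardizes one metabolite, then subtracts each batch's mean; this treats batch as a fixed effect, whereas the analysis described above used a random effect for batch, which would shrink batch means toward the overall mean. The function name is hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def batch_adjusted_residuals(concentrations, batches):
    """Residuals of a log-transformed, standardized metabolite on batch.

    Simplification: plain batch means are removed (fixed effect),
    not the random-effect model described in the text.
    """
    logged = [math.log(c) for c in concentrations]
    mu, sd = mean(logged), pstdev(logged)
    standardized = [(v - mu) / sd for v in logged]

    by_batch = defaultdict(list)
    for value, batch in zip(standardized, batches):
        by_batch[batch].append(value)
    batch_means = {b: mean(vs) for b, vs in by_batch.items()}

    return [value - batch_means[batch] for value, batch in zip(standardized, batches)]


# Batch B reads systematically higher; residuals remove that offset.
residuals = batch_adjusted_residuals([1.0, 2.0, 4.0, 8.0], ["A", "A", "B", "B"])
```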
All models were adjusted for a priori selected breast cancer risk factors that included: age (continuous), BMI (continuous), age at menarche (< 12, 12, 13, ≥ 14 years, missing), family history of cancer (yes, no), history of benign breast disease (yes, no), use of oral contraceptive (ever, never), number of full-term pregnancies (0, 1, 2, 3, ≥ 4, missing), age at first full-term pregnancy (nulliparous, < 20, 20–25, 25–30, ≥ 30, missing), breastfeeding (nulliparous, no breastfeeding, < 6 months, 6–12 months, 12–24 months, ≥ 24 months, missing), alcohol intake (0, 0.1 drinks/day, 0.1–0.2 drinks/day, ≥ 0.2 drinks/day, missing), smoking status (never, past, current, missing), socioeconomic status (low, medium, high, missing), and physical activity (continuous). A missing category was created for all variables, except for physical activity where the only missing value was imputed to the median. Multiple tests were addressed using permutation minP-adjustment of P values to account for the dependencies between tests [32].
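To make the minP adjustment more concrete, here is a minimal sketch of a single-step version. It assumes that raw P values have already been computed for the observed data and for each permutation of the outcome; whether the analysis used a single-step or step-down variant is not stated in the text, and the function name is hypothetical.

```python
def single_step_minp(observed_p, permuted_p):
    """Single-step minP adjustment.

    observed_p: raw P values, one per metabolite.
    permuted_p: one list of P values per permutation, same metabolite order.
    The adjusted P value is the share of permutations whose smallest
    P value is at least as extreme as the observed one (with +1 correction).
    """
    min_per_permutation = [min(p_values) for p_values in permuted_p]
    n_perm = len(min_per_permutation)
    adjusted = []
    for p_obs in observed_p:
        n_as_extreme = sum(1 for m in min_per_permutation if m <= p_obs)
        adjusted.append((1 + n_as_extreme) / (1 + n_perm))
    return adjusted


# Toy input: 3 tests, 4 permutations (real analyses use many more).
adjusted_p = single_step_minp(
    [0.001, 0.04, 0.5],
    [
        [0.2, 0.6, 0.9],
        [0.03, 0.5, 0.7],
        [0.4, 0.45, 0.8],
        [0.0005, 0.3, 0.6],
    ],
)
```

Because each permutation keeps the correlation structure between metabolites intact, the distribution of the minimum P value reflects the dependence between tests, which a Bonferroni correction ignores.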
For metabolites associated with percent mammographic density after correction for multiple testing, adjusted means of percent mammographic density were estimated by quartile of metabolite. For tests of linear trend, participants were assigned the median value of exposure in each quartile, and the corresponding variable was modeled as a continuous term. Analyses were further stratified by BMI (below or above the median of 27.4 kg/m²), and interaction with the dichotomized variable was tested for each metabolite by including an interaction term in the model. Adjusted means were examined by quartiles of metabolite in each group, and BMI (continuous) was included as an adjustment variable in each model.
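The scoring step used for the trend test can be sketched as follows: each participant is placed in a quartile of the metabolite and then given the median of that quartile as a score. The function name is hypothetical, and the quartile cut points use the default method of Python's `statistics.quantiles`, which may differ slightly from the software used in the analysis.

```python
from statistics import median, quantiles


def quartile_median_scores(values):
    """Return (quartile index 0-3, median of that quartile) for each value."""
    cut1, cut2, cut3 = quantiles(values, n=4)

    def quartile(v):
        # Number of cut points strictly below v gives the quartile index.
        return (v > cut1) + (v > cut2) + (v > cut3)

    groups = [quartile(v) for v in values]
    medians = {
        q: median(v for v, g in zip(values, groups) if g == q)
        for q in set(groups)
    }
    return groups, [medians[g] for g in groups]


groups, trend_scores = quartile_median_scores([1, 2, 3, 4, 5, 6, 7, 8])
```

The resulting scores would then enter the regression model as a single continuous term, so that its coefficient tests for a linear trend across quartiles.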
To examine the robustness of the observed associations, additional exploratory analyses were conducted using a bootstrapped least absolute shrinkage and selection operator (LASSO) regression approach [33, 34]: metabolic signatures of the percent mammographic density were obtained via simple cross-validated LASSO, which efficiently selects the most predictive variables in high-dimensional sets of potential predictors. This approach was then applied to 200 bootstrap samples to determine which metabolites were most frequently included in the signature.
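The bootstrap selection-frequency idea can be illustrated with a small self-contained sketch. It differs from the analysis described above in two labeled ways: the LASSO is fitted by a plain coordinate-descent routine written for this example rather than with glmnet, and the penalty is fixed rather than chosen by cross-validation within each bootstrap sample. All names are hypothetical.

```python
import random


def soft_threshold(value, penalty):
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(rows, y, penalty, n_iter=100):
    """Minimize (1/2n)*||y - Xb||^2 + penalty*||b||_1 on centered data."""
    n, p = len(rows), len(rows[0])
    y_mean = sum(y) / n
    col_means = [sum(r[j] for r in rows) / n for j in range(p)]
    xc = [[r[j] - col_means[j] for j in range(p)] for r in rows]
    resid = [yi - y_mean for yi in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(x[j] ** 2 for x in xc) / n
            if z == 0.0:
                continue
            # Covariance of predictor j with the partial residual.
            rho = sum(x[j] * (res + x[j] * beta[j]) for x, res in zip(xc, resid)) / n
            new = soft_threshold(rho, penalty) / z
            delta = new - beta[j]
            if delta != 0.0:
                resid = [res - x[j] * delta for x, res in zip(xc, resid)]
                beta[j] = new
    return beta


def selection_frequency(rows, y, penalty, n_boot=200, seed=0):
    """Share of bootstrap samples in which each predictor has a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        beta = lasso_coordinate_descent([rows[i] for i in idx], [y[i] for i in idx], penalty)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Synthetic data: only the first of three predictors drives the outcome.
data_rng = random.Random(1)
rows = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
y = [2.0 * r[0] + data_rng.gauss(0, 0.5) for r in rows]
freq = selection_frequency(rows, y, penalty=0.3, n_boot=50)
```

Predictors that are selected in nearly every bootstrap sample are the ones whose inclusion does not depend on the particular participants drawn, which is what the robustness analysis is meant to show.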
To provide a better understanding of the metabolites associated with percent mammographic density, a variety of lifestyle, dietary, anthropometric, and metabolic factors already available in the study population were investigated in separate models in relation to plasma concentration of retained metabolites. Adjusted mean concentrations of metabolites of interest (residuals on analytical batch) were estimated across categories of each variable after excluding participants with missing values, adjusting for age and state. All variables previously listed as covariates in the main analyses were examined using similar categories, or tertiles for variables initially included as continuous. In addition to these variables, we investigated waist circumference (tertiles), hip circumference (tertiles), high blood pressure (yes, no), circulating leptin, adiponectin, leptin/adiponectin ratio, IGF-1, IGFBP-3, C-peptide, CRP (tertiles of log-transformed concentration regressed on respective analytical batches), total cholesterol (tertiles), HDL cholesterol (tertiles), total cholesterol/HDL cholesterol ratio (tertiles), triglycerides (tertiles), glucose (tertiles), and number of criteria for determination of metabolic syndrome. The following nutritional factors were also examined (tertiles): total daily energy intake, protein, carbohydrate, starch, sugar, fiber, lipid, and fatty acid (total, trans, saturated, monounsaturated, polyunsaturated) intakes (as residuals on total energy intake), glycemic index (GI) and glycemic load (GL), dietary patterns (“Fruits and Vegetables,” “Western,” “Modern Mexican”), and the HEI score. Heterogeneity of means across categories was assessed by F test from analyses of variance for all 46 variables, and P values were corrected for multiple tests with a Bonferroni correction (significance threshold P < 0.001, i.e., 0.05/46).
When significant heterogeneity was detected, linear trend across ordinal categories was further tested by assigning the median value of each category to participants and including the variable as a continuous term in a linear regression model.
All statistical tests were two-sided. Analyses were performed using SAS 9.4 (SAS Institute, Cary, NC) and R Studio (packages NPC [35] and glmnet [36]).