Introduction
Alzheimer’s disease (AD), the most common cause of dementia in older adults, poses a great threat to public health as the size and proportion of the population aged over 65 years continue to increase across the world [1, 2]. AD is thought to have a chronic progressive course that can begin more than 20 years before a clinical diagnosis of dementia can be made [3, 4]. Because effective treatments for AD are lacking, early prediction and prevention in individuals at high risk of AD have been proposed as a potentially feasible way to delay AD onset and progression [5, 6]. AD risk prediction has therefore been emphasized as a means of identifying individuals at high risk of cognitive decline who could benefit from preventive strategies [7, 8].
An increasing number of studies have focused on accurately identifying individuals at elevated risk of cognitive decline for early diagnosis and possible intervention, and various risk models have been developed for this purpose. The variables commonly used in previously reported risk models include demographics, cognitive test scores, and lifestyle and health-related variables [9]. MRI markers, CSF markers and genetic variables have also been used in model construction. Previous studies have demonstrated the predictive value of CSF and MRI biomarkers, and models that include both MRI and CSF markers might predict risk with higher accuracy. However, the predictive accuracy of many existing risk models is only moderate or even low [9]. Furthermore, most existing risk models classify patients into risk categories, and only a few studies have reported AD prediction at the individual patient level [10, 11]. Prevention strategies are likely to differ between individuals with different risks of AD [12], so precise prediction of AD risk at the individual patient level is needed to select appropriate prevention strategies. In addition, the recently published 2018 NIA-AA research framework regards AD and Alzheimer’s pathological changes (without symptoms) not as separate entities but as earlier and later phases of an “Alzheimer’s continuum” [13]. This makes it necessary to construct risk models that predict risk along the Alzheimer’s continuum, and not only for a formal diagnosis of AD dementia.
In our study, we aimed to construct risk models that best predict the incident prodromal stage of AD in cognitively normal individuals and incident AD dementia in patients with mild cognitive impairment. We used the models to predict each individual’s risk in each year following the initial evaluation and to estimate the number of years until an individual will be diagnosed with AD. In addition, we aimed to construct a risk model to predict progression to the Alzheimer’s continuum (A+T±N±) in normal elders (A-T-N-) based on the ATN research framework, which diagnoses AD using biomarker evidence of Aβ (A), pathological tau (T), and neurodegeneration or neuronal injury on FDG PET (N).
Methods
ADNI dataset
Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset (adni.loni.usc.edu). The ADNI was launched in 2003 as a public–private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI is to test whether serial magnetic resonance imaging, positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early AD. For up-to-date information, see www.adni-info.org. The ADNI study was approved by the Institutional Review Board at each of the participating centers, and all participants provided written informed consent.
Participants
Detailed eligibility criteria of ADNI participants are described at http://www.adni-info.org. Cognitively normal individuals and MCI patients from the ADNI database were included in our study if they had follow-up assessments at 1 year or later. In brief, cognitively normal (CN) participants had normal cognitive performance (MMSE scores between 24 and 30, Clinical Dementia Rating [CDR] of 0) and had neither MCI nor dementia at baseline. MCI patients had MMSE scores between 24 and 30, objective memory loss measured by the education-adjusted cutoff on the Wechsler Memory Scale Logical Memory II, and a CDR of 0.5, and were without dementia.
For Alzheimer’s continuum model construction, individuals from the ADNI were evaluated if they underwent amyloid PET or CSF Aβ analysis (A), CSF p-tau examination (T), and FDG PET (N) at baseline. Cutoff values of 1.11 for the florbetapir standardized uptake value ratio (SUVr) and 192 pg/ml for CSF Aβ42 were used to determine whether amyloid was abnormal (A+) or normal (A-) [14]. A cutoff value of 23 pg/ml for the CSF p-tau level was used to determine whether tau pathology was abnormal (T+) or normal (T-) [14]. A cutoff value of 1.21 for FDG PET (average of angular, temporal, and posterior cingulate regions) was used to determine whether neurodegenerative changes were abnormal (N+) or normal (N-) [15]. Individuals with normal AD biomarkers (A-T-N-) were included in our study if they had follow-up assessments at 1 year or later. The follow-up period of the ATN group was the time between baseline and the final assessment of amyloid PET or CSF Aβ.
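As an illustration of how these cutoffs translate into an ATN profile, the following minimal Python sketch may be helpful. It is not the code used in this study. The direction of each comparison (higher SUVr, lower CSF Aβ42, higher p-tau and lower FDG PET uptake being abnormal) and the rule that either amyloid measure alone is sufficient for A+ are assumptions that are not stated in the text, and all function names are hypothetical.

```python
from typing import Optional

# Cutoff values quoted in the text.
SUVR_CUTOFF = 1.11          # florbetapir SUVr
CSF_ABETA42_CUTOFF = 192.0  # pg/ml
CSF_PTAU_CUTOFF = 23.0      # pg/ml
FDG_PET_CUTOFF = 1.21       # average of angular, temporal, posterior cingulate


def amyloid_abnormal(suvr: Optional[float], csf_abeta42: Optional[float]) -> bool:
    """Return True for A+.

    Assumption: high PET uptake or low CSF Abeta42 is abnormal, and either
    available measure alone is enough to call amyloid abnormal.
    """
    pet_abnormal = suvr is not None and suvr > SUVR_CUTOFF
    csf_abnormal = csf_abeta42 is not None and csf_abeta42 < CSF_ABETA42_CUTOFF
    return pet_abnormal or csf_abnormal


def tau_abnormal(csf_ptau: float) -> bool:
    """Return True for T+ (assumption: p-tau above the cutoff is abnormal)."""
    return csf_ptau > CSF_PTAU_CUTOFF


def neurodegeneration_abnormal(fdg_pet: float) -> bool:
    """Return True for N+ (assumption: FDG uptake below the cutoff is abnormal)."""
    return fdg_pet < FDG_PET_CUTOFF


def atn_profile(suvr: Optional[float], csf_abeta42: Optional[float],
                csf_ptau: float, fdg_pet: float) -> str:
    """Format the three biomarker calls as a profile string such as 'A-T-N-'."""
    flags = (
        ("A", amyloid_abnormal(suvr, csf_abeta42)),
        ("T", tau_abnormal(csf_ptau)),
        ("N", neurodegeneration_abnormal(fdg_pet)),
    )
    return "".join(f"{name}{'+' if abnormal else '-'}" for name, abnormal in flags)
```

Under these assumptions, a participant with normal values on all three biomarkers would be labelled A-T-N- and would be eligible for the Alzheimer’s continuum analysis.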
CSF and blood biomarker measurements
CSF samples were collected at baseline by lumbar puncture. The levels of CSF Aβ, tau, and p-tau were measured by the multiplex xMAP Luminex platform (Luminex Corp., Austin, TX) with Innogenetics (INNOBIA AlzBio3; Ghent, Belgium; for research-use only reagents) immunoassay kit-based reagent. Plasma tau was analyzed with the Human Total Tau kit (research use only grade, Quanterix, Lexington, MA) on the Simoa HD-1 analyzer which uses a combination of monoclonal antibodies for a measure of total tau levels. Plasma NFL level was measured using an in-house ultrasensitive enzyme-linked immunosorbent assay on a single molecule array platform (Quanterix Corp). The assay uses a combination of monoclonal antibodies, and purified bovine NFL as a calibrator. All samples were measured in duplicate.
Neuroimaging measurements
The magnetic resonance imaging (MRI) measurement protocol in the ADNI dataset has been described in detail elsewhere [16]. In brief, MRI was acquired at multiple sites using a GE Healthcare, Siemens Medical Solutions USA, or Philips Electronics system. FreeSurfer versions 4.3 and 5.1 were used to process regional volumes for 1.5 T and 3.0 T MRI images, respectively. All the MRI data were reviewed for quality control by the ADNI MRI quality center at the Mayo Clinic. Regional volumes were adjusted for estimated intracranial volume (ICV) (eMethods).
Amyloid PET imaging was measured with florbetapir. The 18F-florbetapir SUVr was calculated by averaging the 18F-florbetapir retention ratio from frontal, anterior cingulate, precuneus, and parietal cortex. The cerebellum was used as a reference region. FDG-PET data were acquired and reconstructed according to a standardized protocol (http://adni.loni.ucla.edu/). Spatial normalization of each individual’s PET image to the standard template was conducted using SPM. For FDG-PET, we averaged counts of angular, temporal, and posterior cingulate regions.
APOE genotyping and polygenic hazard score computation
The ADNI samples were genotyped with the Omni 2.5 M BeadChip (Illumina, Inc, San Diego, CA) and basic QC was performed. APOE alleles were defined by rs7412 and rs429358, which were genotyped by PCR amplification followed by HhaI restriction enzyme digestion and Metaphor Gel. We acquired a Polygenic Hazard Score (PHS) from the ADNI database, which was computed based on the combination of APOE and 31 other genetic variants. Detailed information on the PHS can be found in a previous study [17]. In brief, International Genomics of Alzheimer's Project Stage 1 data with genotyped or imputed data were used to identify AD-associated SNPs. A PHS for each participant was then derived from a Cox proportional hazards model using genotype data from Alzheimer's Disease Genetics Consortium phase 1 (excluding individuals from the ADNI).
Statistical analyses
Both the CN and MCI groups were randomly divided into a discovery cohort and a validation cohort, comprising 60% and 40% of the original participants, respectively, to develop and validate the models. The discovery and validation cohorts of the CN group included 292 and 195 participants, respectively. The discovery and validation cohorts of the MCI group included 478 and 318 participants, respectively.
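A random 60%/40% split of this kind can be sketched in Python as follows. This is only an illustration: the seed, the use of integer identifiers, and the rounding rule are assumptions (the text does not say how the split was implemented or whether it was stratified), although rounding 60% of each group size to the nearest integer does reproduce the cohort sizes reported above.

```python
import random


def split_cohort(participant_ids, discovery_fraction=0.6, seed=0):
    """Randomly split participants into discovery and validation cohorts.

    The seed and the rounding rule are illustrative assumptions.
    """
    ids = list(participant_ids)
    rng = random.Random(seed)  # local generator so the split is reproducible
    rng.shuffle(ids)
    n_discovery = round(discovery_fraction * len(ids))
    return ids[:n_discovery], ids[n_discovery:]


# Group sizes implied by the text: 292 + 195 CN and 478 + 318 MCI participants.
cn_discovery, cn_validation = split_cohort(range(487))
mci_discovery, mci_validation = split_cohort(range(796))
print(len(cn_discovery), len(cn_validation))    # 292 195
print(len(mci_discovery), len(mci_validation))  # 478 318
```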
The Least Absolute Shrinkage and Selection Operator (LASSO) method, implemented in the R package “glmnet”, was used in the discovery cohort to select predictors that influence the time to reach the endpoint during follow-up. Using time-to-event data, we conducted LASSO Cox regression for the selection of candidate baseline predictors and for model construction. The candidate variables included demographics (age, sex, years of education), genetic risk factors (APOE ε4 status and PHS), health variables (body mass index [BMI], cholesterol level, systolic blood pressure [SBP]), medical history (history of diabetes, hypertension and depression), neuropsychological and functional tests (MMSE, Alzheimer’s Disease Assessment Scale with 11 items [ADAS11], Rey Auditory–Verbal Learning Test [RAVLT] immediate, Functional Assessment Questionnaire [FAQ], Logical Memory Delayed Recall [LM-DR]), neuroimaging markers (white matter hyperintensities [WMH], hippocampus volume, whole brain volume, entorhinal volume, middle temporal lobe volume, ventricles volume), CSF biomarkers (CSF Aβ, tau and p-tau), and blood biomarkers (plasma tau and neurofilament light protein [NFL]).
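For reference, LASSO Cox regression is conventionally formulated as minimizing the negative Cox partial log-likelihood plus an L1 penalty on the coefficients. The notation below is standard textbook notation rather than notation taken from this study: x_i is the vector of baseline predictors for participant i, δ_i indicates whether the endpoint was observed, R(t_i) is the set of participants still at risk at time t_i, p is the number of candidate predictors, and λ is the penalty weight, typically chosen by cross-validation. Software implementations may scale the likelihood term differently.

$$\hat{\beta} = \arg\min_{\beta} \left\{ - \sum_{i:\,\delta_{i} = 1} \left[ x_{i}^{\top} \beta - \log \sum_{j \in R(t_{i})} \exp\left( x_{j}^{\top} \beta \right) \right] + \lambda \sum_{k = 1}^{p} \left| \beta_{k} \right| \right\}$$

Predictors whose coefficients are shrunk exactly to zero are dropped from the model, which is how the method performs variable selection.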
Three models were constructed separately for the CN and MCI groups. Model 1 was constructed with demographics, neuropsychological tests, health variables and medical history, which are easily available from primary care assessments. In the construction of Model 2, neuroimaging markers and APOE ε4 status were included as candidate variables in addition to the easily available variables used in Model 1. All candidate predictors (except APOE ε4, which was excluded because it is included in the PHS) were considered in the construction of Model 3 to reach high accuracy. The variables were selected by LASSO Cox regression in the discovery cohort of the CN or MCI group using the minimum criteria (minimized mean-squared error). The details of predictor inclusion and selection for each model can be seen in Additional file 1: Figs. S1 and S2.
Simplified risk scores were developed using variables from the CN and MCI models, with continuous variables categorized into groups. Each variable was assigned a score corresponding to its coefficient from the Cox regressions. To reduce possible bias due to data incompleteness, missing data in variables with missing rates lower than 20% were imputed by multivariate imputation by chained equations using the R package “mice”. Detailed information about missing data is provided in Additional file 1.
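The idea behind a simplified risk score can be sketched as follows. Every number in this Python example is invented for illustration: the cut points, the per-category coefficients and the scaling factor are hypothetical and are not the values derived in this study, and the scaling-and-rounding rule is only one common way of turning Cox coefficients into integer points.

```python
from bisect import bisect_right

# Hypothetical scoring table (NOT the study's values). For each variable,
# "cuts" are the category boundaries and "coefs" are made-up Cox coefficients
# for each category relative to the lowest (reference) category.
SCORING = {
    "age": {"cuts": [70, 80], "coefs": [0.0, 0.3, 0.7]},  # <70, 70 to <80, >=80
    "faq": {"cuts": [1, 5], "coefs": [0.0, 0.4, 0.9]},    # <1, 1 to <5, >=5
}
POINTS_PER_UNIT = 10  # hypothetical scaling from coefficient to integer points


def category_index(value, cuts):
    """Return the index of the category that a continuous value falls into."""
    return bisect_right(cuts, value)


def simplified_score(individual):
    """Sum integer points over variables for one individual (dict of values)."""
    total = 0
    for name, rule in SCORING.items():
        coef = rule["coefs"][category_index(individual[name], rule["cuts"])]
        total += round(coef * POINTS_PER_UNIT)
    return total


print(simplified_score({"age": 75, "faq": 6}))  # 3 + 9 = 12 points
```

A score built this way trades some precision for ease of use, because it can be computed by hand from a lookup table.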
The endpoint events for MCI patients and CN participants were incident AD dementia and the incident prodromal stage of AD, respectively, the latter indicated by a CDR global score of 0.5 or greater [18]. The endpoint event for individuals with normal AD biomarkers (A-T-N-) was progression along the Alzheimer’s continuum (A+T±N±), which includes Alzheimer’s pathologic change (A+T-N-), Alzheimer’s disease (A+T+N±), and Alzheimer’s and concomitant suspected non-Alzheimer’s pathologic change (A+T-N+).
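The mapping from a follow-up biomarker profile to these continuum categories can be written out explicitly. The Python sketch below simply encodes the three categories listed above; the function names and the label used for amyloid-negative profiles are hypothetical.

```python
def continuum_category(a_abnormal: bool, t_abnormal: bool, n_abnormal: bool) -> str:
    """Map an ATN profile to the continuum categories listed in the text."""
    if not a_abnormal:
        # Amyloid-normal profiles are outside the continuum (A+T±N±).
        return "not on the Alzheimer's continuum"
    if t_abnormal:
        return "Alzheimer's disease"  # A+T+N±
    if n_abnormal:
        # A+T-N+
        return "Alzheimer's and concomitant suspected non-Alzheimer's pathologic change"
    return "Alzheimer's pathologic change"  # A+T-N-


def reached_endpoint(a_abnormal: bool, t_abnormal: bool, n_abnormal: bool) -> bool:
    """The endpoint (A+T±N±) depends only on amyloid becoming abnormal."""
    return a_abnormal
```

As the second function makes clear, conversion from A- to A+ is what defines the event for participants who were A-T-N- at baseline.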
Predictive accuracy of the models was quantified by the area under the time-dependent receiver operating characteristic curve (AUC) using survival data [19]. The cumulative/dynamic receiver operating characteristic (ROC) curves and the areas under the curves were calculated with the R package “timeROC”. The incident/dynamic AUC was calculated with the R package “risksetROC”. P values ≤ 0.05 were considered statistically significant in all analyses.
The predicted risk of incident prodromal stage of AD or AD dementia of each individual was estimated by risk models in the following steps:
(1) Calculate the sum of “coefficient × value” for the individual, where β_n is the regression coefficient of predictor n in each model, determined by LASSO Cox regression, and variable_n is the individual’s value of predictor n.
$$I = \left( \beta_{1} \times variable_{1} \right) + \left( \beta_{2} \times variable_{2} \right) + \cdots + \left( \beta_{n} \times variable_{n} \right)$$
(2) Calculate the sum of “coefficient × mean value” across individuals, where β_n is the regression coefficient of predictor n in each model, determined by LASSO Cox regression, and mean variable_n is the mean value of predictor n across individuals.
$$M = \left( \beta_{1} \times mean\,variable_{1} \right) + \left( \beta_{2} \times mean\,variable_{2} \right) + \cdots + \left( \beta_{n} \times mean\,variable_{n} \right)$$
(3) Calculate the estimated AD risk at time t with the following equation, where survival(t) is the survival rate at time t derived from the Cox regression models.
$$Risk_{(t)} = 1 - \left[ survival_{(t)} \right]^{\exp(I - M)}$$
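The three steps above can be sketched in Python as follows. The coefficients, cohort means and yearly survival rates in the example are invented for illustration and are not the values estimated in this study; the function and variable names are likewise hypothetical. The last function shows one way to read off the first year in which an individual’s predicted risk reaches a chosen level.

```python
import math


def linear_predictor(coefficients, values):
    """Sum of coefficient x value over the predictors in the model."""
    return sum(coefficients[name] * values[name] for name in coefficients)


def predicted_risk(coefficients, individual, cohort_means, survival_t):
    """Risk(t) = 1 - survival(t) ** exp(I - M)."""
    i = linear_predictor(coefficients, individual)    # step 1
    m = linear_predictor(coefficients, cohort_means)  # step 2
    return 1.0 - survival_t ** math.exp(i - m)        # step 3


def years_until_risk_reaches(coefficients, individual, cohort_means,
                             survival_by_year, threshold):
    """Return the first year whose predicted risk is >= threshold, else None."""
    for year in sorted(survival_by_year):
        risk = predicted_risk(coefficients, individual, cohort_means,
                              survival_by_year[year])
        if risk >= threshold:
            return year
    return None


# Hypothetical inputs, for illustration only.
COEFFICIENTS = {"age": 0.05, "adas11": 0.10}
COHORT_MEANS = {"age": 73.0, "adas11": 10.0}
SURVIVAL_BY_YEAR = {1: 0.97, 2: 0.93, 3: 0.88, 4: 0.82, 5: 0.75}

patient = {"age": 78.0, "adas11": 14.0}
for year, survival in SURVIVAL_BY_YEAR.items():
    risk = predicted_risk(COEFFICIENTS, patient, COHORT_MEANS, survival)
    print(f"year {year}: predicted risk {risk:.3f}")
```

An individual whose predictor values all equal the cohort means has I = M, so the predicted risk reduces to 1 - survival(t); values of I above M raise the risk above that reference level, and values below M lower it.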
Discussion
In this study, we developed risk models using different classes of predictors for accurately predicting risk of progression of Alzheimer’s disease in CN participants and MCI patients at the individual level. In addition, we constructed a risk model for predicting the Alzheimer’s continuum in individuals with normal AD biomarkers, using the 2018 NIA-AA research framework.
Accurate identification of individuals at high risk of cognitive decline or dementia is important for early intervention, such as monitoring and risk factor-targeted intervention. A model constructed with easily available and low-cost variables, such as demographics, health factors, cognitive and functional assessments and medical history, can be widely used for screening AD risk in primary care settings [21]. CN and MCI Model 1 were developed for this purpose. Both models had acceptable accuracy in the discovery and validation cohorts. For ease of use, simplified risk scores were generated from CN and MCI Model 1. Compared with previously published articles on AD prediction in primary care [21-23], our models can provide the estimated risk at each year after the initial evaluation and the estimated time at which an individual’s risk of converting from CN to the prodromal stage of AD, or from MCI to AD dementia, will increase to a given level. This could be important in primary care because the AD risk provided by the models is straightforward and easily understood by patients.
Poor accuracy has usually been associated with single-factor models in previous studies, whereas risk models with relatively high accuracy have incorporated multiple factors [12]. Thus, a combination of various risk factors and biomarkers was included in the construction of CN and MCI Models 2 and 3. With all the candidate predictors, we applied LASSO Cox regression to select optimal combinations of variables for model construction. Our study showed that predictive power can be improved by adding neuroimaging markers, CSF biomarkers and risk genes. The final models reached high accuracy, with AUCs of 0.81 for the prediction of the incident prodromal stage of AD in CN participants and 0.92 for the prediction of AD dementia in MCI patients, which can be considered good and excellent, respectively [12]. We also performed tests to compare the AUCs between models. All comparisons showed significant differences except that between CN Model 2 and CN Model 3: the AUC of CN Model 3 was not significantly higher than that of CN Model 2. When two models are compared, the AUCs should be measured on the same subjects. This result might be due to the limited number of participants with CSF biomarker data. We believe that using more of the available data for prediction might be optimal in clinical practice.
The NIA-AA research framework, published in 2018, defines AD biologically by neuropathological biomarkers [13]. Multipredictor models that predict the Alzheimer’s continuum have not been reported in the literature. In this study, an Alzheimer’s continuum model was constructed to predict progression to the Alzheimer’s continuum in individuals with normal AD biomarkers. Defining AD as a biological construct might enable a more accurate diagnosis that distinguishes AD from other diseases that can lead to dementia. In addition to being useful for future AD prediction, this model may assist the recruitment of individuals at high risk of AD into clinical trials. A model without CSF biomarkers would be more practical for predicting the Alzheimer’s continuum in clinical settings. However, the predictive accuracy of the final model without CSF biomarkers as variables was relatively low. Models with only CSF biomarkers as variables were also constructed in both CN and MCI participants. The predictive accuracy of these models was also not high enough (AUC = 0.67 for CN, AUC = 0.79 for MCI). The prediction of prodromal AD and AD dementia with AD biomarkers only was therefore unsatisfactory.
Prediction models using different variables have previously been developed with participants from ADNI [9]. Gomar et al. examined the predictive value of different classes of markers, including clinical, cognitive, MRI, PET-FDG, and CSF markers, in the progression from MCI to AD [24]. They found that cognitive markers were better predictors than biomarkers. Lehallier et al. also tried to predict AD in MCI patients using 224 candidate variables [25]. The results of their study suggest that a combination of markers measured in plasma and CSF was useful in predicting AD dementia. However, none of the reported models using samples from ADNI could provide risk at the individual level. We think individual risk might be more important for patients. In our study, we developed and validated risk models for predicting AD onset at the individual level with relatively high accuracy. Risk models that predict individualized risk of progression to dementia have been reported before [10, 11]. Van Maurik et al. constructed prognostic models for CN and MCI patients based on MRI measures and CSF biomarkers. However, CSF p-tau, which is more important for AD, was not included in their study. In addition, their models for MCI patients can only provide 1- and 3-year progression risks, which is a relatively short horizon. In our study, we included CSF Aβ, tau and p-tau as candidate variables in model construction, and all the candidate variables were selected by LASSO regression. Moreover, the risk of prodromal AD or AD dementia at each year after evaluation can be predicted with moderate to high accuracy within 5 years.
We also compared our models with existing risk models. A CN model with age, gender, MMSE, CSF Aβ and CSF tau as predictors of MCI or AD dementia has been reported by Van Maurik et al. with high accuracy [11]. The Harrell’s C-statistic of the model was 0.82 in their original cohort, but the AUC was 0.61 when we validated their model using our participants. Thus, the predictive performance of their model was adequate in their own cohort but low in validation. An MCI model including neuroimaging markers and CSF biomarkers as predictors of dementia in MCI patients was constructed in a previous study [20]. Their model showed moderate predictive value in their cohort (Harrell’s C = 0.74) and in our participants (AUC = 0.70). However, the predictive value of their model was relatively low considering that neuroimaging markers and CSF biomarkers were used as predictors in the model construction.
One strength of this study is that the models can estimate an individual’s risk of the incident prodromal stage of AD or AD dementia at each year after evaluation, as well as the time at which an individual’s risk of converting from CN to the prodromal stage of AD, or from MCI to AD dementia, will increase to a given level. This might be important for future treatment because intervention strategies for individuals with different risk profiles are likely to differ. With the development of AD prevention strategies, detailed information on future AD risk is necessary for precision prevention. In addition, the future rate of decline in cognitive function, as measured by MMSE, can also be estimated by the models.
External validation is very important for risk models [26]. A model’s predictions might not be replicable if the model is overfitted. Only a handful of existing models have been externally validated with acceptable accuracy, including the Cardiovascular Risk Factors, Aging, and Dementia (CAIDE) model and the Australian National University AD Risk Index (ANU-ADRI) [21, 27-30]. Although participants in this study were all from the ADNI dataset, we separated the entire sample into two cohorts: a discovery cohort and a validation cohort. The models were constructed and validated in the two separate cohorts. The removal of unnecessary variables by LASSO regression also helped to limit model overfitting.
There are some potential limitations in this study. First, the number of participants is limited, especially those with available data for CSF biomarkers. In a recently published article, the International Working Group recommended that the diagnosis of AD be restricted to people who have both positive biomarkers and specific AD phenotypes, which highlights the importance of biomarkers in the clinical diagnosis of AD [31]. The endpoint used in the construction of the CN and MCI models was only clinical because of the small number of individuals with available CSF biomarker data. The ATN group was not separated into two cohorts for the same reason. Second, the follow-up period was relatively short. The models could only provide predicted risk within 5 years with relatively high accuracy. Further studies with longer follow-up periods may enable long-term predictions. Third, the removal of unnecessary variables was performed by LASSO regression. However, when two or more variables are highly collinear, LASSO regression tends to retain one of them more or less arbitrarily, so some important predictors might have been removed from the models. Finally, validation of the models in an independent cohort is still necessary to test their capabilities and applicability. In future studies, we will perform external validation using our own cohort. Further analyses will also be performed to optimize the models with samples from our cohort.