Introduction

Type 2 diabetes is a global health problem posing substantial burdens on human health1. The diagnosis of type 2 diabetes is based on elevated blood glucose coupled with the absence of clinical features indicating alternative subtypes, such as type 1, monogenic, pancreatic or medication-induced diabetes2. A diagnosis of type 2 diabetes is generally the default or can be arrived at through exclusion of other types. Traditionally, most type 2 diabetes care guidelines have advocated treatment choice based on cost-effectiveness and side effects of specific medications, which have no relationship to underlying pathophysiology in the individual. More recent guidelines have suggested differential glucose-lowering therapies on the basis of higher body mass index (BMI) (favouring use of glucagon-like peptide-1 (GLP-1) analogues) or presence or absence of cardiovascular and/or renal disease and/or heart failure (favouring GLP-1 analogues and/or sodium-glucose co-transporter 2 (SGLT-2) inhibitors)3.

There is considerable heterogeneity in the clinical characteristics of patients with type 2 diabetes. Clinicians recognise that differences in degree of obesity or body fat distribution, age, dyslipidaemia or presence of metabolic syndrome can influence prognosis in diabetes and can be important considerations in treatment and management4,5,6. There is increasing awareness that type 2 diabetes heterogeneity may reflect differences in the underlying pathophysiology, environmental contributors, and the genetic risk of affected individuals. The mechanisms leading to the development of type 2 diabetes may differ from one individual to another and this could impact treatment and outcome.

Accurate characterisation of the heterogeneity in type 2 diabetes may help individualise care and improve outcomes. This goal has been realised in part for monogenic diabetes, where treatments can be tailored to genetic subtype to deliver precision care achieving better outcomes than standard care7. Given the complex pathophysiology and genetics of type 2 diabetes, applying precision medicine approaches is challenging. Critical to this endeavour is a better understanding of specific subtypes.

There are many studies of type 2 diabetes subtypes. The literature reflects diverse approaches based on the presence or absence of one or more simple clinical features or biomarkers and, more recently, sophisticated methods that deploy machine learning (ML) or use omics data. Classification approaches such as clustering methods to categorise this heterogeneity show inter-cluster differences in progression to complications or need for insulin treatment. These approaches consider clinical features at diagnosis8 or clinical information combined with genetic data to characterise disease heterogeneity9,10. Simpler approaches are more easily implemented across all resource settings, while complex approaches may have greater precision in classifying heterogeneity. The breadth and scope of the evidence in favour of type 2 diabetes subclassification have not to date been thoroughly examined.

The Precision Medicine in Diabetes Initiative (PMDI) was established in 2018 by the American Diabetes Association (ADA) in partnership with the European Association for the Study of Diabetes (EASD). The ADA/EASD PMDI includes global thought leaders in precision diabetes medicine who are working to address the burgeoning need for better diabetes prevention and care through precision medicine11. This Systematic Review is written with the ADA/EASD PMDI as part of a comprehensive evidence evaluation in support of the 2nd International Consensus Report on Precision Diabetes Medicine12.

In this systematic review for the PMDI we aimed to provide a critical assessment of the evidence to date for type 2 diabetes subclassification using (i) simple approaches based on categorisation of clinical features, biomarkers, imaging, or other parameters, and (ii) complex subclassification approaches that use ML incorporating clinical data and/or genomic data. We aimed to identify areas where further research is needed with the goal to improve patient and health system outcomes in type 2 diabetes care.

Our analysis shows that many simple approaches to subclassification have been tried but none have been replicated and most are not associated with meaningful clinical outcomes. However, a more complex stratification, using machine learning applied to clinical variables, yielded reproducible subtypes of type 2 diabetes that are associated with outcomes. Both approaches, however, require a higher grade of evidence but support the premise that type 2 diabetes can be subclassified into clinically meaningful subtypes.

Methods

This systematic review was written and conducted in accordance with our pre-established protocol (PROSPERO ID CRD42022310539) and reported using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Statement (PRISMA)13. We systematically reviewed papers to address two research questions devised by an expert working group: 1) What are the main subtypes of type 2 diabetes defined using simple clinical criteria and/or routinely available laboratory tests (simple approaches), and 2) What subphenotypes of type 2 diabetes can be reproducibly identified using ML and/or genomics approaches (complex approaches)? Subsequently, we refer to the first question as simple approaches and the second question as complex approaches. The quality of each paper was reported, and the aggregate of data evaluated using the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) system14.

Study eligibility criteria

We included English-language original research studies of all design types that analysed populations with prevalent or new-onset type 2 diabetes and attempted in some way to stratify or subgroup patients with type 2 diabetes. We used broad terms to identify stratification studies and all approaches to stratification (the exposure) were included (supplementary table 1). We excluded studies examining risk for the development of type 2 diabetes, use of glycaemic control (e.g. HbA1c strata) alone to stratify, studies of stratification in types of diabetes other than type 2 diabetes, and review articles or case reports.

For simple approaches, the exposure was defined as any of the following: a routine blood or urine biomarker that was widely available in most clinic settings; a blood or urine biomarker that might not be routinely available now but could have the potential to become easily accessible; any routinely available imaging modality; any physiological assessment that could be undertaken in an outpatient setting; or results from routinely available dynamic tests. The stratification approach was either a cut-off or categorisation based on one or more of the above or, if an index, ratio, trend or other analysis was undertaken, one that could be calculated without complex mathematics. Finally, all outcomes were accepted, for example clinical characterisation of subgroups, association with specific biomarkers and association with complications or mortality.
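As an illustration of the cut-off style of stratification described above, the following minimal sketch assigns a measurement to a category from an ordered list of cut-offs. The function names (`stratify`, `bmi`) are hypothetical, and the cut-offs shown are the conventional WHO adult BMI boundaries, used here only as an example; the reviewed studies used varying numbers of categories and thresholds.

```python
from bisect import bisect_right

# Illustrative cut-offs only: conventional WHO adult BMI boundaries (kg/m^2).
# The reviewed studies used differing cut-offs and numbers of categories.
BMI_CUTOFFS = [18.5, 25.0, 30.0]
BMI_LABELS = ["underweight", "normal weight", "overweight", "obesity"]


def stratify(value, cutoffs, labels):
    """Return the label of the category that `value` falls into.

    `cutoffs` must be sorted in ascending order. A value equal to a
    cut-off is placed in the higher category.
    """
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cut-offs")
    return labels[bisect_right(cutoffs, value)]


def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height squared."""
    return weight_kg / height_m ** 2


if __name__ == "__main__":
    # Hypothetical participants (weight in kg, height in m).
    for weight, height in [(60.0, 1.70), (95.0, 1.75)]:
        value = bmi(weight, height)
        print(f"BMI {value:.1f}: {stratify(value, BMI_CUTOFFS, BMI_LABELS)}")
```

The same helper could be reused for any single continuous exposure by swapping in a different list of cut-offs and labels.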

For complex approaches, the exposure used was defined as any of the inputs for the simple approach outlined above and/or any form of genetic data. However, unlike the simple approach, the stratification approach either deployed ML approaches or used other complex statistical approaches for stratification. All outcomes were accepted, as for simple approaches.

Literature search and selection strategy

PUBMED and EMBASE databases were searched from inception to May 2022 for relevant articles using a strategy devised by expert health sciences librarians (supplementary methods). We undertook independent searches for each systematic review question. From both searches, each abstract and, subsequently, each full-text paper was screened by two independent team members for eligibility. In addition to the initial exclusion criteria, at the full-text review stage, we further excluded studies where exposures were not clearly defined and/or if the data on outcomes of the stratification were not available in results or supplementary material. We also excluded studies where the only stratification modality was a measure of glycaemic control, as this itself provides the diagnosis of type 2 diabetes. In cases of disagreement between two reviewers, a third reviewer made the final decision. The process involved group-based discussions to resolve disagreements to ensure all decisions were made on the same grounds.

Data extraction

Data were manually extracted from each full-text paper by individual team members and cross-checked by an independent team member at the data synthesis stage. We extracted relevant data on study design (observational or clinical trial), analysis design (cross-sectional or prospective), study population characteristics, stratification method and results (exposure), outcomes, and study quality assessment. For population characteristics, we extracted data on whether the type 2 diabetes population was new-onset or prevalent, the sample size, ethnicity and gender, the duration of diabetes (for cross-sectional analysis) and duration of follow-up (for longitudinal follow-up). For exposures, we extracted the approach to stratification and the number and nature of subgroups identified. For outcomes, we documented the type of outcome studied and the findings according to stratified subgroup.

Data synthesis

Following full-text data extraction, we undertook a qualitative analysis of exposures (measures used to stratify individuals) for each systematic review question. For simple sub-classification approaches, we extracted the details of stratification criteria in each paper (supplementary methods), then categorised the exposure (e.g., blood/urine test, imaging, age). After data extraction, these exposures were further refined into subcategories based on common emerging themes (e.g., use of pancreatic autoantibodies, BMI categories, measures of beta-cell function, use of lipid profiles). For complex approaches, the exposure included both the input clinical and/or genetic data used and the ML approach to analysis deployed (e.g., k-means, hierarchical clustering, latent-class analysis). In both reviews, outcomes were heterogeneous, so we broadly categorised them where possible. Due to the variability in exposures and outcomes, it was not possible to undertake formal meta-analyses of any outcome. All coding, categorisation and thematic synthesis was undertaken and agreed upon by at least three members of the research team.

Quality assessment

The GRADE system was used to assess the quality of the studies extracted13. At least two members assessed whether study exposures and outcomes were clearly defined, valid and reliable, and whether confounders were appropriately accounted and adjusted for. Disagreements were resolved by discussion between the joint first and senior authors during group discussion. Assessors evaluated study limitations, consistency of results, imprecision, and reporting bias to assign study-specific and overall GRADE certainty ratings as very low, low, moderate and high15.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

Search and screening for simple and complex systematic review questions

The first question examined simple stratification approaches using clinical variables that may reveal type 2 diabetes heterogeneity. A total of 6097 studies met the inclusion criteria and were screened (Fig. 1A). Of these, 183 studies were included for full text data review, of which 132 studies were subsequently excluded. The most common reasons for exclusion at the full-text review stage were studies conducted in populations without prevalent or incident type 2 diabetes, study designs that used ML approaches or stratification approaches that used HbA1c or diabetes medications. In total, 51 “simple approach” studies underwent full-text data extraction.

Fig. 1: PRISMA systematic review attrition diagram.

A Flow diagram for simple approaches to subclassification. B Flow diagram for complex approaches.

The second question aimed to identify papers with complex approaches, mostly ML-based strategies, to identify subgroups of patients with type 2 diabetes (Fig. 1B). A total of 6639 studies were screened, of which 106 were found eligible for full-text review. The most common reasons for exclusion were study populations not comprising participants with type 2 diabetes or classification approaches not using ML. In total, 62 ‘complex’ studies underwent full-text data extraction.
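The screening counts reported for the two review questions can be checked with simple arithmetic. The sketch below uses only the numbers quoted in the text; the number of complex-approach papers excluded at full-text review is not stated, so it is derived here as a difference rather than quoted. The variable and function names are hypothetical.

```python
# Counts as reported in the text for each review question.
simple = {"screened": 6097, "full_text": 183, "excluded_full_text": 132, "extracted": 51}
complex_ = {"screened": 6639, "full_text": 106, "extracted": 62}

# Simple approaches: reported full-text exclusions should account for the difference.
assert simple["full_text"] - simple["excluded_full_text"] == simple["extracted"]

# Complex approaches: the full-text exclusion count is not reported,
# so it is derived from the two counts that are.
complex_excluded = complex_["full_text"] - complex_["extracted"]


def share(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


if __name__ == "__main__":
    for name, counts in [("simple", simple), ("complex", complex_)]:
        pct = share(counts["full_text"], counts["screened"])
        print(f"{name}: {pct:.1f}% of screened records reached full-text review")
    print(f"complex: {complex_excluded} papers excluded at full-text review (derived)")
```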

Use of simple approaches to subclassify type 2 diabetes

Description of extracted studies

The 51 studies using simple type 2 diabetes subclassification approaches incorporated 1,751,350 participants with prevalent or new-onset type 2 diabetes. Among them, 39% (20/51) of studies included participants of white European ancestry, 43% (22/51) incorporated exclusively participants from non-white European ancestries and 17% (9/51) included mixed ancestry groups (Supplementary Data 1). The majority of the studies (78%, 40/51) were conducted in populations with prevalent type 2 diabetes, and 22% (11/51) in new-onset type 2 diabetes. Approximately half the studies had a prospective design (25/51) and the remaining half had a cross-sectional design (26/51). For longitudinal studies, study follow-up periods ranged from <1 year to 22 years.

Studies included a wide range of exposures (Fig. 2) based on routine clinical measurements with standard cut-offs or groupings. These included assessment of individual routine clinic-based measurements (e.g., levels of BMI, or biomarker variability over time) or composite stratification incorporating two or more tiers of criteria (e.g. groupings combining one or more biomarkers or anthropometric measurements) including both routine and non-routine but clinically available tests, including oral glucose tolerance tests (OGTT) which, while a glycaemic test, also indirectly measures insulin resistance. The associations of stratified exposure characteristics were investigated with various outcomes: 1) measures of glycaemia, 2) clinical characteristics, 3) measures of diabetes progression such as time-to-insulin treatment or development of microvascular complications and 4) cardiovascular outcomes and/or mortality.

Fig. 2: Schematic overview of approaches used to subclassify type 2 diabetes.

The figure summarises simple approaches that have been taken to subclassify type 2 diabetes and complex approaches. HbA1c glycated haemoglobin, BMI body mass index, GAD-65 glutamic acid decarboxylase-65 antibodies.

Description of categorised subgroups

Simple approaches to classification included use of lipid profiles (n = 8), BMI (n = 6), pancreatic beta-cell related measures (n = 6), pancreatic autoantibodies (n = 6), age at diagnosis (n = 2), OGTT data (n = 4), cardiovascular measures (n = 3), other biomarkers in urine or blood and alternative approaches (n = 5) (Table 1).

Table 1 Summary of published studies using simple approaches to type 2 diabetes classification.

Different categories of triglycerides, low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, atherogenic small dense lipoproteins with and without features of metabolic syndrome were used to stratify type 2 diabetes in eight studies. Cardiovascular disease (CVD) outcomes were assessed in 3/8 of the studies16,17,18 which showed that a more atherogenic metric of the specific lipid exposure (e.g., higher LDL cholesterol) was associated with a greater frequency of CVD outcomes. Other outcomes included pulse wave velocity19 or clinical characteristics (age, BMI, presence of metabolic syndrome) in specific subgroups.

The six studies assessing pancreatic autoantibodies focused on glutamic acid decarboxylase 65 (GAD-65) levels. Studies used positive versus negative status or high versus low titre, and one study sub-stratified by age. Outcomes included time-to-insulin treatment20,21, associations with other clinical characteristics such as lipid profiles, BMI and blood pressure22,23,24 and measures of beta-cell function. There was no consistency in study design and most were observational with low to moderate evidence grade; two studies showed that GAD-65 positivity was associated with faster time-to-insulin treatment20,21.

Patients with type 2 diabetes were stratified according to their BMI in six studies, either by BMI alone (n = 5) or BMI in combination with HbA1c. The number of BMI categories varied between two and six in the identified studies. The association between BMI and glycaemic outcomes (change in HbA1c from baseline) was assessed in four studies either as primary or secondary outcomes6,25,26. We graded the quality of evidence as very low to moderate, and no consistency of effect was observed across all studies. In one secondary analysis of a randomised controlled trial, higher BMI at baseline was associated with faster progression to adverse renal outcomes; however, this was not replicated in any other study27.

Age at diagnosis was assessed as a stratification tool in two studies; younger age (mean age 33 years) was associated with higher rates of proliferative retinopathy in an observational study with 12 months follow-up versus older age (mean 50 years)4. In a second study, patients aged 60–75 versus those >75 years had a high risk of CVD and mortality when stratified by cholesterol levels6. Neither study was replicated to confirm findings.

Four studies used results from oral glucose tolerance tests (OGTT) as exposures. The specific stratification approach applied to OGTT profiles was different in each study and based on cut-offs of fasting glucose levels, glucose gradients after stimulation and responses to different drug treatments. Outcomes included clamp-derived insulin sensitivity and differences in the shape of glucose profiles between youths and adults28.

Measures of estimated beta-cell function were assessed in six studies including C-peptide levels and homoeostasis model assessment-2 indices for beta-cell function (HOMA2-B) or insulin resistance (HOMA2-IR), which require measurement of fasting insulin and glucose levels. C-peptide was defined using variable cut-offs. Outcomes included clinical phenotype data, response to medication, and microvascular or macrovascular complications. For example, hyperinsulinaemia and higher urine C-peptide were independently associated with cardiovascular disease.
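To make the idea of a fasting-sample index concrete, the sketch below implements the original closed-form HOMA (HOMA1) equations. This is a stand-in, not the method used in the reviewed studies: HOMA2 is a computer model that is not reproduced by these formulas, and its outputs differ from HOMA1. The function names are hypothetical, and the units assumed are fasting insulin in µU/mL and fasting glucose in mmol/L.

```python
def homa1_ir(fasting_insulin_uu_ml, fasting_glucose_mmol_l):
    """HOMA1 insulin resistance index: insulin x glucose / 22.5.

    NOTE: this is the original HOMA1 approximation, not HOMA2.
    """
    return fasting_insulin_uu_ml * fasting_glucose_mmol_l / 22.5


def homa1_b(fasting_insulin_uu_ml, fasting_glucose_mmol_l):
    """HOMA1 beta-cell function (%): 20 x insulin / (glucose - 3.5).

    The formula is undefined for glucose at or below 3.5 mmol/L.
    """
    if fasting_glucose_mmol_l <= 3.5:
        raise ValueError("HOMA1-%B is undefined for glucose <= 3.5 mmol/L")
    return 20.0 * fasting_insulin_uu_ml / (fasting_glucose_mmol_l - 3.5)


if __name__ == "__main__":
    # Hypothetical fasting values, for illustration only.
    insulin, glucose = 10.0, 5.0
    print(f"HOMA1-IR = {homa1_ir(insulin, glucose):.2f}")
    print(f"HOMA1-%B = {homa1_b(insulin, glucose):.1f}")
```

Because both indices depend only on a paired fasting sample, they are straightforward to compute, which is part of why such measures recur as stratification variables.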

Other exposure variables included less routine biomarkers, pulse wave velocity, ketosis/ketoacidosis and other disease indices, but these were each single studies precluding grouping. All data are summarised in Table 1.

Use of complex approaches to subclassify type 2 diabetes

Description of extracted studies

There were 62 studies of complex/ML approaches to type 2 diabetes subclassification in a total of 793,291 participants (Table 2). Over half of the studies included non-European ancestry in relevant proportions (>20%). Only ~30% (19 out of 62) of the studies analysed participants with new-onset diabetes. Mean diabetes duration ranged from recent onset (within 1 year) to over 36 years. Most data were from observational studies (46 out of 62), with some post-hoc analyses of clinical trials (10), survey data (4) and mixed study types (2). Half of the studies had prospective design (31 out of 62) with a mean follow-up duration ranging from 1 year to 11.6 years. K-means clustering was the most applied ML approach (30 out of 62). Eight studies used established centroids8 to assign participants to clusters. Two studies decomposed combinations of genetic variants and their association with clinical and laboratory phenotypes into genotype-phenotype clusters by using Bayesian non-negative matrix factorisation.
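Assigning participants to clusters using established centroids amounts to a nearest-centroid rule: each participant is given the label of the closest centroid in the standardised variable space. The sketch below illustrates this under stated assumptions. The centroid values are invented placeholders in z-score units and are NOT the published centroids; the variable list, cluster labels as dictionary keys, and the function name `assign_cluster` are likewise chosen for illustration. It assumes Python 3.8 or later for `math.dist`.

```python
import math

# Order of variables in each centroid tuple (all assumed to be z-scores).
VARIABLES = ("age", "bmi", "hba1c", "homa2_b", "homa2_ir")

# HYPOTHETICAL centroids, invented for illustration only.
# Real use would require the published centroids and the scaling
# (means and standard deviations) from the source cohort.
CENTROIDS = {
    "SIDD": (-0.3, -0.2, 1.5, -1.0, -0.3),
    "SIRD": (0.5, 0.6, -0.4, 1.2, 1.4),
    "MOD": (-0.8, 1.3, -0.1, 0.2, 0.3),
    "MARD": (0.7, -0.5, -0.4, -0.1, -0.4),
}


def assign_cluster(z_scores, centroids=CENTROIDS):
    """Return the label of the centroid closest (Euclidean) to `z_scores`.

    `z_scores` maps each name in VARIABLES to a standardised value.
    """
    point = tuple(z_scores[name] for name in VARIABLES)
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


if __name__ == "__main__":
    # Hypothetical participant with high HbA1c and low beta-cell function.
    participant = {"age": 0.0, "bmi": 0.0, "hba1c": 2.0, "homa2_b": -1.2, "homa2_ir": 0.0}
    print(assign_cluster(participant))
```

Unlike running k-means afresh, this rule keeps cluster definitions fixed across cohorts, so the result depends on standardising the new data in the same way as the cohort from which the centroids were derived.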

Table 2 Summary of published studies using complex approaches to type 2 diabetes classification.

Description of the categorised subgroups

Following the seminal work by Ahlqvist et al.8, multiple studies used the variables derived at time of diabetes diagnosis: age, HbA1c, BMI, HOMA2-B, HOMA2-IR and GAD-65 antibody (Table 2). The majority of these studies employed C-peptide-based homoeostasis model assessment indices (HOMA, or its updated variant, HOMA2, using fasting insulin and glucose), as surrogates for insulin resistance (HOMA2-IR) and insulin secretion (HOMA2-B). In different contexts and populations, 22 studies replicated identification of the four non-autoimmune diabetes subtypes first described by Ahlqvist et al.8: severe insulin-deficient diabetes (SIDD), severe insulin-resistant diabetes (SIRD), mild obesity-related diabetes (MOD), and mild age-related diabetes (MARD). The subset of studies including measurements of GAD antibody also identified the fifth cluster, severe autoimmune diabetes (SAID). Associations of these subtypes with clinical outcomes, including glycaemia, microvascular and macrovascular outcomes, and death, were replicated in 12 studies (Table 3).

Table 3 Association of the Ahlqvist-clusters with outcomes from 22 reviewed studies using consistent cluster assignment methods.

Thirteen additional papers used variations of the original set of variables from Ahlqvist et al.8 by substituting HOMA with C-peptide, adding lipid traits, e.g. HDL-cholesterol, or approximating the clusters from different/simplified variable sets by applying advanced statistical learning approaches such as self-normalising neural networks. These approaches identified some type 2 diabetes subgroups resembling the clusters from Ahlqvist et al. and also novel subgroups related to the additional variables (Fig. 3). Several of the novel subgroups were associated with clinical outcomes. However, these findings have not been replicated in other studies (Table 2).

Fig. 3: Main characteristics of diabetes clusters derived using a modified set of clustering variables, compared to original ‘Ahlqvist’ clusters.

Clustering variables denoted in blue are consistent across the different studies, those in black are unique to the particular study outlined. A greyed-out box indicates that the indicated diabetes cluster was replicated from the Ahlqvist study, a dark blue box indicates a new diabetes cluster. GAD, glutamic decarboxylase antibody; BMI, body mass index; HDL, high-density lipoprotein cholesterol; HOMA2-IR/B, homoeostasis model assessment-2 insulin resistance/beta cell function. SAID, severe autoimmune diabetes; SIDD, severe insulin-deficient diabetes; SIRD, severe insulin resistant diabetes; MOD, mild obesity-related diabetes; MARD, mild age-related diabetes.

Additional papers (n = 27) assessed various sets of phenotypic inputs for ML approaches. Grouped into five categories of inputs, studies identified many subtypes and associations with clinical outcomes; however, they all lacked replication (Table 2). Four papers applied complex ML methods to a set of fewer than ten clinical variables such as systolic blood pressure, waist circumference, BMI, fasting plasma glucose, and age at diabetes diagnosis, and resulting subgroups were variably associated with outcomes, such as mortality. Eleven studies used a larger set of more than ten clinical features as inputs for classification, including data from electronic health records29,30, and identified subgroups variably associated with clinical outcomes, including risk of cardiovascular disease. Two other studies specifically employed cardiovascular traits, including ECG31 and echocardiographic32 data, as ML algorithm inputs, and each identified subgroups with different associations with risk of cardiovascular disease. Finally, four studies involved inputs of change of glycaemic variables (HbA1c trajectories, glycaemia during a mixed meal test, continuous glucose monitoring features)33,34,35, one study focused on fasting GLP-1, GIP and ghrelin levels36, and two studies focused on behavioural traits such as novelty seeking, harm avoidance, and hospital anxiety and depression scale.

Human genetic risk information is rapidly penetrating clinical medicine. Two sets of papers utilised genomic data to identify diabetes subtypes, either in the form of inherited common genetic variation10,37 or gene expression data from muscle biopsies38 (Table 2). The first approach clustered genetic variants with clinical traits associated with type 2 diabetes to identify subsets of variants predicted to act in shared mechanistic processes. Using these sets of genetic variants, process-specific or partitioned polygenic scores were constructed in individuals with type 2 diabetes and were associated with differences in clinical features and prevalence of metabolic outcomes, with replication across multiple cohorts. The muscle gene expression study has not been replicated. Overall, half of the studies had cross-sectional designs, and the other half involved prospective follow-up (Table 2).
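A partitioned (process-specific) polygenic score can be thought of as a weighted sum of risk-allele dosages restricted to the variants in one cluster, computed separately for each cluster. The sketch below shows that calculation only. The cluster names, variant identifiers and weights are invented placeholders, not values from the cited studies, and the function name `partitioned_scores` is hypothetical. Treating missing genotypes as zero is a simplification made here; in practice they would usually be imputed.

```python
# HYPOTHETICAL cluster definitions: variant identifier -> weight.
# All identifiers and weights are invented for illustration.
CLUSTER_WEIGHTS = {
    "process_a": {"variant_1": 0.8, "variant_2": 0.5},
    "process_b": {"variant_2": 0.1, "variant_3": 0.9},
}


def partitioned_scores(dosages, cluster_weights=CLUSTER_WEIGHTS):
    """Return one score per cluster for a single individual.

    `dosages` maps variant identifier -> risk-allele dosage (0 to 2).
    Each score is the weighted sum of dosages over that cluster's variants.
    Variants absent from `dosages` contribute zero (a simplification).
    """
    return {
        cluster: sum(weight * dosages.get(variant, 0.0) for variant, weight in weights.items())
        for cluster, weights in cluster_weights.items()
    }


if __name__ == "__main__":
    # Hypothetical individual.
    individual = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
    for cluster, score in partitioned_scores(individual).items():
        print(f"{cluster}: {score:.2f}")
```

A variant may carry weight in more than one cluster, as `variant_2` does in this toy example, so the per-cluster scores are not mutually exclusive partitions of an overall score.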

Quality assessment

For simple approaches, of the 51 studies assessed, 55% were quality graded as very low-, or low-GRADE certainty, 45% had moderate certainty and none achieved high certainty. For complex approaches, around 70% of the studies had moderate evidence certainty. In both approaches, the majority of the studies had moderate or lower GRADE certainty on account of (1) study design not addressing precision medicine objectives (not an RCT testing differential treatment effects in subclassified type 2 diabetes groups), (2) lack of a meaningful clinical outcome (i.e. although subgroups of type 2 diabetes were found, the measured outcome had little clinical significance because the study was not designed to study this), (3) low confidence in the findings due to small sample sizes, lack of replication or lack of diversity of studied subgroups and (4) large potential for bias due to lack of adjustment for possible confounders.

Discussion

Summary of findings

This systematic review analysed two broad approaches to the subclassification of type 2 diabetes to identify clinically meaningful subtypes that may advance precision diagnostics. We found many simple stratification approaches using, for example, clinical features such as BMI, age at diagnosis, and lipid levels, but none had been replicated and many lacked associations with clinical outcomes. Complex stratification models using ML approaches with and without genetic data showed reproducible subtypes of type 2 diabetes associated with outcomes. Both approaches require a higher grade of evidence but support the premise that type 2 diabetes can be subclassified into clinically meaningful subtypes.

Simple approaches to subclassification included urine and blood biomarkers, anthropometric measures, clinical data such as age at diagnosis, surrogate beta-cell metrics derived from blood C-peptide or insulin along with other less diabetes-related biomarkers such as bilirubin levels or pulse wave velocity. Approaches to subclassification were diverse. Some studies dichotomised continuous variables based on clinical cut-points. Other studies used a composite exposure (two or more criteria each with cut-points) or analysed changes in continuous variables over time e.g. change in eGFR over time.

The study designs, specific cut-offs and outcomes were heterogeneous, and no studies met high-quality GRADE certainty. No study evaluating a simple approach to type 2 diabetes subtyping has been adequately reproduced, although some studies identified biologically plausible subgroups. For example, subclassifications derived using BMI, beta-cell function, lipid profiles and age appeared to be associated with some outcomes which could be helpful in clinical practice. These potential subclassifications need to be replicated in better-designed studies (see section on additional supporting literature). Other evidence not included in our systematic review (either because the study population included people without diabetes or because the analysis was only performed in people with the exposure without a comparison group) supports the role of simple variables in stratifying diabetes; for example, younger age at diagnosis is reproducibly associated with worse cardiorenal outcomes in a number of studies39.

Machine learning approaches yielded some reproducible subtypes of type 2 diabetes using a variety of clinical and genetic variables. The best-replicated subtypes were the clusters first described by Ahlqvist et al.8, which were replicated in 22 studies, including ~88,000 individuals of diverse ancestry. There also was replication of genetic subtypes of type 2 diabetes from Udler et al.10 with associations with clinical features seen in multiple cohorts across almost 454,000 individuals36. However, the latter associations involved small absolute effects with unclear clinical utility for individual patient management, and studies were restricted to individuals of European ancestry. While there was replication of the clusters from Ahlqvist et al. across studies, the generated clusters appeared to be dependent on the characteristics of the underlying populations, especially factors such as distribution of ancestry, age, duration of diabetes, anthropometric trait variability as in BMI, and the variety of variable terms included in learning models. Nevertheless, at least some of the resulting subtypes appeared to be robust to differences in specific ML method, input variables, and populations (Fig. 3).

Many of the input variables for the complex ML subtyping approaches were also used in studies involving simple approaches to subclassification, recapitulating the biological plausibility of specific clustering variables in defining type 2 diabetes subtypes. One study directly compared a simple clinical approach to the clustering approach from Ahlqvist et al.8 and found that simple single clinical measures analysed in a quantitative (rather than categorical) framework could better predict relevant clinical outcomes, such as incidence of chronic kidney disease and glycaemic response to medications40. Thus, further research is needed to determine whether assigning a patient to one of the clusters from Ahlqvist et al.8 offers additional clinical benefit beyond evaluation of simple clinical measures and also beyond current standard of care. For example, high quality randomised controlled trial evidence is needed to demonstrate that knowledge of a patient’s clinical or genetic cluster membership could meaningfully guide treatment and/or clinical care and improve outcomes.

Study quality

No studies included in our systematic review had above moderate certainty of evidence. Some strengths of included studies were the large sample sizes, the diversity of variables considered, and inclusion of both prevalent and new-onset cases of type 2 diabetes. However, the varied study designs and lack of replication limit our ability to draw firm conclusions about the most effective approaches to subclassification. Most variables used for subclassification capture momentary metabolic states, which limits their long-term utility as cluster assignment is likely to change over time41,42. Most studies were retrospective analyses of established cohorts, and there were, at the time of the search, no data available involving subtype-stratified clinical trials or real-world implementation of approaches. Finally, most studies focused on European-ancestry populations, and the clinical value of these approaches may vary across different ancestries. While East Asian ancestries had representation in some studies, research in Black, South Asian and Hispanic populations remains sparse. This is particularly important, as four out of five people with type 2 diabetes come from marginalised groups or live in low- or middle-income countries. Future precision diagnostic interventions should address and narrow inequalities.

Additional supporting literature

Since our literature search was conducted, four new publications have advanced our understanding of type 2 diabetes subclassification.

Two recent studies applied ML approaches to stratify diabetes heterogeneity, both using continuous approaches rather than discrete clusters43,44. Nair et al. used a non-linear transformation and visualisation of nine variables onto a tree-like structure44, with replication in two large datasets. This approach linked underlying disease heterogeneity to risk of complications (those at risk of cardiovascular disease had a different phenotype to those with microvascular complications) and to drug response, and demonstrated associations of gradients across the tree using genetic process-specific scores from Udler et al.10. Wesolowska-Andersen et al. performed soft clustering of 32 clinical variables, which yielded four diabetes archetypes comprising a third of the study population. The remaining study population was deemed to be of mixed phenotype. This study has not been replicated43. A third study re-identified the genetic subtypes and their clinical associations from Udler et al.45.

Additionally, one of the first clinical trials to assess precision medicine approaches for diabetes management was published after our literature search. The TriMaster Study tested dichotomised BMI and eGFR strata in a three-period crossover trial using three pharmacologic interventions, with the primary hypothesis being stratum-specific differences in HbA1c46. Participants with obesity (BMI > 30 kg/m2) showed a glycaemic benefit on pioglitazone versus sitagliptin, and participants with lower eGFR (60–90 ml/min/1.73 m2) responded with lower HbA1c to sitagliptin as compared to canagliflozin. In a secondary analysis, drug choice corresponding to patient preferences yielded lower glycaemia than a random allocation, suggesting that listening to patients is critical in informing therapeutic decisions47. The implications of this study are limited by the non-comparable pharmacologic doses used, and the primary focus on glycaemia, which may not be indicative of long-term therapeutic success and/or prevention of complications. Yet these studies have generated higher quality evidence linking type 2 diabetes heterogeneity to treatment and disease outcomes. It remains to be seen if these can be replicated in other ancestries and translated into ‘usable products’ for healthcare professionals.

It is worth noting that ketosis-prone type 2 diabetes, an established type 2 diabetes subtype, was not captured adequately in our systematic review: only one study included ketosis-prone type 2 diabetes as an exposure48. Study designs for ketosis-prone type 2 diabetes were usually analyses of cohorts with diabetic ketoacidosis at presentation, with type 1 diabetes as the outcome, rather than as an exposure in people with type 2 diabetes. Since our search was designed to identify studies stratifying type 2 diabetes, this literature was not captured. As with many other ‘simple’ criteria for classification, the characteristics of people with diabetic ketoacidosis at presentation of type 2 diabetes have been studied, but few prospective studies have been replicated49.

Age at diagnosis as a simple approach to stratification also did not feature strongly in our search results. The body of literature that outlines higher risk of microvascular or macrovascular complications in early-onset type 2 diabetes has focussed on comparing people with type 2 diabetes to those without diabetes in different age groups39,50 or studied cohorts of early-onset cases in isolation51 and, thus, would not have been captured in our search strategy. Recent epidemiological studies have compared outcomes between early- and late-onset strata52,53, showing higher risks of cardiorenal outcomes with early age at onset, but these were retrospective analyses of health record databases, potentially confounded by age-related risk of complications and duration of diabetes. To move forward, prospective studies stratifying different interventions (e.g., tighter treatment targets or better cardiovascular risk reduction) in those diagnosed at younger age are needed.

Findings in context

We found that simple features have not been precisely and reproducibly evaluated to a high enough standard to subclassify type 2 diabetes into subtypes. This is not surprising, as many studies were not necessarily conducted for the purpose of ‘precision diagnosis’, but rather as studies of clinical phenotypes spanning a time period that preceded the current research focus on precision medicine. It is important to re-emphasise that many of the simple clinical criteria studied, such as age at diagnosis, do have other bodies of evidence supporting associations with outcomes. While these studies have set the scene, the field needs more robust evidence.

‘Complex’ methods for diabetes subclassification have shown better reproducibility, have been linked to a variety of meaningful clinical outcomes more consistently, and more recently have been able to demonstrate differential treatment responses related to stratification.

What do these findings mean for a precision medicine approach to type 2 diabetes diagnosis? Ideally, subclassification strategies should be deployed at diagnosis of type 2 diabetes on the basis of measured clinical characteristics such that people in different subgroups of type 2 diabetes could be treated differently. One key question is whether such efforts would cost-effectively improve clinical outcomes, compared to the current standard of care. However, another more fundamental question is whether subclassification approaches at diagnosis alone are enough. For example, another approach may be to iteratively subclassify longitudinal disease trajectories. Such an approach is supported by studies that have shown cluster-based assignments of type 2 diabetes at diagnosis are not robust and may change over time54. It may be argued that subclassification at one time point is overly simplistic and should be regularly reviewed based on trajectory.

Irrespective of the type studied, subclassification approaches need replication in independent datasets, assessment in diverse populations and in people with both new-onset and prevalent diabetes, and investigation using prospective data, ideally in the form of randomised clinical trials. Clinical trials of treatment approaches tailored to diabetes subtypes will be necessary to understand the clinical benefits of clinical subtyping. Ideally, sub-phenotyping should lead to benefits for patients in real-world clinical settings. Conducting these studies will be challenging due to the necessity for extensive follow-up, large sample sizes, and substantial resource requirements. There is a pressing need for innovative strategies to generate high-quality evidence on treatment options tailored to specific diabetes subtypes in diverse populations. These data will be critical to determine generalisability of findings and amenability for clinical translation, including in resource-constrained settings.

Clinical applicability

The current evidence supports the existence of distinguishable subtypes of type 2 diabetes and indicates that these subtypes are associated with variation in clinical outcomes. However, the very low to moderate quality of existing studies and the need for replication in ancestry-diverse studies make it difficult to identify a strongly evidence-based, universally applicable approach.

The most clinically valuable methods are likely to be those that are easy and inexpensive to implement. For more complex approaches, computer decision support tools will need to be developed and assessed for feasibility and utility. Although the evidence supporting complex approaches has leap-frogged the evidence in favour of more simplified approaches, there is still likely a place for simple approaches that can be more accessible at diverse clinical interfaces. Meanwhile, how cluster assignment could be translated into actionable data for the individual remains unclear: will, for example, a given person with type 2 diabetes exist in a distinct subgroup with associated outcomes, or will the subtype of type 2 diabetes have associated probabilities or risks of certain outcomes? While stratifying people with type 2 diabetes into discrete subtypes might result in information loss compared to continuous risk modelling40, discrete clusters might inform clinical decisions42.

Limitations

The limitations of this review reflect the limitations of the literature. To manage the breadth of literature analysed in this systematic review, we focussed on genomic data and did not include proteomic or metabolomic data, as these are potentially more premature for clinical use. We also did not include studies on participants at risk of type 2 diabetes, although we recognise that a body of evidence is emerging to stratify type 2 diabetes incidence risk using multiple approaches that are similar to those for established type 2 diabetes. Since we focused on studies that attempted to subgroup type 2 diabetes, we also did not capture analyses of independent cohorts with a particular type 2 diabetes phenotype at baseline, for example, studies of young people with type 2 diabetes or those with ketosis-prone type 2 diabetes, as outlined.

Next steps and recommendations

Future research should aim to identify and validate clinically useful and cost-effective methods for type 2 diabetes subclassification that can be applied across diverse populations. Such research will involve replication of a given approach in independent datasets, including from diverse ancestral populations, to ensure generalisability that does not widen health disparities. For simple stratification approaches, there is still much that can be done; agreement on standardised study designs for precision diagnostics studies could be a first step. For ML approaches requiring real-time computation, the development of strategies to overcome local resource constraints in implementing these methods could be explored.

Conclusion

In this first systematic review of the evidence underpinning type 2 diabetes diagnostic subclassification, multiple approaches were identified. Among them are strategies that used simple criteria based on fundamental categorisation of mostly routine measures, and complex approaches with multi-trait or genetic inputs that required ML or other computation. While simple approaches are more easily deployed, the study designs and level of evidence currently limit any firm conclusions regarding the utility of such approaches. The clinical variables and data incorporated into ‘complex’ approaches have yielded reproducible subclassifications, and a growing body of evidence supports clinically meaningful associations of subtypes with outcomes and treatment responses. This is a rapidly evolving field with higher quality evidence emerging. It will be crucial to develop interventions that target diverse populations and are feasible in all resource settings to prevent widening existing inequalities in the precision medicine era of diabetes care.