Background
Public and private regulators use risk adjustment models to prevent adverse selection, anticipate budgetary reserve needs, and offer care management services to high-risk individuals [1]. Preventing risk selection by insurers is a critical ethical, legal, and societal goal that risk adjustment models can address. Risk adjustment models attempt to capture the relationship between demographic and clinical variables (risk adjusters) and subsequent healthcare utilization or spending. The models are commonly derived through standard linear regression methods or their extensions, and rely on individual-level data commonly captured in administrative claims datasets [2]. All of the models available on the current commercial market are linear or log-linear regression models that leverage the same basic elements, such as age, sex, and diagnostic and procedure codes [3].
Risk adjustment modeling may be improved by both methodological and conceptual advances in the risk modeling and healthcare services literature. From a methodological standpoint, newer machine learning methods have emerged as alternatives or complements to linear regression for predicting highly variable health outcomes using large sparse datasets, including estimating healthcare costs using claims data [4, 5]. While traditional risk adjustment models are limited in modeling complexity and tend to underpredict expenditures for populations with very high spending [6, 7], machine learning methods may help capture complex non-linear relationships and interactions among variables, which could explain why some individuals with complex constellations of risk factors and diagnoses incur substantially higher costs than predicted. For example, among people with low income and diabetes who take insulin, food insecurity is associated with hypoglycemia and emergency room visits during the last week of each month, when income from a first-of-the-month paycheck has been exhausted but hypoglycemic medications are still being taken [8]. These complex relationships are hard to model in standard risk equations but can potentially be better captured by interaction-focused, non-linear machine learning algorithms. Despite this promise, machine learning techniques have not yet been widely adopted for risk adjustment, partly because the machine learning models developed to date have not demonstrated superior predictive performance over traditional linear models on large datasets with more than a million enrollees [2].
From a conceptual standpoint, risk adjustment may also be improved by including additional area-level indicators of social determinants of health (SDH), such as poverty, unemployment, and education, which contribute to risk, utilization, and cost [9‐11]. Since before the UK Black Report and The Health Divide, epidemiologists have shown that while cultural and individual behavioral choices influence health, living conditions, including the availability of resources (e.g., clean air and water), working conditions, and the quality of food and housing, have a particularly profound association with health outcomes [12]. More recent initiatives to directly address these ‘social determinants’ of health include strategies that refer patients with food insecurity to food pantries, patients experiencing homelessness to housing resources, and patients facing transportation challenges to assisted transport services, with the aims of improving nutrition-related chronic disease metrics (e.g., blood pressure and diabetes glycemic control), improving the ability to attend healthcare visits, and reducing stress-related adverse health outcomes [13].
The inclusion of SDH indicators in risk adjustment may particularly help plan payment estimation. SDH indicators may capture previously unmeasured factors that influence the course of disease, such as how poverty affects chronic disease outcomes by limiting the ability to pay for medications or more nutritious foods, or how unemployment relates to mental health, depression, and lower adherence [14, 15]. Individual-level SDH factors are rarely assessed or included in commonly available data, but area-level SDH indicators are readily available from national data sources [16] and can be linked to the 5-digit ZIP code often available in claims data. Area-level SDH indicators were recently incorporated into risk adjustment models for the Massachusetts Medicaid program; their inclusion improved concurrent annual healthcare spending predictions for low-income adults [17]. It remains unclear, however, to what extent incorporating area-level SDH indicators could improve prospective annual healthcare spending predictions, particularly for the privately-insured population, who constitute the largest share of insured people in the US but for whom SDH factors may be less visible or influential than for the Medicaid population.
The objective of this study was to assess whether prospective risk adjustment models may be improved by machine learning methods and by the incorporation of area-level SDH indicators in a national privately-insured adult population.
Results
Descriptive statistics on the data subsets are detailed in Table 2. The test set had a mean age of 41.1 years (median 41.0; IQR 30.0, 53.0) and was 48.9% female. Top-coding cost at $400,000 eliminated approximately 2.8% of dollars, and test set members had a mean top-coded annual healthcare cost of $6677 (median $855; IQR $161, $3847). Around 17.7% of members in the test set had zero annual healthcare cost.
Table 2. Characteristics of Members in the Dataset Subsets

| | Training Set | Test Set |
| Members, Total No. | 1,058,479 | 117,616 |
| Female, No. (%) | 517,364 (48.9%) | 57,469 (48.9%) |
| Members from ZIP codes without measured SDH variables^a, No. (%) | 1074 (0.1%) | 115 (0.1%) |
| Age, y, mean [median] (SD) | 41.1 [41.0] (13.1) | 41.1 [41.0] (13.1) |
| 2017 Annual Cost, $, mean [median] (SD) | 6946 [861] (28,240) | 6868 [855] (27,826) |
| 2017 Top-coded Annual Cost^b, $, mean [median] (SD) | 6762 [861] (23,822) | 6677 [855] (23,536) |
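The top-coding step can be illustrated with a short sketch; the cost values below are invented, and only the $400,000 cap comes from the study:

```python
import numpy as np

# Hypothetical annual member costs in dollars (illustrative values only).
costs = np.array([500.0, 12_000.0, 250_000.0, 950_000.0])

CAP = 400_000  # top-coding threshold used in the study

# Top-coding replaces any cost above the cap with the cap itself.
top_coded = np.minimum(costs, CAP)

# Share of total dollars eliminated by the cap.
share_eliminated = 1 - top_coded.sum() / costs.sum()
```

Capping extreme costs limits the influence of catastrophic outliers on both model fitting and error metrics.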
Table 3 shows the test set performance of the prospective linear and machine learning models without and with the SDH indicators.
Table 3. Performance Measures of the Prospective Linear and Machine Learning Models on the Test Set

| | Without SDH Indicators | With SDH Indicators |
| R² (95% CI)^a | | |
| Linear | 0.327 (0.300, 0.353) | 0.327 (0.300, 0.354) |
| ML | 0.388 (0.357, 0.420) | 0.387 (0.357, 0.419) |
| MAE, $ (95% CI)^b | | |
| Linear | 6992 (6889, 7094) | 6991 (6889, 7094) |
| ML | 6637 (6539, 6735) | 6634 (6536, 6732) |
| C-statistic (95% CI)^c | | |
| Linear | 0.703 (0.701, 0.705) | 0.700 (0.699, 0.702) |
| ML | 0.717 (0.715, 0.718) | 0.716 (0.714, 0.717) |
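For readers who want to reproduce these measures, minimal implementations are sketched below. The R² and MAE definitions are standard; the C-statistic here is computed as the fraction of concordant member pairs, one common definition for continuous outcomes that may differ in detail from the study's exact computation:

```python
import numpy as np
from itertools import combinations

def r_squared(y, yhat):
    """Coefficient of determination: 1 - SSE / SST."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

def mae(y, yhat):
    """Mean absolute error in dollars."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs(y - yhat)))

def c_statistic(y, yhat):
    """Fraction of pairs with unequal observed cost whose predictions
    are ordered the same way; prediction ties count as half."""
    concordant = ties = pairs = 0
    for (yi, pi), (yj, pj) in combinations(zip(y, yhat), 2):
        if yi == yj:
            continue
        pairs += 1
        if (pi - pj) * (yi - yj) > 0:
            concordant += 1
        elif pi == pj:
            ties += 1
    return (concordant + 0.5 * ties) / pairs
```

The O(n²) pair loop is fine for illustration but would need a rank-based formulation for a test set of 117,616 members.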
Linear regression without SDH indicators
The linear regression model without SDH indicators, when derived through ordinary least squares regression, had the largest standardized coefficients (indicating highest importance among covariates in the model) for age and sex indicators and for diagnostic coding of birth complications and chronic kidney disease (see Supplementary Information Table S3). The model had an R² of 0.327 (95% CI 0.300, 0.353), an MAE of $6992 (95% CI 6889, 7094), and a C-statistic of 0.703 (95% CI 0.701, 0.705). Linear models derived through LASSO had similar performance metrics but tended to favor diagnoses more than traditional least squares (see Supplementary Information Tables S3 and S5).
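As an illustration of the two linear approaches, the sketch below fits OLS and LASSO to simulated data; the covariates, coefficients, and penalty strength are all invented for the example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Five hypothetical risk adjusters; only the first two truly drive the outcome.
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500)

Xs = StandardScaler().fit_transform(X)

# Standardized OLS coefficients can be compared to rank covariate importance.
ols = LinearRegression().fit(Xs, y)

# The L1 penalty shrinks weak coefficients, often exactly to zero.
lasso = Lasso(alpha=0.1).fit(Xs, y)
```

Comparing `ols.coef_` and `lasso.coef_` shows the shrinkage pattern: the strong predictor survives the penalty while coefficients on noise covariates are pushed toward zero.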
Linear regression with SDH indicators
The inclusion of SDH indicators in the linear regression model had no substantial effect on the overall performance metrics. The model had an R² of 0.327 (95% CI 0.300, 0.354), an MAE of $6991 (95% CI 6889, 7094), and a C-statistic of 0.700 (95% CI 0.699, 0.702).
Machine learning without SDH indicators
Switching from a linear regression model to the machine learning model significantly improved determination, significantly reduced error, and significantly improved discrimination. Specifically, the machine learning model without SDH indicators had an R² of 0.388 (95% CI 0.357, 0.420), an MAE of $6637 (95% CI 6539, 6735), and a C-statistic of 0.717 (95% CI 0.715, 0.718). The multilayer perceptron and random forest models outperformed the linear models but performed worse than the LightGBM model across all metrics (Supplementary Information Table S5).
Machine learning with SDH indicators
The inclusion of SDH indicators in the machine learning model likewise had no substantial effect on the overall performance metrics relative to the machine learning model without SDH indicators. The model had an R² of 0.387 (95% CI 0.357, 0.419), an MAE of $6634 (95% CI 6536, 6732), and a C-statistic of 0.716 (95% CI 0.714, 0.717). We created variable importance rankings to assist in the interpretation of the machine learning model. Diagnosis predictors had the largest importance metrics, with the most important predictors being chronic kidney disease, deficiency and other anemia, and other aftercare (see Supplementary Information Table S4).
Subgroup analyses
Table 4 compares the predictive ratios and net compensation values for the machine learning model without and with SDH indicators. The addition of SDH indicators resolved or reduced the underestimation of risk in all of the SDH-based subgroups, although the 95% confidence intervals of the non-SDH and SDH-including models overlapped in all subgroups. In one of the high-poverty subgroups, the subgroup with a high proportion of non-fluent English speakers, the subgroup with a high prevalence of uninsured residents, and the subgroup of individuals living in areas with a large proportion of households on food stamps, the incorporation of SDH indicators resolved the underestimation of risk. Among subgroups of individuals living in areas with high poverty, high wealth inequality, and a high prevalence of uninsured residents, the machine learning model trained with SDH indicators substantially reduced the underestimation of cost, improving the predictive ratio by 3% (and net compensation by $200 per person) over the model trained without SDH indicators. The addition of SDH indicators led to small additional overpayment in the 4 subgroups for which the model without SDH indicators did not substantially underestimate risk (predictive ratio < 1.01): one of the high-poverty subgroups, the subgroup with a large unemployed population, the subgroup with a low percentage of high school graduates, and the subgroup with a large number of single-parent families. Additional subgroup analyses among all models are presented in Supplementary Information Tables S6–S8.
Table 4. Predictive Ratio and Net Compensation Values of Prospective Machine Learning Models on SDH-Based Subgroups in the Test Set

| Subgroup | Members, No. (%) | Mean Top-coded Cost, $ | Predictive Ratio, Without SDH (95% CI) | Predictive Ratio, With SDH (95% CI) | Net Compensation, Without SDH, $ (95% CI) | Net Compensation, With SDH, $ (95% CI) |
| Total | 117,616 (100) | 6677 | 1.000 (0.976, 1.024) | 1.000 (0.976, 1.024) | 0 (−105, 105) | 0 (−105, 105) |
| Poverty | | | | | | |
| Median Income in the Past 12 Months, $ | 4923 (4.2) | 10,818 | 1.017 (0.915, 1.120) | 1.006 (0.905, 1.108) | −183 (−836, 470) | −67 (−729, 595) |
| Families Under 0.5 Ratio of Income to Poverty Level in the Past 12 Months, % | 7932 (6.7) | 9344 | 0.966 (0.882, 1.050) | 0.948 (0.865, 1.031) | 331 (−138, 801) | 510 (33, 987) |
| Families Between 0.5 and 0.74 Ratio of Income to Poverty Level in the Past 12 Months, % | 6651 (5.7) | 8952 | 1.010 (0.912, 1.108) | 0.988 (0.892, 1.084) | −89 (−599, 420) | 109 (−408, 627) |
| Families Between 0.75 and 0.99 Ratio of Income to Poverty Level in the Past 12 Months, % | 7194 (6.1) | 9395 | 1.052 (0.956, 1.148) | 1.010 (0.919, 1.101) | −467 (−977, 43) | −94 (−613, 425) |
| Families Received Food Stamps/SNAP in the Past 12 Months, % | 9009 (7.7) | 9001 | 1.028 (0.941, 1.115) | 0.996 (0.912, 1.079) | −247 (−684, 191) | 39 (−409, 487) |
| Population Unemployed, % | 10,278 (8.7) | 7055 | 0.961 (0.886, 1.036) | 0.957 (0.882, 1.032) | 289 (−71, 649) | 316 (−51, 683) |
| Gini Index of Income Inequality | 16,155 (13.7) | 6138 | 1.054 (0.985, 1.122) | 1.021 (0.955, 1.087) | −312 (−578, −46) | −126 (−393, 140) |
| Education | | | | | | |
| Population Obtained High School Diploma, % | 9482 (8.1) | 7555 | 0.987 (0.900, 1.073) | 0.974 (0.889, 1.058) | 102 (−324, 529) | 205 (−227, 637) |
| Population Obtained Bachelor’s Degree, % | 4169 (3.5) | 11,338 | 1.032 (0.923, 1.142) | 1.027 (0.917, 1.136) | −353 (−1139, 433) | −294 (−1080, 492) |
| Other | | | | | | |
| Population Speak English Less than “Very Well”, % | 23,659 (20.1) | 5453 | 1.023 (0.963, 1.083) | 0.989 (0.932, 1.046) | −124 (−346, 98) | 61 (−161, 283) |
| Families with Single Parent, % | 9097 (7.7) | 9880 | 0.993 (0.910, 1.076) | 0.978 (0.896, 1.060) | 65 (−397, 527) | 224 (−246, 693) |
| Population Without Health Insurance Coverage, % | 13,656 (11.6) | 8333 | 1.066 (0.990, 1.142) | 0.990 (0.921, 1.059) | −516 (−885, −147) | 83 (−287, 454) |
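The two subgroup measures can be defined in a few lines. The sign convention for net compensation (positive = subgroup under-compensated) is inferred from the direction of the values in Table 4, and the toy numbers are invented:

```python
import numpy as np

def predictive_ratio(pred, actual):
    """Mean predicted over mean actual cost; values < 1 flag underestimation."""
    return np.mean(pred) / np.mean(actual)

def net_compensation(pred, actual):
    """Mean per-member shortfall in dollars (actual minus predicted);
    positive values indicate the subgroup is paid less than it costs."""
    return np.mean(np.asarray(actual) - np.asarray(pred))

# Toy subgroup where the model underpredicts by $300 per member on average.
actual = np.array([1000.0, 5000.0, 9000.0])
pred = np.array([900.0, 4800.0, 8400.0])
```

Here the predictive ratio is 0.94 and the net compensation is $300 per member, the pattern the text describes for under-compensated vulnerable subgroups.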
Additional results
Binned scatter plots of the prospective risk adjustment models on the test set are shown in Fig. S1. We additionally explored the effect of using binary diagnosis predictors instead of counts (Supplementary Information Table S9), the effect of top-coding cost (Supplementary Information Table S10), the effect of including lab results (Supplementary Information Table S11), and the development of concurrent risk adjustment models (Supplementary Information Table S12).
Discussion
We observed that switching from a linear regression model to a gradient boosting ML model significantly improved determination and discrimination and reduced absolute error in cost. We also observed that the inclusion of SDH indicators at the ZIP code-level reduced underestimation of cost among people living in vulnerable areas.
Prior studies have separately investigated whether machine learning and the incorporation of SDH indicators can improve risk adjustment. In a previous study, the use of machine learning for prospective risk prediction did not demonstrate substantial improvements over linear regression for a privately-insured population [4]. However, the addition of SDH indicators has been shown to improve concurrent risk adjustment models, including Medicare Advantage Plan quality rankings, Medicare’s Hospital Readmissions Reduction Program penalties, and concurrent annual healthcare spending among a state Medicaid population [17, 45, 46]. In our study, the incorporation of SDH indicators reduced cost underestimation in several vulnerable subgroups, even in a commercially-insured population. Improving predictions of cost within these subgroups is important to address persistent inequalities that bias payment estimation [47‐49].
Our study has important limitations. First, the risk models developed here are unlikely to generalize well to populations outside the U.S., or to Medicaid or Medicare populations for whom risk adjustment models may be particularly consequential for avoiding adverse selection and maintaining competitive and fair markets. However, the methods employed in this study could be used to develop models specific to those populations. Second, like other machine learning methods, the modeling approach used in this study is more complex than traditional linear regression. Although this may confer an advantage in preventing ‘cheating’, in that machine learning models may be less susceptible to up-coding behaviors intended to inflate risk estimates [2], the complexity may also make it difficult to understand how and why the model made a certain decision [29]. Third, because risk adjustment models are developed on historical data, they tend to perpetuate the inequality of past spending trends if no explicit adjustments are made to account for the endogeneity of spending. Prior work has investigated methods to develop fairer healthcare payment models through data manipulation and modeling changes [39, 41, 50], which can be pursued in future studies. Fourth, the SDH indicators used in this study are measured at the area level, which may introduce bias or ecological fallacy into the risk adjustment models. However, combining the claims data used in this work with individual-level socioeconomic status variables was prohibited for privacy reasons. Fifth, 5-digit ZIP codes are not as homogeneous as Census Tracts or Census Block Groups, which have been used in previous linear regression models assessing SDH-associated effects for Medicaid and Medicare populations [51]; the risk is a potential underestimation of the contribution of SDH to the risk models. However, ZIP code is more readily available in commercial claims datasets. Sixth, there remains debate about whether adding SDH indicators may allow poorer healthcare to persist in healthcare organizations serving predominantly lower-income populations, by compensating them more in value-based payment models that adjust not only for outcomes but also, for instance, for lower income, although recent studies suggest this will not necessarily mask hospital quality [52]. Seventh, one key challenge is to predict per-member utilization rather than cost. However, given that cost is a key concern for payers and is often disproportionate to utilization due to negotiated contracts and geographic variations in price, we modeled overall costs to help understand how well geographic parameters such as social determinants, together with machine learning, could capture the complexities related to payment.
In the future, our ML approach may be improved upon in several ways. It may be possible to take advantage of the temporality of the data, for example by including more than one year of medical history. Additionally, it may be possible to train a hybrid (concurrent and prospective) model to leverage the continuous nature of medical enrollment, utilization, and claims [53]. Finally, using highly parameterized models such as deep neural networks could better capture non-linear interactions between covariates and scale to large claims datasets, at the expense of interpretability [54]. We have shared our code in an open-source manner to enable others to reproduce and extend our methods to other datasets.
Conclusion
The results of the current study suggest that machine learning methods and the inclusion of area-level SDH indicators may improve prospective risk adjustment models in a commercially insured population. The SDH indicators were particularly useful for populations living in vulnerable areas, while the machine learning approach had a greater impact on overall performance, leading to improvements in fit, discrimination, and overall cost allocation (> $3M reduction in error per 10,000 people).