Incremental Value of Clinical Data Beyond Claims Data in Predicting 30-Day Outcomes After Heart Failure Hospitalization
Background— Administrative claims data are used routinely for risk adjustment and hospital profiling for heart failure outcomes. As clinical data become more readily available, the incremental value of adding clinical data to claims-based models of mortality and readmission is unclear.
Methods and Results— We linked heart failure hospitalizations from the Get With The Guidelines–Heart Failure registry with Medicare claims data for patients discharged between January 1, 2004, and December 31, 2006. We evaluated the performance of claims-only and claims-clinical regression models for 30-day mortality and readmission, and compared hospital rankings from both models. There were 25 766 patients from 308 hospitals in the mortality analysis, and 24 163 patients from 307 hospitals in the readmission analysis. The claims-clinical mortality model (area under the curve [AUC], 0.761; generalized R2=0.172) had better fit than the claims-only mortality model (AUC, 0.718; R2=0.113). The claims-only readmission model (AUC, 0.587; R2=0.025) and the claims-clinical readmission model (AUC, 0.599; R2=0.031) had similar performance. Among hospitals ranked as top or bottom performers by the claims-only mortality model, 12% were not ranked similarly by the claims-clinical model. For the claims-only readmission model, 3% of top or bottom performers were not ranked similarly by the claims-clinical model.
Conclusions— Adding clinical data to claims data for heart failure hospitalizations significantly improved prediction of mortality, and shifted mortality performance rankings for a substantial proportion of hospitals. Clinical data did not meaningfully improve the discrimination of the readmission model, and had little effect on performance rankings.
Short-term outcomes among patients hospitalized for heart failure are poor. Approximately 10% of these patients die within 30 days of admission,1,2 and almost one-quarter are readmitted within 30 days of discharge.1,3,4 In an effort to promote quality improvement and provide incentives to improve outcomes,5 the Centers for Medicare and Medicaid Services (CMS) began profiling hospital performance by publicly reporting hospital-level 30-day risk-adjusted mortality and readmission rates among patients hospitalized with heart failure. These performance measures have been adopted by accreditation organizations and payers alike, and rewards based on risk-adjusted outcomes have been proposed in health care reform legislation.6
The risk-adjustment models currently used by CMS incorporate data exclusively from administrative claims.2,3 Although these models were validated against clinical data, they may benefit from the addition of clinical data. Clinical data, such as physiological data, laboratory results, and diagnostic test results, are unavailable in medical claims data but are frequently available in clinical registries or electronic medical records. In various patient populations, prediction of mortality based on administrative models can be considerably improved with the addition of clinical data.7,8 However, the value of adding clinical data to published claims-based models for patients hospitalized with heart failure remains unclear. Moreover, the benefit of adding clinical data to claims-based models for the prediction of readmission is not known.
Although clinical data may improve prediction of outcomes at the patient level, there is less information about how they would affect hospital profiling. If hospital rankings are unchanged by the addition of clinical data to claims-based risk-adjustment models, gathering these data may not be worthwhile. However, if hospital rankings change meaningfully, policies that reward or penalize performance based on outcomes may need to include clinical data to appropriately reward high-performing hospitals and penalize low-performing hospitals. Therefore, we linked data from a nationwide registry of heart failure hospitalizations to Medicare claims data to enhance existing claims-based models of short-term mortality and readmission. We examined the effects of adding clinical data to these models on both patient-level predictions and hospital-level rankings.
WHAT IS KNOWN
Risk-adjustment models currently used by the Centers for Medicare and Medicaid Services incorporate data exclusively from administrative claims. Results from these models are used for hospital profiling and public reporting efforts.
Given recent attention to electronic health records, it may soon be possible to incorporate clinical data into claims-only models if the incremental value of these data warrants it.
WHAT THE STUDY ADDS
Adding clinical data to claims data for patients hospitalized with heart failure significantly improved prediction of mortality and shifted mortality rankings for 12% of hospitals classified as top or bottom performers.
Clinical data did not meaningfully improve the readmission model for patients hospitalized with heart failure and had little effect on hospitals' readmission performance rankings.
Clinical data for this study were from the American Heart Association's Get With The Guidelines–Heart Failure (GWTG-HF) registry. GWTG-HF is a Web-based quality-improvement program implemented voluntarily by hospitals for patients with new or worsening heart failure according to case definitions similar to those used by the Joint Commission.9 The program succeeded the OPTIMIZE-HF registry, which has been described previously.10 Outcome Sciences, Inc, serves as the data collection and coordination center for GWTG, and the Duke Clinical Research Institute serves as the data analysis center and has an agreement to analyze aggregate deidentified data for research purposes. Only data from registry sites that routinely reported admission examination results and laboratory results were used for the study. All heart failure hospitalizations of patients 65 years or older with discharge dates between January 1, 2004, and December 31, 2006, were initially considered for the study. We excluded hospitalizations with a length of stay of 1 day or less and hospitalizations classified as elective, because these patients likely were not admitted for decompensated heart failure.2
We linked the remaining registry hospitalizations to the 100% sample of Medicare inpatient claims using admission date, discharge date, sex, and date of birth, a validated method that has been described previously.11 For these linked hospitalizations, we obtained all Medicare Part A, Medicare Part B, and denominator files from 2003 through 2007. Medicare Part A files contain claims generated by institutions for inpatient services. Medicare Part B files contain claims generated by institutions for outpatient services and by physicians for professional services. The denominator files contain information on beneficiary demographic characteristics and Medicare enrollment.
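The deterministic linkage step amounts to a keyed join on the four identifying fields. The sketch below is a minimal illustration, not the validated algorithm of reference 11; the field names and the decision to keep only unambiguous one-to-one matches are assumptions.

```python
def link_records(registry, claims):
    """Link registry hospitalizations to Medicare inpatient claims by
    exact match on admission date, discharge date, sex, and date of
    birth. Field names are illustrative, not the study's schema."""
    index = {}
    for c in claims:
        key = (c["admit"], c["discharge"], c["sex"], c["dob"])
        index.setdefault(key, []).append(c)
    linked = []
    for r in registry:
        key = (r["admit"], r["discharge"], r["sex"], r["dob"])
        matches = index.get(key, [])
        if len(matches) == 1:  # assumption: discard ambiguous matches
            linked.append((r, matches[0]))
    return linked
```

Indexing the claims first keeps the join linear in the number of records rather than quadratic.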
After the linking of the clinical and claims data, the study cohort contained data on patients enrolled in fee-for-service Medicare whose hospitalizations were identified in the claims data. We further limited the cohort to patients who had a full year of claims available before the registry admission, which allowed us to gather claims-based information about comorbid conditions. All patients in the cohort were included in the mortality analysis. We excluded patients from the readmission analysis if they were transferred to another hospital, left against medical advice, or did not survive to discharge.
For the mortality analysis, the primary outcome was death from any cause within 30 days of admission. We determined dates of death from the Medicare denominator files. For the readmission analysis, the primary outcome was readmission within 30 days of discharge. We determined readmission dates by using dates of subsequent hospitalizations recorded in the Medicare inpatient files. We did not consider hospitalizations for rehabilitation to be readmissions.
The initial model for each outcome, termed the claims-only model, was based on claims data alone. The variables included in the claims-only model differed by outcome and were based on published risk-adjustment models.2,3 Table 1 lists the history and comorbidity information included in the claims-only mortality and readmission models. Both claims-only models also included age and sex. Comorbid conditions and procedure history were based on International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis and procedure codes from the index hospitalization claim and from all other inpatient and outpatient claims in the previous 12 months. These codes were grouped into Hierarchical Condition Categories for use in the models. This diagnostic grouping method was used for the published heart failure models2,3 and is used for risk-adjustment by CMS.12
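Grouping ICD-9-CM codes into condition categories reduces, in effect, to a code-to-category lookup followed by construction of indicator variables. The mapping fragment and category labels below are hypothetical; the actual CMS Hierarchical Condition Category mapping covers thousands of codes.

```python
# Hypothetical fragment of an ICD-9-CM-to-category lookup table.
CODE_TO_CATEGORY = {
    "428.0": "CHF",
    "428.1": "CHF",
    "250.00": "Diabetes",
    "585.9": "RenalFailure",
}

def condition_indicators(diagnosis_codes, categories):
    """Collapse a patient's diagnosis codes (index stay plus the prior
    12 months of claims) into 0/1 condition-category indicators."""
    present = {CODE_TO_CATEGORY.get(code) for code in diagnosis_codes}
    return {cat: int(cat in present) for cat in categories}
```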
The extended model for each outcome, termed the claims-clinical model, included all variables in the claims-only model plus a subset of the clinical measures from the GWTG-HF registry. The candidate clinical measures considered for both the mortality and readmission claims-clinical models were ejection fraction, heart rate, hemoglobin, serum creatinine, serum sodium, systolic blood pressure, and weight. We selected these clinical measures for a few reasons. First, they were predictive of mortality and/or readmission among patients with heart failure in the chart-based validation models presented along with the published claims-based models.2,3 Most of these variables also have prognostic value for prediction of in-hospital mortality.13–15 Second, these measures are quantitative measures that are not captured in claims data but that should be readily available in either registries or electronic medical records. We were interested primarily in including quantitative clinical measures, because most of the dichotomous medical history variables in the published validation models already appear, in some form, in the claims-only model. Third, these measures were collected consistently in the GWTG-HF registry during the entire study period. Finally, although they are not a comprehensive list, these measures reflect multiple clinical domains, including laboratory test results, diagnostic test results, and vital signs. We coded each measure as a categorical variable for modeling, which allowed us to include a category for missing values and use all records in the analysis.
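The categorical coding of each clinical measure, with an explicit level for missing values, can be sketched as simple binning. The cutpoints shown are illustrative, not the categories used in the study.

```python
def categorize(value, cutpoints, labels):
    """Bin a continuous measure into ordered categories, with an
    explicit 'missing' level so no record is dropped from the model."""
    if value is None:
        return "missing"
    for cut, label in zip(cutpoints, labels):
        if value < cut:
            return label
    return labels[-1]

# e.g., systolic blood pressure in mm Hg (illustrative cutpoints)
SBP_CUTS = [100, 120, 140, 160]
SBP_LABELS = ["<100", "100-119", "120-139", "140-159", ">=160"]
```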
We did not perform any variable selection for the claims-only model; we included all variables used in the previously published models.2,3 Inclusion of clinical variables in the final claims-clinical model was based on a published variable selection method that results in a parsimonious model while not sacrificing predictive performance.16 We sought a parsimonious model on the assumption that a limited set of variables would be preferable were clinical data to become required for submission to CMS for risk adjustment. The selection method relies on results from models estimated on multiple bootstrap samples of the data. We chose to use 500 bootstrap samples. In each of these models, we included all of the claims-based variables while using backward selection on the candidate clinical variables. The criterion for retention of the clinical variables in the claims-clinical model was α=0.05. To be included in the final claims-clinical model, we required that a clinical variable be retained in at least 60% (300/500) of the models. For each of the 500 bootstrap samples, we calculated the area under the curve (AUC) and the generalized R2 statistic17 for both the claims-only model and the claims-only model plus each clinical measure individually to determine the potential contribution of each measure to the fit of the final model.
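The bootstrap retention rule can be sketched as follows. The backward-selection step itself is abstracted behind a caller-supplied function, because the study fit full logistic models at each iteration; the resampling loop, the 500 samples, and the 60% retention threshold follow the description above.

```python
import random

def bootstrap_retention(records, candidates, select_variables,
                        n_boot=500, threshold=0.60, seed=0):
    """Count how often each candidate variable survives selection across
    bootstrap resamples; keep variables retained in >= threshold of them.
    `select_variables(sample, candidates)` stands in for backward
    selection at alpha=0.05 on one bootstrap sample."""
    rng = random.Random(seed)
    counts = {v: 0 for v in candidates}
    for _ in range(n_boot):
        # resample records with replacement
        sample = [rng.choice(records) for _ in records]
        for v in select_variables(sample, candidates):
            counts[v] += 1
    return [v for v in candidates if counts[v] / n_boot >= threshold]
```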
We initially fit the claims-only and claims-clinical models by using a generalized linear model with a logit link and binomial error distribution. We report the performance of each model on the basis of 3 criteria: discrimination, calibration, and explained variance. Discrimination is the ability of the model to discriminate between high-risk and low-risk patients. For this measure, we calculated the AUC. Calibration reflects the concordance between predicted probabilities and observed outcomes. For this measure, we calculated the observed and predicted event rates among patients in the lowest and highest deciles of predicted risk. Explained variance is the extent to which the variables included in the model account for the observed variance in the study population. For this measure, we calculated the generalized R2 statistic. Although it is not possible to formally test for differences in calibration and explained variance, we tested for differences in discrimination in 2 ways. First, we compared the AUC between the claims-only and claims-clinical models.18 Second, we calculated the integrated discrimination improvement (IDI) for the claims-clinical model compared with the claims-only model.19
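Two of the discrimination measures are straightforward to compute from predicted probabilities. A minimal sketch of the c statistic (AUC) and the IDI, assuming `y` is a 0/1 outcome vector and each `p` a parallel vector of predicted risks:

```python
def auc(y, p):
    """Concordance (c) statistic: the probability that a randomly chosen
    event receives a higher predicted risk than a randomly chosen
    non-event, with ties counted as one-half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((pi > pj) + 0.5 * (pi == pj) for pi in pos for pj in neg)
    return wins / (len(pos) * len(neg))

def idi(y, p_old, p_new):
    """Integrated discrimination improvement: the gain in mean predicted
    risk separation between events and non-events under the new model."""
    def separation(p):
        ev = [pi for yi, pi in zip(y, p) if yi == 1]
        ne = [pi for yi, pi in zip(y, p) if yi == 0]
        return sum(ev) / len(ev) - sum(ne) / len(ne)
    return separation(p_new) - separation(p_old)
```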
We then refit the models as hierarchical generalized linear models with hospital-specific random intercepts. When exponentiated, these intercepts can be interpreted as a measure of performance, because they reflect the estimated predicted hospital-specific odds for each outcome while controlling for patient case mix. Hospitals with predicted effects that are significantly different from the average hospital (α<0.05) are outliers and are considered to have performance that is significantly better or worse than the average hospital. We report differences in hospitals determined to be outliers by each model. We also ranked hospitals by their predicted effects from each model and plotted the agreement between these rankings. To summarize the rankings, we grouped hospitals into 3 categories—top 20%, middle 60%, and bottom 20%—and again compared the results across models. We chose these categories because they reflect the 3 broad categories relevant to pay-for-performance programs, in which the top 20% of hospitals are eligible for bonus payments and the bottom 20% of hospitals may be subject to a payment penalty.6 As a sensitivity analysis, we repeated the analysis within subgroups defined by left ventricular function because of concerns about differences in outcomes based on preserved or reduced ejection fraction.
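The ranking step reduces to sorting hospitals by their estimated random-intercept effects and cutting at the 20th and 80th percentiles. A sketch, assuming smaller intercepts indicate better (lower-odds) performance:

```python
def rank_categories(hospital_effects):
    """Rank hospitals by estimated random intercept (assumed: lower odds
    of the outcome = better) and label top 20% / middle 60% / bottom 20%."""
    ranked = sorted(hospital_effects, key=hospital_effects.get)
    n = len(ranked)
    cut_top, cut_bottom = round(0.2 * n), round(0.8 * n)
    return {h: ("top 20%" if i < cut_top else
                "bottom 20%" if i >= cut_bottom else "middle 60%")
            for i, h in enumerate(ranked)}
```

Running this on the effects from each model and cross-tabulating the labels reproduces the kind of agreement comparison summarized in Table 5.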
We used SAS version 9.2 (SAS Institute Inc, Cary, NC) for all analyses. The institutional review board of the Duke University Health System approved the study.
From 2004 through 2006, there were 344 GWTG-HF participating hospitals with 45 623 hospitalizations of patients 65 years or older. Of these hospitalizations, we linked 36 267 (80%) to Medicare claims. These hospitalizations represented 31 245 unique patients. Among the initial registry hospitalizations for these patients, 5.3% were elective admissions, 5.5% had a length of stay of 1 day or less, and 8.8% had less than 12 months of prior claims data available. After these exclusions, there were 25 766 patients from 308 hospitals eligible for the mortality analysis and 24 163 patients from 307 hospitals eligible for the readmission analysis. In both analyses, median age was 81 years and approximately 43% of the patients were men.
The observed 30-day mortality rate in the study population was 10.6% (2732 events). Table 2 shows the clinical measures that were significant in >60% of the bootstrap samples and thus were selected for inclusion in the claims-clinical model. Of the 7 clinical measures considered, only hemoglobin level did not enter the final model. Systolic blood pressure had the most impact on the discrimination and explained variance of the model.
Table 3 reports the performance of the claims-only and claims-clinical models for mortality among patients at all registry hospitals in the analysis. Both models performed well, and discrimination, calibration, and explained variance all improved with the addition of clinical data. The AUC was greater than 0.75 for the claims-clinical model, and explained variance improved by almost 50% over the claims-only model. Both models produced a wide range of predicted risk between the most extreme deciles, and both exhibited good agreement between observed and predicted mortality rates across all deciles of predicted risk (Figure 1). The tests related to model discrimination were statistically significant (AUC χ2=159.1 [P<0.001]; IDI=0.042; z=21.7 [P<0.001]), indicating better fit for the claims-clinical model.
The observed 30-day readmission rate was 21.9% (5296 events). Table 2 shows the clinical measures selected for inclusion in the claims-clinical model. Compared with the mortality model, a smaller subset of the clinical variables was significantly related to readmission. Among the clinical measures included in the final model, none had a disproportionate impact on the fit of the model relative to the others.
Table 4 reports the performance of the readmission models among patients at all registry hospitals in the analysis. The addition of clinical data to the claims-only model did not meaningfully change any of the performance metrics. Although both tests related to model discrimination were statistically significant (AUC χ2=31.8 [P<0.001]; IDI=0.004; z=9.1 [P<0.001]), with values for the AUC at or below 0.60, neither model discriminated well between patients who were readmitted and those who were not. Explained variance was also low for both models. Also, although agreement between observed and predicted readmission rates across all risk deciles was good (Figure 1), there was less variance of predicted risk compared with the mortality model.
Figure 2 shows the hospital rankings based on both hierarchical mortality models. This plot is summarized in Table 5, which displays the number of hospitals in different ranking categories by model. Of the 308 hospitals in the mortality analysis, the median absolute change in rank position was 11 places (interquartile range, 4 to 21). Compared with the claims-clinical model rankings, the claims-only model incorrectly identified 7 of the 61 hospitals (11.5%) in the top quintile and 8 of the 61 hospitals (13.1%) in the bottom quintile. Figure 2 and Table 5 also show the hospital rankings based on both hierarchical readmission models. Of the 307 hospitals in the readmission analysis, the median absolute change in rank position was 4 places (interquartile range, 2 to 7). Compared with the claims-clinical model rankings, the claims-only model incorrectly identified 2 of the 60 hospitals (3.3%) in the top quintile and 2 of the 61 hospitals (3.3%) in the bottom quintile. Thus, there was greater variability in ranks between the claims-only and claims-clinical models for mortality than for readmission.
There was considerable disagreement between the claims-only and claims-clinical models regarding which hospitals were top-performing outliers with respect to mortality. Of the 6 hospitals identified as top-performing outliers by the claims-only model, 3 were identified by the claims-clinical model and 1 additional hospital was newly identified. There was greater agreement about the bottom-performing hospitals. Eight of the 9 bottom-performing outliers according to the claims-only model were identified by the claims-clinical model; 2 additional hospitals were identified by the claims-clinical mortality model. For readmission, there was complete agreement between the models on which hospitals were the top- and bottom-performing outliers.
In a subgroup analysis by left ventricular function, we observed similar differences in performance between the claims-only and claims-clinical models as were observed in the overall study population (Table 3). However, for both outcomes, the models performed better for patients with left ventricular systolic dysfunction than for patients with preserved systolic function.
Among older Medicare beneficiaries hospitalized for heart failure, we found that clinical data improved the performance of the previously validated claims-only model of 30-day mortality. Although both models performed well according to traditional metrics of model fit—calibration, discrimination, and explained variance—predictions of mortality from the claims-clinical model demonstrated significant improvement over the model based on claims data alone. Differences between the models also translated into changes in hospital rankings. However, the improvement in risk-adjustment of 30-day readmission was more modest. Both the claims-only and claims-clinical models of readmission had relatively poor performance, and there were only small changes in hospital rankings between the models.
Our study is among the first to examine the incremental value of clinical data beyond claims data for predicting readmission among patients with heart failure or any other condition. In previous studies, the addition of basic clinical data, such as laboratory test results or admission examination results, to claims-based models of mortality improved prediction of mortality among various hospitalized populations, including patients hospitalized for heart failure, acute myocardial infarction, and stroke7 and among surgical patients8 in Pennsylvania. Our study was conducted on a national sample of patients in a heart failure registry linked with Medicare claims data, which enabled us to examine the benefit of adding clinical data to models almost identical to those used by CMS for risk adjustment.
The published CMS claims-based models originally were validated using data abstracted from medical records.2,3 For prediction of short-term readmission, the medical record model did not improve on the claims-based model. Our results for the readmission models are consistent with this finding. For prediction of short-term mortality, the performance of the medical record model was much better than the performance of the claims-based model. The AUC increased from 0.70 to 0.78, and the R2 increased from 0.09 to 0.22. The performance of our claims-clinical model for mortality (AUC, 0.76; R2=0.17) was improved over the performance of our claims-only model, though the improvement was not as striking. However, it is unclear whether the comorbid conditions included in the CMS medical record model were all preadmission comorbidities. For example, if the cardiac arrest measure included in that model, which had an estimated odds ratio of 21.3, reflected an in-hospital event, the performance of the model would be inflated. So, although the medical record validation model seems to indicate that there is room for improvement in our claims-clinical model, it is hard to know how much additional improvement is possible.
Although patient-level prediction is one function of risk adjustment, hospital-level prediction and profiling is another. We were interested in examining how clinical data influenced hospital-level performance rankings of risk-adjusted mortality and readmission. This question is critical for 2 reasons. First, CMS has begun to publicly report adjusted hospital-level outcome rates for Medicare beneficiaries with heart failure. Second, it is likely that CMS will soon begin using adjusted outcome rates to implement pay-for-performance programs that reward top-performing hospitals while penalizing low-performing hospitals. Accurate risk adjustment is critical for the credibility of these efforts. We found that hospital profiling for mortality was sensitive to the addition of clinical data. Approximately 12% of the hospitals classified as either top or bottom performers and >25% of the hospitals classified as outliers in the claims-only model were not identified as such in the claims-clinical model. We believe the improvements observed in patient-level prediction of mortality led directly to more accurate hospital profiling.
We also found that hospital profiling for readmission was insensitive to the addition of clinical data. Although the patient-level prediction of readmission based on the claims-clinical model was slightly improved compared with the claims-only model, this improvement did not affect hospital performance rankings in a meaningful way. There are 2 potentially competing explanations for this finding. First, both patient-level risk and hospital-level effects are already properly characterized using claims data alone, and additional data are not required. Second, patient-level risk is not adequately characterized, even with the addition of selected clinical data; therefore, estimated hospital-level effects improperly reflect systematic hospital-level differences in patient risk. On the basis of our results, it is not possible to know which explanation is correct.
We initially considered 7 clinical measures consistently collected within the GWTG-HF registry during the entire study period—ejection fraction, heart rate, hemoglobin level, serum creatinine level, serum sodium level, systolic blood pressure, and weight. We did not consider other potential clinical measures, such as blood urea nitrogen, brain natriuretic peptide, and troponin, because they were not uniformly collected during the study period. Although we considered 7 measures, we used variable selection methods with the expectation that a more parsimonious set of clinical predictors might emerge. In the final mortality model, however, all but 1 of the candidate measures were statistically significant, and no single measure accounted for the preponderance of the model improvement we observed. Therefore, it appears that consideration of more clinical measures is preferable to consideration of fewer.
The number of clinical measures submitted to payers, if eventually required, may not be an important factor, because clinical data will be required to be stored electronically in the next several years. Thus, the costs of data extraction and transfer should be relatively low once standards are developed. Promotion of computerized collection of clinical variables in electronic health records was an integral part of the Health Information Technology for Economic and Clinical Health (HITECH) Act. Standards for electronic health record certification under this act have been released by the US Department of Health and Human Services.20 These standards state that electronic health records should include, among other things, the ability to record vital signs and incorporate laboratory test results. These basic requirements cover most of the clinical variables we found to be important for risk adjustment in the heart failure population.
Our study has some limitations. First, not all hospitalizations from the registry could be linked to Medicare, primarily because of Medicare managed-care enrollment. In 2005, approximately 14% of Medicare beneficiaries were enrolled in a managed-care plan. However, the linkage rate we achieved for this registry is typical. Second, the study population was limited to patients aged 65 years or older with Medicare fee-for-service coverage. Therefore, the results may not generalize to patients younger than 65 years or to patients enrolled in Medicare managed care. However, these patients are not currently included in public reporting of mortality and readmission. Because most patients with heart failure are elderly and managed-care enrollment is modest, our findings should be applicable to most of the heart failure population. Third, the analysis was limited to hospitals in the GWTG-HF registry, so the findings may not be generalizable to all hospitals. Nevertheless, GWTG-HF hospitals range from small community hospitals to large tertiary and academic centers, so patients from many types of sites are represented. A previous study found similar characteristics and 30-day outcomes between the OPTIMIZE-HF registry and the general Medicare heart failure population.21 Finally, although the 7 clinical variables we considered for inclusion in our models are related to short-term heart failure outcomes, we did not have access to a comprehensive set of clinical variables, nor did we consider nonclinical variables. Other variables might have improved the prediction models.
Although administrative claims data are used routinely for risk adjustment and hospital profiling for short-term clinical outcomes of patients with heart failure, we found that the addition of clinical data provides incremental value. Adding clinical data to claims data significantly improved the prediction of mortality and shifted the mortality performance rankings for a substantial proportion of hospitals. Clinical data did not meaningfully improve the relatively poor discrimination provided by a claims-based readmission model and had little effect on hospital readmission performance rankings. Similar model performance differences between claims-only and claims-clinical models were observed in subgroups of patients with and without left ventricular systolic dysfunction.
Sources of Funding
This work was funded by an American Heart Association National Center grant.
Dr Curtis received research support from Allergan, GlaxoSmithKline, Johnson & Johnson, Merck & Co, and OSI Eyetech (>$10 000). Dr Fonarow received honoraria from Medtronic, Pfizer, and St Jude (<$10 000) and income for consulting from Novartis (>$10 000). Dr Hernandez received research support from Johnson & Johnson and Proventys (>$10 000) and honoraria from Amgen and Corthera (<$10 000). Drs Curtis and Hernandez have made available online detailed listings of financial disclosures (http://www.dcri.duke.edu/about-us/conflict-of-interest/).
Damon M. Seils, MA, Duke University, provided editorial assistance and prepared the manuscript; he did not receive compensation for his assistance apart from his employment at the institution where the study was conducted.
- Received March 19, 2010.
- Accepted November 6, 2010.
- © 2011 American Heart Association, Inc.
Patient Protection and Affordable Care Act of 2010. Publication No. 111–148, 124 Stat 119.
Office of the National Coordinator for Health Information Technology (ONC), Department of Health and Human Services. Health information technology: initial set of standards, implementation specifications, and certification criteria for electronic health record technology. Final rule. Fed Regist. 2010;75:44589–44654.