Background: Machine learning (ML) models are designed to process large amounts of data and reduce them to meaningful insights. The current study used ML to analyze the characteristics and outcomes of patients hospitalized with a primary diagnosis of heart failure at Sheba Medical Center (Ramat Gan, Israel) from 2008 to 2017.
Methods: All available patient data were collected from electronic medical records. Patients with an ejection fraction (EF) measurement were included. We compared four ML models for the prediction of readmission and mortality: random forests, logistic regression, support vector machines and neural networks.
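Comparing classifiers on a binary outcome such as readmission or mortality requires a common discrimination metric; the Results report the AUC. The following is a minimal sketch in plain Python of how such a comparison could be scored. It is not the study's code: the function name `roc_auc`, the model names and all labels and scores are hypothetical toy values chosen only for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (Mann-Whitney formulation); tied scores count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = event occurred (e.g. readmission), 0 = no event.
labels = [1, 0, 1, 0, 1, 0]

# Hypothetical predicted risks from two placeholder models.
model_scores = {
    "model_a": [0.9, 0.2, 0.7, 0.4, 0.6, 0.5],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.8, 0.3],
}

# Score every model with the same metric, then pick the highest AUC.
aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
best = max(aucs, key=aucs.get)
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a sketch; a rank-based implementation would be preferable for cohorts of several thousand patients.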
Results: EF was measured during hospitalization for heart failure in 3,838 patients: 2,116 had preserved EF (≥ 50%), 237 had borderline EF (40-49%) and 1,485 had reduced EF (< 40%). We trained models with 182 variables to predict 30-day and 1-year mortality and readmission (see Figure). Random forests had the highest area under the receiver operating characteristic curve (AUC). Reduced EF was an important predictor of all outcomes except 30-day mortality. Red blood cell distribution width (RDW) was unexpectedly an important predictor of all outcomes.
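The EF grouping above can be expressed as a small function. This is a sketch under stated assumptions rather than the study's implementation: the function name `ef_category` is hypothetical, and non-integer values between 49% and 50%, which the stated ranges do not cover, are assumed here to fall in the borderline group.

```python
def ef_category(ef_percent):
    """Map an ejection fraction (in percent) to the groups used in the abstract."""
    if not 0 <= ef_percent <= 100:
        raise ValueError("ejection fraction must be between 0 and 100 percent")
    if ef_percent >= 50:
        return "preserved"
    if ef_percent >= 40:
        # Assumption: values such as 49.5 are treated as borderline.
        return "borderline"
    return "reduced"


# Group sizes as reported in the Results.
counts = {"preserved": 2116, "borderline": 237, "reduced": 1485}
total = sum(counts.values())

# Share of each EF group in the cohort, as a percentage.
shares = {group: 100 * n / total for group, n in counts.items()}
```

The three reported group sizes sum to the 3,838 patients with an EF measurement, so `shares` covers the whole included cohort.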
Conclusion: Among ML models commonly used in medical research, random forests had the best predictive ability for adverse events in heart failure. Although it is not feasible to use models with ~180 predictors in clinical practice, using all available patient data highlighted predictors that may not have been considered otherwise, such as the RDW.
Figure. Area Under the Curve for All Models and Outcomes