ILANIT 2023

Time-Dependent Iterative Imputation for Multivariate Longitudinal Clinical Data

Omer Noy, Ron Shamir
Blavatnik School of Computer Science, Tel-Aviv University, Israel

Missing data is a major challenge in many domains. In clinical research, electronic medical records often have a large number of missing values in laboratory tests and vital signs. This missingness can lead to biased estimates and limit our ability to draw conclusions from the data. Additionally, many machine learning algorithms can only be applied to complete datasets. A common remedy is data imputation, the process of filling in the missing values with substituted values. However, some popular imputation approaches perform poorly on clinical data.
We developed a simple new approach, Time-Dependent Iterative (TDI) imputation, that offers a practical solution for imputing individualized time-series data. It addresses both multivariate and longitudinal data by integrating forward-filling and Iterative Imputer, a version of the MICE algorithm. The integration employs a dynamic weighting strategy that is specific to each patient, variable, and observation, and is based on the clinical patterns of the data, including missing rates and measurement frequency. We evaluated its performance by randomly masking values in clinical datasets, imputing them, and comparing the imputed values to the ground-truth values. When applied to a cohort of 45,000 patients from MIMIC-III, our approach outperformed state-of-the-art imputation methods for 14 out of 16 clinical variables, with an overall root-mean-squared error of 13.83, compared to 16.18 for MissForest, the second-best method. Similar results were achieved on three Israeli datasets of COVID-19 inpatients. Importantly, tests on these datasets also demonstrated that TDI imputation can lead to improved risk prediction.
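
The following is a minimal sketch of the general idea, not the authors' implementation. It blends a forward-filled value with a cross-sectional estimate for one patient's series of a single variable, and then scores the result by masking observed values and computing the root-mean-squared error. Everything specific here is an assumption made for illustration: the function names, the `cross_sectional` list (a stand-in for the output of a multivariate imputer such as Iterative Imputer), and in particular the weight `exp(-decay * gap)`. The abstract states only that the weights depend on clinical patterns such as missing rates and measurement frequency and gives no formula.

```python
import math
import random


def forward_fill(series):
    """Carry the last observed value forward.

    Returns the filled series and, for each position, the number of
    time steps since the last observation (None before the first one).
    """
    filled, gaps = [], []
    last, gap = None, None
    for value in series:
        if value is not None:
            last, gap = value, 0
        elif last is not None:
            gap += 1
        filled.append(last)
        gaps.append(gap)
    return filled, gaps


def blend_impute(series, cross_sectional, decay=0.5):
    """Impute missing entries as a weighted mix of two estimates.

    ASSUMPTION: the weight exp(-decay * gap) is a hypothetical choice that
    trusts the carried-forward value less as it grows stale. It is not
    the weighting used in the paper.
    """
    filled, gaps = forward_fill(series)
    imputed = []
    for value, ff, gap, cs in zip(series, filled, gaps, cross_sectional):
        if value is not None:
            imputed.append(value)      # observed values are kept as-is
        elif ff is None:
            imputed.append(cs)         # nothing to carry forward yet
        else:
            w = math.exp(-decay * gap)
            imputed.append(w * ff + (1 - w) * cs)
    return imputed


def masked_rmse(series, cross_sectional, mask_fraction=0.3, seed=0):
    """Hide a random subset of observed values, impute them, and return the RMSE."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(series) if v is not None]
    masked = rng.sample(observed, max(1, int(len(observed) * mask_fraction)))
    hidden = list(series)
    for i in masked:
        hidden[i] = None
    imputed = blend_impute(hidden, cross_sectional)
    errors = [(imputed[i] - series[i]) ** 2 for i in masked]
    return math.sqrt(sum(errors) / len(errors))


# Toy usage: one patient's heart-rate series with gaps (made-up numbers).
heart_rate = [None, 80.0, None, None, 90.0, None]
population_estimate = [70.0] * len(heart_rate)  # hypothetical cross-sectional values
print(blend_impute(heart_rate, population_estimate))
print(masked_rmse(heart_rate, population_estimate))
```

With `decay=0` the sketch reduces to plain forward-filling, and with a very large `decay` it falls back on the cross-sectional estimate, so the single parameter spans the two methods that the abstract describes combining.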