
Abstract
Background and objectives
Cardiac surgery-associated acute kidney injury can lead to increased morbidity, mortality, and hospitalization. Available risk assessment tools have limited predictive ability. Machine learning is increasingly used to predict acute kidney injury in cardiac surgery patients because it can handle complex clinical data; however, its predictive value remains uncertain. This study evaluates the predictive performance of machine learning models for acute kidney injury after cardiac surgery.
Methods
A systematic review and meta-analysis was conducted in accordance with PRISMA guidelines. Web of Science, PubMed, Science Direct, Google Scholar, Scopus, and Cochrane Library were searched up to 31 December 2025. Included studies were assessed for machine learning model performance and predictors of acute kidney injury; effect measures were the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. Pooled estimates with 95% confidence intervals were calculated using a random-effects model. Risk of bias was assessed using PROBAST. The meta-analysis was performed in R version 4.5.0.
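To illustrate the random-effects pooling described above, the following is a minimal Python sketch using the DerSimonian-Laird estimator. It is not the R workflow used in this study: the choice of estimator, the function name `pool_random_effects`, the pooling of AUCs on their raw scale, and the example values are all assumptions made for illustration only.

```python
import math


def pool_random_effects(estimates, std_errors, z=1.96):
    """Pool study-level estimates with a DerSimonian-Laird random-effects model.

    Hypothetical helper for illustration; `estimates` could be per-model AUCs
    and `std_errors` their standard errors (assumed to be available).
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")

    variances = [se ** 2 for se in std_errors]

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and the DerSimonian-Laird between-study variance (tau^2).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    se_pooled = math.sqrt(1.0 / sum_w_star)

    return {
        "pooled": pooled,
        "ci_low": pooled - z * se_pooled,
        "ci_high": pooled + z * se_pooled,
        "tau2": tau2,
    }


if __name__ == "__main__":
    # Made-up AUCs and standard errors, not data from the included studies.
    example_aucs = [0.80, 0.76, 0.84]
    example_ses = [0.020, 0.030, 0.025]
    print(pool_random_effects(example_aucs, example_ses))
```

In practice, AUC, sensitivity, and specificity are often pooled on a transformed scale (for example, logit) and back-transformed; the sketch omits this step for brevity.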
Results
The systematic search yielded 45 studies that met the inclusion criteria, spanning 13 distinct model types, with 81 models in the training cohorts and 162 in the validation cohorts. The overall pooled AUC was 0.83 (95% CI: 0.79–0.85) in the training cohorts and 0.76 (95% CI: 0.75–0.78) in the validation cohorts. In the training dataset, pooled sensitivity and specificity were 0.75 (95% CI: 0.71–0.79) and 0.81 (95% CI: 0.72–0.87), respectively. In the validation dataset, pooled sensitivity was 0.61 (95% CI: 0.53–0.69) and pooled specificity was 0.82 (95% CI: 0.77–0.86). Overall risk of bias was high in 44.4%, driven mainly by the analysis domain of PROBAST.
Conclusion
This study suggests that machine learning-based models could serve as a viable framework for predicting the risk of acute kidney injury after cardiac surgery. However, the findings highlight the need for model optimization and validation in diverse populations before clinical implementation.