
Problem statement
A major obstacle to deploying accurate artificial intelligence (AI) prediction tools in assisted reproduction is the lack of large electronic patient databases on which to base sound predictions. Clinicians are trained to infer clinical meaning from incomplete data, but AI may not perform adequately under these conditions. Here, we test how much the completeness of clinical data affects the accuracy of AI prediction of metaphase II (MII) oocytes at ovum pick-up.
Methods
We curated a database of 440 controlled ovarian stimulation cycles, including IVF/ICSI cycles with patients as well as oocyte donor cycles, collected in 2020-2022, with age, body mass index, antral follicle count, anti-Müllerian hormone, MII, and gonadotropin dose as model variables. We trained two XGBoost AI models for MII oocyte prediction: one with higher variance (less robust) and one with lower variance (more robust). We assessed their performance with the R2 score while removing an increasing proportion of cycles. Further, we tested the predictions' R2 score after keeping all 440 cycles but randomly removing some of the values within each cycle.
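The two degradation procedures described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names `drop_cycles` and `mask_values`, the fixed seeds, the choice to mask a fixed fraction of all predictor cells across the table, and the use of NaN as the missing-value marker are all assumptions, and the data are synthetic stand-ins for the 440 cycles.

```python
import math
import random


def drop_cycles(rows, fraction, seed=0):
    """Return a copy of `rows` with `fraction` of the cycles removed at random.

    Each row is one cycle; keep the target in the same row so that it is
    dropped together with its predictors.
    """
    rng = random.Random(seed)
    n_keep = len(rows) - round(fraction * len(rows))
    keep = sorted(rng.sample(range(len(rows)), n_keep))
    return [list(rows[i]) for i in keep]


def mask_values(rows, fraction, seed=0):
    """Return a copy of `rows` with `fraction` of all cells set to NaN.

    Assumption: cells are masked uniformly over the whole table, so every
    cycle is kept but individual values go missing. Pass predictors only,
    so the target is never masked.
    """
    rng = random.Random(seed)
    masked = [list(r) for r in rows]
    cells = [(i, j) for i, r in enumerate(masked) for j in range(len(r))]
    for i, j in rng.sample(cells, round(fraction * len(cells))):
        masked[i][j] = math.nan
    return masked


# Synthetic stand-in: 440 cycles x 5 predictors (values are random, not clinical).
data_rng = random.Random(42)
cycles = [[data_rng.random() for _ in range(5)] for _ in range(440)]

fewer = drop_cycles(cycles, 0.20)    # 20% fewer cycles
sparser = mask_values(cycles, 0.40)  # same cycles, 40% of values missing
n_missing = sum(math.isnan(v) for row in sparser for v in row)
print(len(fewer), len(sparser), n_missing)  # 352 440 880
```

In a full experiment, each degraded table would be passed to the model and scored at every missingness level. XGBoost treats NaN as a missing value by default, which is one reason NaN is used as the marker in this sketch.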
Results
When cycles were removed, the R2 of MII predictions fell from 0.19 with 10% of cycles missing (8% less accurate than with the entire dataset) to 0.05 with 80% of cycles missing (73% less accurate) for the higher-variance model, and from 0.28 (1%) to 0.19 (35%) for the lower-variance model (see Figure 1 for all values). The loss of accuracy became statistically significant with 20% (p=0.03) and 50% (p=0.05) fewer cycles, respectively.
When all 440 cycles were retained but some of the values within each cycle were missing, the R2 score fell from 0.17 with 10% of values missing (16% less accurate) to -0.14 with 80% of values missing (166%) for the higher-variance model, and from 0.28 (1%) to 0.19 (33%) for the lower-variance model. The loss of accuracy became statistically significant with 40% (p=0.005) and 60% (p=0.007) missing values, respectively.
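The percentages in parentheses read as relative losses in R2 against the full-dataset score. The sketch below shows how such a figure can be computed and why it can exceed 100% once R2 turns negative, meaning the model predicts worse than simply using the mean of the observed values. The helper names and all numbers in the example are hypothetical; the abstract does not report the full-dataset R2 or state the exact formula used.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_drop(r2_full, r2_reduced):
    """Percentage loss in R2 relative to the full-dataset score (assumed formula)."""
    return (r2_full - r2_reduced) / r2_full * 100.0


# Hypothetical oocyte counts, not study data.
observed = [4, 8, 12, 6]
mean_only = [7.5, 7.5, 7.5, 7.5]   # always predicting the mean gives R2 = 0
poor_model = [12, 6, 4, 8]         # worse than the mean gives R2 < 0

r2_mean = r2_score(observed, mean_only)
r2_poor = r2_score(observed, poor_model)

# Hypothetical scores: a drop from 0.20 to -0.10 is a loss of more than 100%.
drop = relative_drop(0.20, -0.10)
print(r2_mean, r2_poor < 0, round(drop, 1))
```

Under this reading, any reported loss above 100% corresponds to a negative R2, as with the higher-variance model at 80% missing values.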
Conclusion
Clinics should avoid incorporating AI tools based on incomplete internal databases. Even a 10% decrease in the completeness of key data lowers prediction accuracy, with likely consequences for patient safety and treatment outcomes.
