ISRA May 2022

Using Artificial Intelligence for Detection of COVID-19 in Chest X-rays: Deployment and Preliminary Results During Israel’s 4th Pandemic Wave

Moran Manor 1,4; Dror Suhami 1,4; Yaron Caspi 2; Elisha Goldstein 3; Daniel Yaron 2; Daphna Keidar 5; Keren Peri-Hanania 2; Ronnie Rosen 2; Yael Rapson 1,4; Shlomit Tamir 1,4; Eli Atar 1,4; Gil N. Bachar 1,4; Yishai M. Elyada 6; Yonina C. Eldar 2; Ahuva Grubstein 1,4
1Radiology Department, Rabin Medical Center, Israel
2Department of Math and Computer Science, Weizmann Institute of Science, Israel
3Bioinformatics Unit, Life Sciences Core Facilities, Weizmann Institute of Science, Israel
4Sackler School of Medicine, Tel-Aviv University, Israel
5Department of Computer Science, ETH Zurich, Switzerland
6Mobileye Vision Technologies, Ltd., Hartom 13, Israel

Purpose: To test the diagnostic performance of a deep neural network-based model for detecting COVID-19 in a large dataset of chest X-rays (CXRs).

Methods: The model assigns each CXR a score between 0 and 1. Images scoring 0.5 or lower are classified as negative, and images scoring above 0.5 are classified as positive. The closer the score is to either extreme (0 or 1), the higher the confidence in the classification.
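The decision rule above can be illustrated with a minimal Python sketch. The function names `classify` and `confidence` are hypothetical, and the confidence measure (distance from the 0.5 threshold, rescaled to the range 0 to 1) is only one possible reading of "closer to the extremes"; the abstract does not specify how confidence was quantified.

```python
THRESHOLD = 0.5  # cut-off stated in the abstract


def classify(score: float) -> str:
    """Label a CXR score: above 0.5 is positive, 0.5 or lower is negative."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return "positive" if score > THRESHOLD else "negative"


def confidence(score: float) -> float:
    """Illustrative confidence (an assumption, not the authors' definition):
    0.0 at the threshold, 1.0 at either extreme (score 0 or 1)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return abs(score - THRESHOLD) / THRESHOLD
```

Under this sketch, a score of exactly 0.5 is labeled negative with zero confidence, while scores of 0 or 1 carry the maximum confidence.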

The software was installed on a dedicated computer and subsequently run on anonymized CXRs acquired throughout August 2021 at the emergency department of Beilinson Medical Center. Diagnostic performance was determined by comparison with the PCR result obtained within one week before or after the image acquisition date.

Results: A total of 370 images were analyzed. The mean age of the cohort was 65.6 years (SD ± 17.4); 44% were women. Fifty-four CXRs (14.6%) were PCR-positive. The sensitivity and specificity of the model were 76% and 60.7%, respectively. The area under the ROC curve (AUC) was 0.7. Given this prevalence, the positive predictive value (PPV) and negative predictive value (NPV) were 24.8% and 93.6%, respectively.
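The dependence of PPV and NPV on prevalence follows from Bayes' rule. The sketch below is illustrative only (the function name `predictive_values` is hypothetical) and shows how the reported sensitivity, specificity, and prevalence combine into the predictive values.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    tp = sensitivity * prevalence              # true-positive fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Values reported in the abstract: 54 PCR-positive CXRs out of 370
ppv, npv = predictive_values(sensitivity=0.76, specificity=0.607, prevalence=54 / 370)
```

With these inputs the computed values agree with the reported 24.8% and 93.6% to within rounding of the published sensitivity and specificity.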

Possible explanations for false-positive results include segmentation failures (e.g., inclusion of pericardial or extrathoracic areas as lung parenchyma) and classification failures (non-COVID-19-related opacities). False-negative results might be related to an early disease stage. These issues are currently being investigated. Additionally, adjusting the score threshold might improve the detection rate, although care must be taken not to reduce specificity. We did not encounter any serious technical issues during deployment of the software.
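The threshold trade-off mentioned above can be sketched as follows. The scores and labels are invented toy values, not study data, and the function name `sens_spec` is hypothetical; the point is only that moving the threshold changes sensitivity and specificity in opposite directions.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when scores above `threshold` are called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s > threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s <= threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s <= threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data for illustration only (1 = PCR-positive, 0 = PCR-negative)
toy_scores = [0.9, 0.6, 0.45, 0.3, 0.7, 0.55, 0.4, 0.2, 0.1, 0.35]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

sens_default, spec_default = sens_spec(toy_scores, toy_labels, 0.5)
sens_lower, spec_lower = sens_spec(toy_scores, toy_labels, 0.25)
```

On this toy set, lowering the threshold from 0.5 to 0.25 raises sensitivity and lowers specificity, which is the trade-off a threshold change would need to balance.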

Conclusions: These preliminary results are encouraging. Additional data collection and model improvement are currently underway. We look forward to continuing our collaboration in order to adapt the model for clinical use and to contribute to the worldwide effort to reduce the pandemic burden.