Flagging of Acute Head CT Interpretations Using Bag-of-Words as a Natural Language Processing Model

Yiftah Barash¹, Tal Levy³, Shelly Soffer⁴, Gennadiy Guralnik⁵, Orit Shimon⁴, Evgeni Druskin¹, Chen Hoffman¹, Eyal Zimlichman², Eli Konen¹, Eyal Klang¹

¹Department of Radiology, The Chaim Sheba Medical Center
²Hospital Management, The Chaim Sheba Medical Center
³Department of Electrical Engineering, Tel Aviv University
⁴Sackler Faculty of Medicine, Tel Aviv University
⁵NY program, Tel Aviv University

PURPOSE:
Alerting on acute findings in CT interpretations is a quality measure for health providers. In this study, we present a natural language processing (NLP) algorithm based on a bag-of-words (BoW) model for identifying head CT interpretations with acute findings.

METHODS: Institutional review board (IRB) approval was granted for this study, and the requirement for informed consent was waived by the IRB.

A bag-of-words (BoW) model is a way of extracting features from text for use in modeling, for example with machine learning algorithms; here it serves as the basis of a natural language processing (NLP) model. In this model, a text is represented as the "bag" (multiset) of its words, disregarding grammar and word order but keeping word counts. The bag-of-words model has also been used in computer vision.

The bag-of-words model is commonly used in text classification, where the frequency of occurrence of each word serves as a feature for training a classifier.
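To make the representation concrete, the following sketch builds a bag of words for a short, made-up report sentence using Python's `collections.Counter`, then maps it onto a fixed vocabulary as a count vector. The tokenizer (lowercase alphabetic tokens only), the example text, the vocabulary, and the helper names `tokenize`, `bag_of_words`, and `to_vector` are illustrative assumptions, not the study's code.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase the text and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Represent a text as the bag (multiset) of its words: word -> count."""
    return Counter(tokenize(text))


def to_vector(bag, vocabulary):
    """Turn a bag into a fixed-length count vector; words not in the bag count as 0."""
    return [bag[word] for word in vocabulary]


# Hypothetical report sentence, not study data.
bag = bag_of_words("No acute hemorrhage. No acute infarct.")
vocabulary = ["acute", "fracture", "hemorrhage", "infarct", "no"]
vector = to_vector(bag, vocabulary)  # [2, 0, 1, 1, 2]
```

Word order is discarded: "no acute infarct" and "infarct acute no" yield the same bag, which is one motivation for also using bigram features.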

For this study, we collected consecutive interpretations of head CT scans performed in our emergency room (ER) during a 40-day period beginning January 1, 2017. The CT interpretations were reviewed to identify acute findings, defined as findings requiring treatment or follow-up (brain hemorrhage, acute infarct, space-occupying lesion, facial or skull fracture, sinusitis, or new hydrocephalus).
Each interpretation was labeled as either acute or non-acute based on the presence of acute findings.

The BoW algorithm was written in Python 3.6. As features, our model used unigrams (n=1, single words) and bigrams (n=2, pairs of consecutive words). The dataset was represented as a matrix in which each row corresponds to an interpretation and each column to a feature (a unigram or bigram). After generating these matrices for both the training and test sets, we built the model and tested it. For training, features were weighted using term frequency-inverse document frequency (TF-IDF), which highlights words that are distinctive (carry useful information) in a given document.
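The feature pipeline described above can be sketched in pure Python: unigram and bigram counts, a vocabulary and IDF weights learned from the training set only, and a TF-IDF matrix with one row per interpretation applied to both training and test sets. The abstract does not state which library or TF-IDF variant was used, so the tokenizer, the unsmoothed weighting idf(t) = ln(N / df(t)), the toy reports, and the function names `features`, `fit_tfidf`, and `transform` are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text):
    """Count the unigrams (n=1) and bigrams (n=2) of one interpretation."""
    tokens = tokenize(text)
    return Counter(ngrams(tokens, 1) + ngrams(tokens, 2))


def fit_tfidf(train_docs):
    """Learn the feature vocabulary and IDF weights from the training set only."""
    counts = [features(d) for d in train_docs]
    vocabulary = sorted(set().union(*counts))
    # Document frequency: iterating a Counter yields each feature once per document.
    df = Counter(term for c in counts for term in c)
    n_docs = len(train_docs)
    idf = {t: math.log(n_docs / df[t]) for t in vocabulary}
    return vocabulary, idf


def transform(docs, vocabulary, idf):
    """Map documents to TF-IDF rows (raw count x IDF) over a fixed vocabulary.

    Features never seen in training are ignored.
    """
    rows = []
    for d in docs:
        c = features(d)
        rows.append([c[t] * idf[t] for t in vocabulary])
    return rows


# Hypothetical toy reports, not study data.
train = [
    "Acute subdural hemorrhage.",
    "No acute findings.",
    "No acute intracranial hemorrhage.",
]
vocabulary, idf = fit_tfidf(train)
X_train = transform(train, vocabulary, idf)  # 3 rows x len(vocabulary) columns
X_test = transform(["New acute infarct."], vocabulary, idf)
```

A feature that occurs in every training document (here "acute") receives an IDF of zero, which is how TF-IDF down-weights uninformative words. In practice, scikit-learn's `TfidfVectorizer(ngram_range=(1, 2))` performs these steps in one call, although its defaults (smoothed IDF and L2-normalized rows) give different numbers from this sketch.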

Algorithm performance was assessed using accuracy, the proportion of interpretations classified correctly.
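Accuracy is the number of correctly classified interpretations divided by the total number of interpretations. The helper below is a hypothetical sketch of that computation, assuming labels are the strings "acute" and "non-acute". As a worked example, it computes the accuracy of a majority-class baseline that labels every report non-acute, using the dataset counts given in the results (1,578 interpretations, 248 acute).

```python
def accuracy(y_true, y_pred):
    """Fraction of interpretations whose predicted label matches the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Majority-class baseline: label every one of the 1,578 reports "non-acute".
y_true = ["acute"] * 248 + ["non-acute"] * (1578 - 248)
y_pred = ["non-acute"] * 1578
baseline = accuracy(y_true, y_pred)  # 1330 / 1578, about 0.843
```

Because acute findings are the minority class, a baseline like this is a useful reference point when reading an accuracy figure.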

RESULTS: Overall, we retrieved 1,578 head CT interpretations, of which 248 (15.7%) contained acute findings. The algorithm achieved an accuracy of 86% in identifying CT interpretations with acute findings.

CONCLUSION: The algorithm showed promising results in classifying head CT interpretations. Such a method could be used to flag interpretations with important findings and thus help improve the quality of care in the ER.