Flagging of Acute Head CT Interpretations Using Bag-of-Words as a Natural Language Processing Model

Yiftah Barash 1 Tal Levy 3 Shelly Soffer 4 Gennadiy Guralnik 5 Orit Shimon 4 Evgeni Druskin 1 Chen Hoffman 1 Eyal Zimlichman 2 Eli Konen 1 Eyal Klang 1
1Department of Radiology, The Chaim Sheba Medical Center
2Hospital Management, The Chaim Sheba Medical Center
3Department of Electrical Engineering, Tel Aviv University
4Sackler Faculty of Medicine, Tel Aviv University
5NY program, Tel Aviv University

PURPOSE: Alerting to acute findings in CT interpretations is a quality measure for health providers. In this study, we present a natural language processing (NLP) model based on a bag-of-words (BoW) algorithm for identifying head CT interpretations with acute findings.

METHODS: Institutional review board (IRB) approval was granted for this study. Informed consent was waived by the IRB committee.

A bag-of-words (BoW) model is a way of extracting features from text for use in modeling, for example with machine learning algorithms; here it serves as the basis of a natural language processing (NLP) model. In this model, a text is represented as the "bag" of its words, without regard to word order. The bag-of-words model has also been used in computer vision.

The bag-of-words model is commonly used in methods of text classification where the frequency of occurrence of each word is used as a feature for training a classifier.
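As an illustration of the representation described above, the following is a minimal sketch of turning short texts into word-count vectors. The two example sentences, the whitespace tokenizer, and the function names (`tokenize`, `bag_of_words`) are assumptions made for illustration; they are not taken from the study's data or code.

```python
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation (illustrative tokenizer)."""
    tokens = (tok.strip(".,;:()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]

def bag_of_words(text):
    """Return the word counts of a text; word order is discarded."""
    return Counter(tokenize(text))

# Hypothetical report snippets, not taken from the study dataset.
reports = ["No acute hemorrhage. No fracture.", "Acute subdural hemorrhage."]
bags = [bag_of_words(r) for r in reports]

# One column per distinct word, one row per text.
vocabulary = sorted(set().union(*bags))
matrix = [[bag[word] for word in vocabulary] for bag in bags]
```

Each row of `matrix` holds the frequency of every vocabulary word in one text, which is the kind of feature vector a classifier can be trained on.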

For this study, we collected consecutive interpretations of head CT scans performed in our emergency room (ER) over a period of 40 days starting January 1, 2017. The CT interpretations were reviewed to identify acute findings, defined as findings requiring treatment or follow-up (brain hemorrhage, acute infarct, space-occupying lesion, facial or skull fractures, sinusitis, new hydrocephalus).
Each interpretation was labeled as either acute or non-acute based on the presence of acute findings.

The BoW algorithm was written in Python 3.6. As features, we used unigrams (n=1, single words) and bigrams (n=2, two consecutive words). The dataset was represented as a matrix in which each row corresponds to an interpretation and each column to a feature (a unigram or a bigram). After generating these matrices for both the training and test sets, we built the model and tested it. For training, we used TF-IDF (term frequency-inverse document frequency) weighting, which highlights words that are distinctive (carry useful information) in a given document.
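The feature construction described above can be sketched as follows. This is a minimal pure-Python illustration, not the study's code: the three example texts are hypothetical, the helper names (`ngrams`, `tfidf_matrix`) are invented for this sketch, and the weighting shown (relative term frequency multiplied by log(N/df)) is only one common TF-IDF variant, since the abstract does not state which variant or which classifier was used.

```python
import math
from collections import Counter

def ngrams(tokens, n_max=2):
    """Return all unigrams and bigrams (n = 1..n_max) as space-joined strings."""
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats

def tfidf_matrix(docs):
    """Build a TF-IDF matrix: one row per document, one column per n-gram feature."""
    counts = [Counter(ngrams(doc.lower().split())) for doc in docs]
    features = sorted(set().union(*counts))
    n_docs = len(docs)
    # Document frequency: number of documents containing each feature.
    df = {f: sum(1 for c in counts if f in c) for f in features}
    rows = []
    for c in counts:
        total = sum(c.values())
        # Assumed variant: relative term frequency times log(N / df).
        rows.append([(c[f] / total) * math.log(n_docs / df[f]) for f in features])
    return features, rows

# Hypothetical texts, not taken from the study dataset.
docs = ["no acute hemorrhage", "acute subdural hemorrhage", "no fracture"]
features, rows = tfidf_matrix(docs)
```

In this toy example, the bigram "no acute" appears in only one text and therefore receives a higher weight in that text than the unigram "acute", which appears in two.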

Algorithm performance was assessed using accuracy.
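For reference, accuracy is the fraction of interpretations whose predicted label matches the reference label. The sketch below assumes labels encoded as 1 (acute) and 0 (non-acute); the label lists are hypothetical and the function name `accuracy` is chosen for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical labels: 1 = acute, 0 = non-acute.
y_true = [1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0]
score = accuracy(y_true, y_pred)  # 4 of 5 predictions match
```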

RESULTS: Overall, we retrieved 1,578 head CT interpretations. Acute findings were present in 248 interpretations (15.7%). The algorithm showed an accuracy of 86% for identifying CT interpretations with acute findings.

CONCLUSION: The algorithm showed promising results in classifying head CT interpretations. Such a method can be used to flag interpretations with important findings and thus help improve the quality of ER care.








