Preliminary validation study of an automatic speech measurement system after ENT cancer implemented on a tablet used in routine clinical practice - IALP 2025 Innovation and Inspiration in Communication Sciences and Disorders

Background ENT cancers acoustically impair speech and have a major functional impact on patients. In clinical practice, speech assessments rely mainly on perceptual measures, which are known for their variability. The “Automatic Intelligibility Measurement System” (SAMI) is a new machine-learning-based app designed to overcome this issue. It has been validated in a laboratory setting but not yet in routine clinical practice. SAMI calculates a measure of speech impairment from a reading task based on two French reference texts: La Chèvre de M. Seguin and Le Voyage d'Alice. The objective is to validate SAMI in a clinical context.

Methods Twenty-five patients with ENT cancer (inclusion is still in progress) were asked to read the texts implemented in SAMI on a 6th-generation iPad. The task was performed three times (T1, T2, T3), at least ten minutes apart, under two microphone conditions (internal/external). Scores were compared across measurement times to assess their temporal reliability, and across measurement conditions (texts, microphones). Three experts perceptually assessed the intelligibility and the severity of speech impairment on the T1 recordings made with the external microphone, for both texts. Criterion validity was analyzed by comparing perceptual and automatic scores.
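The abstract does not name the statistical tests behind these comparisons. As an illustration only, a paired comparison between two measurement times could be sketched with an exact sign-flip permutation test on the paired differences. The function name `paired_permutation_p` and the scores below are hypothetical and are not study data or the authors' method.

```python
from itertools import product


def paired_permutation_p(a, b):
    """Exact two-sided sign-flip permutation test on paired differences.

    Assumption: this is an illustrative stand-in, not the test used in the study.
    Enumerates all 2**n sign assignments, so it is only practical for small n.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical scores for 8 patients at two measurement times (made-up values).
t1_scores = [6.1, 7.4, 5.2, 8.0, 6.8, 7.1, 4.9, 7.7]
t2_scores = [6.3, 7.2, 5.4, 7.9, 6.9, 7.0, 5.1, 7.6]

p_value = paired_permutation_p(t1_scores, t2_scores)
print(f"T1 vs T2: p = {p_value:.3f}")
```

A large p-value in such a test only means that no systematic shift between the two times was detected. It is not, on its own, proof of agreement, which is why reliability is often also described with an agreement index.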

Results Preliminary results show no significant difference in SAMI scores between measurement times (p>0.05), which supports their temporal reliability. There was no significant difference between scores obtained with the internal and external microphones, except at T1 with La Chèvre (p=0.038). In half of the recording conditions, scores differed significantly between La Chèvre and Alice (p<0.05). Perceptual and automatic intelligibility measures were not significantly different (p>0.58); severity measures did differ, although the values were close.

Conclusion Preliminary results show that the SAMI application offers good temporal reliability. No preferred condition of use (choice of text or microphone) has yet emerged. SAMI scores are in line with the human reference for intelligibility, but they differ for severity, with SAMI underestimating the deficit. However, this difference remains small and has little clinical impact. These results are promising for the validation of the SAMI application in routine clinical practice but need further confirmation. The validation of SAMI in other languages is also ongoing.