Artificial Intelligence for Identification of Radiation-Related Toxicities from the Electronic Health Records of Patients with Head and Neck Cancer

American Society for Radiation Oncology (ASTRO)

Date:

October 1, 2024

Authors:

F. Alfadli, J.M. Mathew, J.N. Waldron, J.Y. Kwan, S. Aviv, M.M. Nguyen, V. Mokriak, C. Pettengell, and P. Wong.

Abstract

Purpose/Objective(s)

Radiotherapy-related late toxicities (RLT) impact the quality of life of head and neck cancer (HNC) patients. Progression of RLT is often subtle and its recognition is dependent on multiple visits, recorded within unstructured electronic health records (EHR). This study details the process of tuning existing Artificial Intelligence (AI) platforms to identify RLT.

Materials/Methods

The validated AI platform, DARWEN™, pre-trained on general clinical data, was fine-tuned and validated using data from HNC patients referred to an RLT clinic. DARWEN™ employed four models: a relevance model (eliminates irrelevant sentences), a subject detection model (determines whether RLT is related to the specific patient), an assertion status model (determines presence of RLT), and a query model (triangulates RLT, fine-tuned to adapt the engine to the concept of interest). A rulebook defining each toxicity (dysphagia, fibrosis, osteoradionecrosis (ORN) and trismus) and the toxicity status (ground truth) was provided by a radiation oncologist. Data were split into a training cohort and an unseen cohort. Model queries were fine-tuned using the training cohort and run against the unseen cohort to determine toxicity status. Output was validated by comparing against manually curated ground truth. The ground truth was reviewed by a second trained reviewer and discrepancies in the manually curated data were adjudicated. Models were further fine-tuned after adjudication. Overall accuracy, precision (positive predictive value), and F1 scores (harmonic mean of sensitivity and positive predictive value) were generated.
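The three reported metrics can be illustrated with a minimal sketch. It assumes each patient has one binary toxicity label from the model and one from the curated ground truth. The function names and the toy labels are hypothetical and are not drawn from the study or from DARWEN™.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Counts:
    """Confusion-matrix counts for one toxicity (hypothetical helper)."""
    tp: int  # model positive, ground truth positive
    fp: int  # model positive, ground truth negative
    fn: int  # model negative, ground truth positive
    tn: int  # model negative, ground truth negative


def count_outcomes(predicted: Sequence[bool], truth: Sequence[bool]) -> Counts:
    """Tally confusion-matrix counts from paired per-patient labels."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    c = Counts(0, 0, 0, 0)
    for p, t in zip(predicted, truth):
        if p and t:
            c.tp += 1
        elif p and not t:
            c.fp += 1
        elif not p and t:
            c.fn += 1
        else:
            c.tn += 1
    return c


def accuracy(c: Counts) -> float:
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: Counts) -> float:
    """Positive predictive value."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def sensitivity(c: Counts) -> float:
    """Recall: fraction of true toxicity cases the model found."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1_score(c: Counts) -> float:
    """Harmonic mean of sensitivity and positive predictive value."""
    p, r = precision(c), sensitivity(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Toy labels for five patients (illustrative only, not study data).
predicted = [True, True, False, True, False]
truth = [True, False, False, True, True]
counts = count_outcomes(predicted, truth)
print(accuracy(counts), precision(counts), f1_score(counts))
```

Because F1 ignores true negatives, it can differ noticeably from accuracy when toxicity-free patients dominate or are rare in a cohort.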

Results

Patients (n = 207) were split into training (n = 167) and unseen (n = 40) cohorts. Prior to adjudication, DARWEN™ AI achieved overall accuracy of 53% (F1 = 0.66) for all toxicities. Precision was 42% for dysphagia (F1 = 0.33), 70% for fibrosis (F1 = 0.74), 86% for ORN (F1 = 0.88) and 53% for trismus (F1 = 0.50). After adjudication and further fine-tuning, DARWEN™ AI achieved overall accuracy of 87% (F1 = 0.92) across all toxicities. Precision was 92% for dysphagia (F1 = 0.88), 100% for fibrosis (F1 = 0.93), 93% for ORN (F1 = 0.93) and 94% for trismus (F1 = 0.91). Running the refined models on the unseen cohort (759 notes with >1 million characters) took a mean (SD) of 4.01 (0.42) seconds for each toxicity.
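The abstract reports precision and F1 but not sensitivity. Since F1 is the harmonic mean of the two, F1 = 2PR / (P + R), an implied sensitivity can be back-calculated as R = F1 · P / (2P − F1). The sketch below applies this to the post-adjudication figures. The function name is hypothetical, and because the published values are rounded, the results are rough estimates and not values reported by the authors.

```python
def implied_sensitivity(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given precision P and F1."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no valid sensitivity for this precision/F1 pair")
    return f1 * precision / denom


# Post-adjudication (precision, F1) pairs as stated in the abstract.
reported = {
    "dysphagia": (0.92, 0.88),
    "fibrosis": (1.00, 0.93),
    "ORN": (0.93, 0.93),
    "trismus": (0.94, 0.91),
}

# Estimates only: inputs are rounded to two decimals in the abstract.
estimates = {name: implied_sensitivity(p, f) for name, (p, f) in reported.items()}
for name, r in estimates.items():
    print(f"{name}: implied sensitivity ~ {r:.2f}")
```

When precision and F1 are equal, as for ORN, the implied sensitivity equals that same value.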

Conclusion

This study demonstrates the feasibility and accuracy of fine-tuning existing AI to find patients experiencing RLT from EHR. For future work, AI should be tested on EHR from a real-world HNC cohort, with or without RLT. For the purposes of this study, DARWEN™ was fine-tuned using the entire patient EHR; in the future, less documentation could be used for continuous RLT monitoring and early detection.

Author Disclosure: F. Alfadli: None. J.M. Mathew: None. J.N. Waldron: None. J.Y. Kwan: None. S. Aviv: None. M.M. Nguyen: None. V. Mokriak: None. C. Pettengell: None. P. Wong: Grant/research funding; AstraZeneca, Bristol Myers Squibb. Ownership equity; MISO chip.