Assessment of OpenAI-based chatbot's Performance in Responding to Patient Consult Inquiries

Presented at: ACRO Summit 2025

Date: 2025-03-12

Summary: This study assessed the performance of an OpenAI-based chatbot in responding to patient consult inquiries. A total of 100 physician questions, based on 10 diverse patient consults covering various cancer treatment sites, were used; the questions were designed to represent realistic radiation oncology chart rounds. The chatbot system was developed using OpenAI's GPT-4 model to generate responses. A custom rubric was developed that rated each response from 1 to 5 (1 = strongly disagree, 5 = strongly agree) on five criteria: accuracy of content, completeness, context awareness, transparency of limitations, and lack of bias. Descriptive statistics were computed to evaluate the overall performance of the system.

The chatbot achieved an overall average rubric score of 4.5 (± 0.9) out of 5 across all criteria. The mean score (standard deviation) for each criterion was: accuracy of content 4.4 (± 1.1), completeness 4.4 (± 1.1), context awareness 4.4 (± 1.0), transparency of limitations 4.5 (± 0.9), and lack of bias 4.7 (± 0.5). 'Lack of Bias' received the highest average score and the lowest standard deviation, indicating strong consensus on the system's ability to provide unbiased responses. In contrast, 'Accuracy of Content' and 'Completeness' had the highest standard deviations, suggesting greater variability in the chatbot's responses and identifying these areas as potential focal points for improvement.

Our research indicates that the chatbot has the potential to be a useful tool in the future, specifically for retrieving patient information from consultations. However, despite the excitement around using AI in clinical settings, a comprehensive evaluation is crucial before implementation in clinical practice to fully understand the limitations of such systems.
A rubric similar to ours can provide valuable insights into the strengths and weaknesses of these systems.

Authors: Julian Jeberaeel, DO (Presenting Author) - Wayne State University: DMC, Karmanos Cancer Center; Ramesh Boggula, PhD (Co-Author) - Wayne State University/Karmanos Cancer Institute; Aria Kieft, MD (Co-Author) - Wayne State University: DMC, Karmanos Cancer Center; Steven R. Miller, MD (Co-Author) - Detroit Medical Center/Wayne State Unive
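The descriptive statistics in the summary (mean and standard deviation per rubric criterion, plus an overall score) can be illustrated with a short Python sketch. This is not the authors' analysis code. Several choices are assumptions: the per-response data layout, the use of the sample standard deviation (`statistics.stdev`), and pooling every individual score to obtain the overall figure. `EXAMPLE_RATINGS` contains hypothetical toy values, not study data.

```python
from statistics import mean, stdev

# Rubric criteria as listed in the abstract; scale is 1 = strongly disagree,
# 5 = strongly agree.
CRITERIA = (
    "accuracy of content",
    "completeness",
    "context awareness",
    "transparency of limitations",
    "lack of bias",
)

# Hypothetical toy ratings for three chatbot responses (NOT the study's data).
EXAMPLE_RATINGS = [
    {"accuracy of content": 5, "completeness": 5, "context awareness": 5,
     "transparency of limitations": 5, "lack of bias": 5},
    {"accuracy of content": 4, "completeness": 4, "context awareness": 5,
     "transparency of limitations": 4, "lack of bias": 5},
    {"accuracy of content": 3, "completeness": 5, "context awareness": 4,
     "transparency of limitations": 5, "lack of bias": 5},
]


def summarize(ratings):
    """Return {criterion: (mean, sd)} plus an 'overall' entry.

    `ratings` is a list of dicts, one per chatbot response, mapping each
    criterion to an integer score from 1 to 5. The 'overall' entry pools all
    individual scores across criteria (assumed aggregation method).
    """
    summary = {}
    pooled = []
    for criterion in CRITERIA:
        scores = [r[criterion] for r in ratings]
        # Reject scores outside the 1-5 rubric range.
        if any(not 1 <= s <= 5 for s in scores):
            raise ValueError(f"score out of range for {criterion!r}")
        # Sample standard deviation (assumption); needs at least two responses.
        summary[criterion] = (mean(scores), stdev(scores))
        pooled.extend(scores)
    summary["overall"] = (mean(pooled), stdev(pooled))
    return summary


if __name__ == "__main__":
    for name, (m, sd) in summarize(EXAMPLE_RATINGS).items():
        print(f"{name}: {m:.1f} (± {sd:.1f})")
```

When every criterion is rated for the same set of responses, the pooled overall mean equals the average of the per-criterion means. As a consistency check on the reported figures, (4.4 + 4.4 + 4.4 + 4.5 + 4.7) / 5 = 4.48, which rounds to the reported overall 4.5.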