
Efficacy of ChatGPT-4o in recognizing and diagnosing skin cancer subtypes in light-skin and skin-of-color based on high-quality clinical images


Presented at: Society for Investigative Dermatology 2025

Date: 2025-05-07


Summary: Despite the rising popularity of artificial intelligence (AI) technology and chatbots such as ChatGPT in the medical community, the accuracy and intrinsic bias of such tools remain unclear. This study assessed the diagnostic accuracy of ChatGPT for skin cancer in light skin (LS) and skin of color (SOC). Clinical images of four skin cancer types were collected from VisualDx, DermNet, and the American Academy of Dermatology website: melanoma, squamous cell carcinoma (SCC), basal cell carcinoma (BCC), and Merkel cell carcinoma (MCC). Thirty SOC and 30 LS images were obtained per cancer type; for MCC, only LS images were available. ChatGPT-4o was prompted to make a diagnosis based on each image, with three diagnostic attempts per image.

ChatGPT's overall diagnostic accuracy was 0% (0/90) for MCC, 58.9% (106/180) for melanoma, 42.8% (77/180) for SCC, and 33.3% (60/180) for BCC. For melanoma, ChatGPT's total diagnostic accuracy after three attempts was 76.7% (23/30) for LS images, higher than the 46.7% (14/30) for SOC images (p = 0.033). Similarly, for BCC, total accuracy after three attempts was 56.7% (17/30) for LS images, higher than the 23.3% (7/30) for SOC images (p = 0.017). For SCC, total accuracy after three attempts was 46.7% (14/30) for both LS and SOC images (p = 1.00).

Overall, ChatGPT diagnosed skin cancer correctly on just 38.6% (243/630) of its attempts, showing it was ineffective at diagnosing all four cancer types, regardless of skin color. It was unable to diagnose MCC at all, highlighting a clear knowledge gap. Among the more common skin cancers, it was most accurate for melanoma and least accurate for BCC. Finally, for both melanoma and BCC, ChatGPT had lower accuracy for SOC images, indicating that racial bias may extend to AI tools and limit their validity.

Authors: Sakshi Chopra (1), Ysaac Zegeye (1), Michelle Pavlis (2, 3)

Affiliations:
1. Duke University School of Medicine, Durham, NC, United States.
2. Department of Dermatology, Duke University Health System, Durham, NC, United States.
3. Durham VA Health Care System, Durham, NC, United States.

Category: Minoritized Populations and Health Disparities Research
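The abstract does not publish the prompt or code used to query the model. As a minimal sketch, assuming the current OpenAI Python SDK, a generic one-line prompt, and base64-encoded local images (all assumptions, not the authors' protocol), a single diagnostic attempt might look like:

```python
# Hypothetical sketch of one diagnostic attempt; the study's actual prompt,
# model settings, and image handling are not published in the abstract.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def diagnose_image(image_path: str) -> str:
    """Ask GPT-4o for a most-likely diagnosis of one clinical image."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What is the most likely diagnosis for this skin lesion?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Mirroring the study design, each image would be queried three times:
# attempts = [diagnose_image("lesion_001.jpg") for _ in range(3)]
```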
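The abstract reports p-values for the LS versus SOC comparisons without naming the statistical test. Fisher's exact test on the reported 2x2 counts is one plausible reconstruction (an assumption; the authors' actual analysis may differ) and can be checked as follows:

```python
# Hedged reconstruction: the test behind the reported p-values is not stated;
# Fisher's exact test is one common choice for 30-vs-30 binary outcomes.
from scipy.stats import fisher_exact

# Rows: LS, SOC; columns: correct, incorrect (counts from the abstract).
tables = {
    "melanoma": [[23, 7], [14, 16]],   # 76.7% vs 46.7%, reported p = 0.033
    "BCC":      [[17, 13], [7, 23]],   # 56.7% vs 23.3%, reported p = 0.017
    "SCC":      [[14, 16], [14, 16]],  # 46.7% vs 46.7%, reported p = 1.00
}

for cancer, table in tables.items():
    odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
    print(f"{cancer}: odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
```

The SCC table, with identical LS and SOC counts, yields p = 1.00 by construction, matching the reported value; the other two p-values depend on which test the authors actually ran.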