Data engineering improves recurrence prediction in Merkel cell carcinoma
Presented at: Society for Investigative Dermatology 2025
Date: 2025-05-07
Summary:

Background: Merkel cell carcinoma (MCC) is an aggressive skin cancer with high recurrence rates. Predicting recurrence has been challenging due to limited sample sizes and heterogeneous data. Here, we demonstrate that data engineering techniques applied to a small dataset can significantly improve machine learning (ML) model performance in predicting MCC recurrence.

Methods: We used a dataset of 105 deidentified MCC cases from University Hospitals Cleveland Medical Center, including variables such as age, sex, immunosuppression, UV exposure, tumor visibility, radiation therapy, tumor stage, and outcomes. After removing confounding columns and rows with missing data, 80 samples (22 with recurrence) remained. Correlation analysis was used to reduce the number of features, and the selection was confirmed with a Least Absolute Shrinkage and Selection Operator (Lasso) regression model. Using stratified splitting, we allocated 48 patients for training and 32 for testing. Minority upsampling was applied to the training set. We trained an extreme gradient boosting (XGBoost) model using cross-validation and grid search.

Results: The upsampled XGBoost model achieved an accuracy of 0.72 and an area under the receiver operating characteristic curve of 0.75 on the test set. Radiation therapy was the most significant predictor of recurrence, followed by tumor stage (AJCC 8th Edition). Sex and tumor visibility were not predictive. The model’s low false negative rate highlights its potential for identifying high-risk patients.

Conclusions: Recurrence prediction in MCC using ML models trained on clinical data is feasible but limited by sample size and single-institution data. Future work should focus on larger sample sizes to improve prediction accuracy and generalizability.

Roshan Lodha¹,⁴, Kelsey Ouyang¹, Claire Reynolds², Bryan Carroll³

1. Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, OH, United States.
2. Case Western Reserve University, Cleveland, OH, United States.
3. University Hospitals, Cleveland, OH, United States.
4. National Institutes of Health, Bethesda, MD, United States.

Bioinformatics, Computational Biology, and Imaging
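
The two data engineering steps named in the Methods, stratified splitting and minority upsampling of the training set only, can be illustrated with a minimal sketch. This is not the authors' code. The labels below are synthetic and only reproduce the counts given in the abstract (80 cases, 22 recurrences, 48 training and 32 test patients). The function names, the 0.6 training fraction, the random seed, and upsampling to a 1:1 class ratio are assumptions made for illustration. Model fitting with XGBoost, cross-validation, and grid search are not shown.

```python
import random


def stratified_split(labels, train_fraction, seed=0):
    """Return (train_idx, test_idx) with the class ratio preserved in each part."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(len(idx) * train_fraction)
        train_idx += idx[:n_train]
        test_idx += idx[n_train:]
    return sorted(train_idx), sorted(test_idx)


def upsample_minority(indices, labels, seed=0):
    """Resample smaller classes with replacement until every class matches the largest.

    The 1:1 target ratio is an assumption; the abstract does not state one.
    """
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        balanced += rows + extra
    return balanced


# Synthetic labels matching the abstract's counts: 1 = recurrence, 0 = no recurrence.
labels = [1] * 22 + [0] * 58

train_idx, test_idx = stratified_split(labels, train_fraction=0.6)

# Upsample the training rows only, so the test set keeps its original
# class balance and contains no duplicated training cases.
train_balanced = upsample_minority(train_idx, labels)
```

Upsampling after the split, rather than before it, keeps copies of the same patient from appearing in both the training and test sets, which would otherwise inflate test performance.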