P254 Deep Learning Model for Distinguishing MES 0 and MES 1 in Patients with Ulcerative Colitis

Kim, J.E.(1)*;Choi, Y.H.(2);Lee, Y.C.(3);Seong, G.(4);Song, J.H.(5);Kim, E.R.(5);Hong, S.N.(5);Chang, D.K.(5);Kim, Y.H.(5);Shin, S.Y.(2);Kim, T.J.(5);

(1)Samsung Medical Center, Division of Gastroenterology, Department of Medicine, Seoul, Republic of Korea;(2)Sungkyunkwan University, Department of Digital Health, Seoul, Republic of Korea;(3)Samsung Medical Center, Research Institute for Future Medicine, Seoul, Republic of Korea;(4)Nowon Eulji Medical Center, Eulji University, Department of Medicine, Seoul, Republic of Korea;(5)Samsung Medical Center, Department of Medicine, Seoul, Republic of Korea;

Background

Endoscopic remission has recently been defined as a Mayo endoscopic subscore (MES) of 0. Therefore, patients with an MES of 1 need to step up therapy to achieve an MES of 0. However, inter-observer variation among endoscopists in discriminating between MES 0 and MES 1 is substantial. This study aimed to narrow this gap in distinguishing between MES 0 and MES 1 using a deep learning model.

Methods

From the endoscopic images of 492 ulcerative colitis (UC) patients with MES improvement (MES 0 or MES 1) seen at this center from January 2018 to December 2019, two representative images per patient (colon and rectum) were selected, giving a total of 984 images for analysis. Our model was composed of a convolutional neural network (CNN)-based encoder, two auxiliary classifiers for the colon and rectum, and a final MES classifier operating on the combined image features of the two inputs. Three experiments were conducted to validate and test the proposed model: 12-fold cross-validation for representative model selection, a performance comparison with a novice group on an internal test dataset, and an external test on HyperKvasir, a public gastrointestinal endoscopic image dataset.
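The 12-fold cross-validation step described above can be illustrated with a minimal sketch. It assumes that folds are formed at the patient level, so that the colon and rectum images of one patient never end up on opposite sides of a split, and that patients are shuffled at random with a fixed seed. The abstract does not state how the folds were built or whether they were stratified by MES, so the function names, the seed, and the patient identifiers below are all hypothetical.

```python
import random


def patient_level_folds(patient_ids, k=12, seed=0):
    """Split patient IDs into k disjoint folds of near-equal size.

    Assumption: splitting is done per patient, not per image.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    # Deal the shuffled IDs round-robin so fold sizes differ by at most one.
    return [ids[i::k] for i in range(k)]


def cross_validation_splits(patient_ids, k=12, seed=0):
    """Yield (train_ids, validation_ids) pairs, one per fold."""
    folds = patient_level_folds(patient_ids, k, seed)
    for i, validation_ids in enumerate(folds):
        train_ids = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield train_ids, validation_ids


# Hypothetical identifiers standing in for the 492 patients.
patients = [f"patient_{n:03d}" for n in range(492)]
splits = list(cross_validation_splits(patients, k=12, seed=0))
```

With 492 patients and 12 folds, each validation fold in this sketch holds 41 patients (82 images) and each training set holds the remaining 451 patients.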

Results

In the internal test, our model achieved an F1-score of 0.92. This was 0.11 higher than the mean F1-score of the seven novices and 0.02 higher than that of their consensus. With MES 1 as the positive class, the area under the receiver operating characteristic curve (AUROC) was 0.97, and the area under the precision-recall curve (AUPRC) was 0.98. In the external test on HyperKvasir, our model achieved an F1-score of 0.89, an AUROC of 0.86, and an AUPRC of 0.97.
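For readers less familiar with the reported metrics, the following sketch shows how an F1-score and an AUROC can be computed with MES 1 treated as the positive class. It uses the standard definitions (F1 as the harmonic mean of precision and recall, AUROC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one). The labels, scores, and the 0.5 decision threshold are made-up illustration values, not study data, and this is not the authors' evaluation code.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1-score for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores, positive=1):
    """AUROC via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only): 1 = MES 1, 0 = MES 0.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.3, 0.9, 0.6]  # predicted probability of MES 1
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

toy_f1 = f1_score(y_true, y_pred)
toy_auroc = auroc(y_true, scores)
```

The F1-score depends on the chosen decision threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why the two can diverge as they do in the external test.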

Conclusion

We found that the proposed CNN-based model, which integrates image features of the colon and rectum, discriminated between MES 0 and MES 1 in patients with UC with high performance, exceeding that of the novice group. Further prospective studies are required to establish the utility of our model in clinical practice.