P264 Machine learning can accurately predict development of Inflammatory Bowel Disease
Dyer, E.(1);Kliper, E.(2);Zamstein, N.(2);Hodik, G.(3,4,5);Kariv, R.(3,4,5);Cohen, N.A.(3,5)*;
(1)The University of Chicago, Section of Hematology and Oncology, Department of Medicine, Chicago, United States;(2)MD Clone, MD Clone, Beer Sheva, Israel;(3)Tel Aviv Medical Center, Department of Gastroenterology and Liver Diseases, Tel Aviv, Israel;(4)Maccabi Health Services, Research Institute, Tel Aviv, Israel;(5)Tel Aviv University, Sackler Faculty of Medicine, Tel Aviv, Israel;
Background
Prediction of inflammatory bowel disease (IBD) development is the subject of intense study. We have previously shown that significant changes occur in routine laboratory tests in the 5 years preceding IBD diagnosis. Using routinely collected laboratory results, we developed machine learning models to predict development of IBD.
Methods
We extracted data from the electronic medical records of Maccabi Health Services and included patients with IBD aged ≥ 16 years with a minimum of 5 years of follow-up, together with an age- and sex-matched healthy cohort. The outcome measured was entry into the IBD registry. The study population was split into training and validation cohorts. Using laboratory values from the 5 years prior to IBD diagnosis as features, we assessed the performance of supervised learning models and performed interpretability analysis with unsupervised dimensionality reduction.
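As an illustration of the evaluation step only, the following minimal Python sketch splits records into training and validation sets and computes the area under the ROC curve (AUC) from model scores. The abstract does not state the split ratio or the software used, so the function names, the 30% validation fraction and the seed are assumptions made for this example.

```python
import random


def train_validation_split(records, validation_fraction=0.3, seed=0):
    """Shuffle records and split them into (training, validation) lists.

    The 0.3 fraction and the seed are illustrative assumptions,
    not values reported in the abstract.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The pairwise formulation is quadratic in the number of cases, which is acceptable for a sketch; a rank-based computation would be preferable for cohorts of the size reported here.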
Results
A total of 5643 patients with IBD and 17199 healthy people were included in this study. Of the patients with IBD, 3039 (53.8%) had Crohn’s disease (CD), 2322 (41.1%) ulcerative colitis (UC) and 282 (5%) indeterminate colitis. The mean age of the IBD and healthy cohorts at study inclusion was 39.3 ± 16.5 and 38.5 ± 15.6 years, respectively. Weighted nearest neighbour imputation was used to correct for data missingness. Unsupervised Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction correctly separated the cohort into healthy and IBD clusters (Figure 1A). Furthermore, this algorithm identified two distinct groups correlating to CD or UC diagnosis (Figure 1B). Supervised learning models, namely neural network (AUC 0.95), linear support vector machine (SVM) (AUC 0.93), radial basis function SVM (AUC 0.85) and logistic regression (AUC 0.93) models, accurately predicted IBD diagnosis (Figure 2). When these models were repeated without C-reactive protein, the predictive value was not significantly changed. Moreover, to determine whether model performance would be reduced, we excluded data from the 2 years preceding IBD diagnosis; the predictive value remained the same. Interestingly, using only data from 3 to 5 years preceding IBD diagnosis, the model could better distinguish between CD and UC (Figure 3).
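To make the imputation step concrete, here is a minimal Python sketch of distance-weighted nearest neighbour imputation. It is not the authors' implementation: the function name, the default k = 3, the inverse-distance weighting and the distance measure (root mean squared difference over the features two rows share) are all assumptions chosen for illustration.

```python
import math


def weighted_knn_impute(rows, k=3):
    """Return a copy of rows with None entries filled by the
    inverse-distance-weighted mean of the k nearest rows that have
    a value for that feature. Hypothetical helper for illustration.

    In practice features would be standardised first, since laboratory
    tests are measured on very different scales.
    """

    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return None  # no common features, so the rows cannot be compared
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                d = distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            if not candidates:
                continue  # nothing to borrow from; leave the entry missing
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            exact = [v for d, v in nearest if d == 0.0]
            if exact:
                # Identical neighbours would get infinite weight; average them.
                imputed[i][j] = sum(exact) / len(exact)
            else:
                weights = [1.0 / d for d, _ in nearest]
                total = sum(w * v for w, (_, v) in zip(weights, nearest))
                imputed[i][j] = total / sum(weights)
    return imputed
```

Distances are computed from the original rows rather than from already imputed values, so the result does not depend on the order in which missing entries are filled.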
Conclusion
In this study, machine learning models could accurately predict a diagnosis of IBD using laboratory data from the 5 years preceding IBD diagnosis. Prospective studies are needed to test these models in a real-world setting.