P070 Machine learning approaches to identify IBD biomarkers from longitudinal microbiome data

M. Madgwick1,2, P. Sudhakar1,2,3, N.S. Tabib3, P. Norvaisas4, P. Creed4, B. Verstockt3,5, S. Vermeire3,5, T. Korcsmáros1,2

1Earlham Institute, Organisms and Ecosystems, Norwich, UK, 2Quadram Institute, Gut Microbes and Health, Norwich, UK, 3KU Leuven, Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders TARGID, Leuven, Belgium, 4BenevolentAI, Precision Medicine Team, London, UK, 5University Hospitals Leuven, Department of Gastroenterology and Hepatology, Leuven, Belgium

Background

Inflammatory bowel disease (IBD) is associated with alterations in the intestinal microbiome. However, the precise nature of these microbial changes remains unclear. With billions of microbes present within the gut, novel and powerful computational techniques are required to identify the shifts in microbiota that contribute to the disease. Machine learning (ML) allows a data-driven approach to identifying these discrete dynamic changes, while the findings of the ML algorithms can be interpreted using systems biology (SB) techniques. By combining ML and SB approaches, we aim to characterise key microbial factors in IBD pathogenesis and distinct patterns of variability in a diverse patient cohort, and to provide a method for patient stratification.

Methods

The causal relationship between changes in the gut microbiome and IBD is difficult to establish. Data from cross-sectional studies are plagued by confounding factors and inconsistencies between cohorts. To overcome this, we used rich longitudinal datasets and integrated metagenomic, multi-omic and clinical patient data. We validated this workflow using large longitudinal IBD databases, including data from the IBDMDB. We assessed the performance of the ML models using well-documented performance metrics to ensure the outcomes were robust.
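One practical consequence of working with longitudinal data is that repeated samples from the same patient should not appear in both the training and the evaluation set, otherwise performance estimates can be optimistic. The following is a minimal sketch of a patient-grouped split, not the pipeline's actual validation scheme, which the abstract does not specify. The function name patient_grouped_folds and the toy patient identifiers are hypothetical.

```python
from collections import defaultdict


def patient_grouped_folds(patient_ids, n_folds=3):
    """Yield (train_idx, test_idx) pairs in which no patient spans both sets.

    Illustrative helper (hypothetical, not taken from the abstract).
    """
    # Collect the sample indices (time points) belonging to each patient.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    # Greedy balancing: place patients with the most samples first,
    # always into the fold that currently holds the fewest samples.
    folds = [[] for _ in range(n_folds)]
    for pid in sorted(by_patient, key=lambda p: (-len(by_patient[p]), p)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_patient[pid])

    # Each fold serves once as the held-out set.
    for f in range(n_folds):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(n_folds) if g != f for i in folds[g])
        yield train_idx, test_idx


# Hypothetical toy data: one entry per sample, naming the patient it came from.
patient_ids = ["p1", "p1", "p1", "p2", "p2", "p3", "p3", "p4", "p5", "p5"]
splits = list(patient_grouped_folds(patient_ids, n_folds=3))

for train_idx, test_idx in splits:
    held_out = sorted({patient_ids[i] for i in test_idx})
    print(held_out, "held out;", len(train_idx), "training samples")
```

Splitting by patient rather than by sample keeps all time points of one individual together, which is the property that matters when the models are meant to generalise to unseen patients.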

Results

As a baseline, we used multiple ML models to predict disease type (UC, CD and non-IBD) from integrated multi-omics profiles. We analysed multiple ML techniques, including linear models (e.g. linear mixed model), non-linear models (e.g. Random Forest), time-series models (e.g. Rotation Forest) and deep learning models (e.g. long short-term memory network model). We identified the models that would allow flexibility to analyse the dynamic nature of the microbiome and to integrate the microbiome data with clinical patient data. The trade-off for this greater flexibility was reduced model performance in identifying specific metagenomic features that could be used as biomarkers. However, we were able to identify connections between microbial and host proteins relevant to IBD and to stratify these by the patients’ metagenomic data.
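For a three-class task such as UC, CD and non-IBD, class-averaged metrics are a common way to compare models when classes are unevenly represented. The sketch below computes balanced accuracy and macro-averaged F1 from scratch. It is an illustration only: the abstract does not state which metrics were used, and the labels and predictions are invented toy values, not study results.

```python
def per_class_precision_recall(y_true, y_pred, labels):
    """Return {label: (precision, recall)} computed one-vs-rest."""
    stats = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        # Guard against division by zero when a class is never seen or predicted.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        stats[label] = (precision, recall)
    return stats


def balanced_accuracy(stats):
    """Mean of per-class recall."""
    return sum(recall for _, recall in stats.values()) / len(stats)


def macro_f1(stats):
    """Unweighted mean of per-class F1 scores."""
    f1_scores = [
        2 * p * r / (p + r) if (p + r) else 0.0
        for p, r in stats.values()
    ]
    return sum(f1_scores) / len(f1_scores)


# Invented toy labels and predictions (assumptions for illustration only).
labels = ["UC", "CD", "non-IBD"]
y_true = ["UC", "UC", "CD", "CD", "CD", "non-IBD", "non-IBD", "non-IBD"]
y_pred = ["UC", "CD", "CD", "CD", "UC", "non-IBD", "non-IBD", "CD"]

stats = per_class_precision_recall(y_true, y_pred, labels)
bal_acc = balanced_accuracy(stats)
f1_macro = macro_f1(stats)
print(f"balanced accuracy: {bal_acc:.3f}, macro F1: {f1_macro:.3f}")
```

Both metrics weight each disease class equally, so a model cannot score well simply by favouring the most frequent class.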

Conclusion

We have developed an integrated ML-based microbiome analysis pipeline to identify biomarkers for IBD from longitudinal metagenomic data. Furthermore, using a variety of SB approaches, we were able to interpret the predicted key microbial features and communities by inferring connections between microbial and host proteins. This pipeline will enable us to analyse vast amounts of patient microbiome data in the context of clinical and metagenomic data, allowing the identification of biomarkers for disease subtypes.