19 December 2024

Y-ECCO Literature Review: Megha Bhandari

Megha Bhandari

Artificial intelligence enabled histological prediction of remission or activity and clinical outcomes in ulcerative colitis 

Iacucci M, Parigi TL, Del Amor R, et al.
Gastroenterology 2023;164:1180–8.



Introduction

Ulcerative colitis (UC) is characterised by recurrent episodes of inflammation affecting the colonic mucosa. Accurate assessment of disease activity and prediction of clinical outcomes are crucial for effective management. Histological examination has traditionally been the gold standard for evaluating mucosal inflammation, but it is time-consuming and subject to inter-observer variability. Recent advances in artificial intelligence (AI) may help to address both problems. Iacucci and colleagues explored the use of machine learning to diagnose histological remission and predict clinical outcomes in patients with UC.

Methods

Iacucci and colleagues developed a convolutional neural network (CNN)-based AI model, trained on histological images labelled as showing remission or active disease, which they referred to as a computer-aided diagnosis (CAD) system. Importantly, the model was externally validated on 154 biopsies from 58 patients with similar clinical characteristics to the development cohort, although the validation cohort showed more histological activity. Diagnostic performance was reported as sensitivity, specificity, positive and negative predictive values, accuracy and area under the receiver operating characteristic curve (AUROC); prognostic performance was assessed with Kaplan-Meier analysis and hazard ratios for flare between the active and remission groups. The expert pathologists scored the biopsies using three histological indices: the Paddington International virtual ChromoendoScopy ScOre (PICaSSO) Histologic Remission Index (PHRI), the Robarts Histologic Index (RHI) and the Nancy Histologic Index (NHI).
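The Kaplan-Meier curves used for the prognostic analysis come from the product-limit estimator: at each time a flare occurs, flare-free survival is multiplied by the fraction of at-risk patients who did not flare, and censored patients simply leave the risk set. The minimal pure-Python sketch below illustrates this under stated assumptions; the function name `kaplan_meier` and the follow-up data are hypothetical and are not taken from the study.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit (Kaplan-Meier) estimate of flare-free survival.

    Hypothetical helper for illustration only.
    times:  follow-up time per patient (e.g. months after the index biopsy)
    events: 1 if the patient flared at that time, 0 if censored without a flare
    Returns (time, survival probability) at each time at which a flare occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        flares = removed = 0
        # Pool every patient whose follow-up ends at this same time point
        while i < len(data) and data[i][0] == t:
            flares += data[i][1]
            removed += 1
            i += 1
        if flares:
            # Multiply by the proportion of at-risk patients who stayed flare-free
            survival *= 1 - flares / at_risk
            curve.append((t, survival))
        # Both flared and censored patients leave the risk set after time t
        at_risk -= removed
    return curve


# Hypothetical follow-up (months) for eight patients classified as active
times = [2, 3, 3, 5, 8, 12, 12, 12]
events = [1, 1, 0, 1, 0, 0, 0, 0]
print([(t, round(s, 3)) for t, s in kaplan_meier(times, events)])
# prints [(2, 0.875), (3, 0.75), (5, 0.6)]
```

Computing one such curve per group (CAD-defined active versus remission) and plotting them together gives the kind of comparison on which the prognostic analysis relies.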

Key findings

Compared with the assessments of expert pathologists, the CAD model achieved a specificity of 85% (95% CI, 80–89%), a positive predictive value (PPV) of 75% (95% CI, 69–80%), a negative predictive value (NPV) of 94% (95% CI, 90–96%), an accuracy of 87% (95% CI, 83–90%) and an AUROC of 0.87 (95% CI, 0.83–0.90).
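These metrics all derive from a two-by-two table comparing the CAD call with the pathologists' reference call. The sketch below shows the arithmetic, assuming (for illustration only) that active disease is the positive class; the counts are invented and are not the study's data. It also shows why PPV and NPV, unlike sensitivity and specificity, shift with the proportion of active biopsies in a cohort, which is worth bearing in mind when comparing cohorts with different levels of histological activity.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical 2x2 table of CAD calls against the pathologist reference (active = positive)."""
    tp: int  # CAD active, pathologist active
    fp: int  # CAD active, pathologist remission
    tn: int  # CAD remission, pathologist remission
    fn: int  # CAD remission, pathologist active


def diagnostic_metrics(c: Confusion) -> dict:
    """Standard binary diagnostic metrics from a confusion table."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / total,
    }


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple:
    """PPV and NPV implied by Bayes' theorem for a given prevalence of active disease."""
    ppv = sensitivity * prevalence / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
    npv = specificity * (1 - prevalence) / (
        specificity * (1 - prevalence) + (1 - sensitivity) * prevalence)
    return ppv, npv


# Invented counts for 150 biopsies, one third of them active
metrics = diagnostic_metrics(Confusion(tp=40, fp=10, tn=90, fn=10))
print({k: round(v, 3) for k, v in metrics.items()})
# prints {'sensitivity': 0.8, 'specificity': 0.9, 'ppv': 0.8, 'npv': 0.9, 'accuracy': 0.867}

# Same sensitivity and specificity, but only 10% of biopsies active
print([round(x, 3) for x in predictive_values(0.8, 0.9, 0.10)])
# prints [0.471, 0.976]
```

With the invented counts, sensitivity and specificity stay fixed while PPV falls and NPV rises as active disease becomes rarer.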

The study also investigated integrating AI-based histological predictions with other clinical parameters, such as endoscopic findings and biomarkers, to predict clinical outcomes at 12 months. The model was first used, as described above, to stratify patients into remission and active disease groups, and was then used to predict endoscopic activity and the risk of flare over the following 12 months. This secondary analysis showed that the CAD system could predict the presence of endoscopic inflammation at the site where the biopsies were taken with an accuracy of approximately 80%.

When patients were stratified using the existing histological indices, the hazard ratio for any prespecified adverse clinical event (a proxy for flare) in the active versus the remission group was 3.56 (95% CI, 2.10–6.05) by PHRI, 4.28 (95% CI, 2.33–7.84) by RHI and 3.55 (95% CI, 2.03–6.23) by NHI. When patients were instead stratified by the CAD model, the hazard ratio was 4.64 (95% CI, 2.76–7.80), similar to the values obtained by the human experts with any of the three indices.
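A hazard ratio compares the instantaneous rate of events between two groups. The review does not state which survival model produced the figures above (a Cox proportional hazards model is the usual choice), so the sketch below deliberately uses a simpler, swapped-in technique: assuming a constant hazard in each group, the hazard is estimated as events divided by person-time, and a Wald confidence interval is formed on the log scale. The function name `exponential_hazard_ratio` and the follow-up figures are hypothetical.

```python
import math
from typing import Tuple


def exponential_hazard_ratio(events_a: int, person_time_a: float,
                             events_b: int, person_time_b: float,
                             z: float = 1.96) -> Tuple[float, float, float]:
    """Hazard ratio of group A versus group B under a constant-hazard (exponential) model.

    Hypothetical helper: the maximum-likelihood hazard in each group is
    events / person-time, and log(HR) has an approximate standard error of
    sqrt(1/events_a + 1/events_b). Returns (HR, lower, upper) for a ~95% CI.
    """
    if events_a == 0 or events_b == 0:
        raise ValueError("both groups need at least one event")
    hazard_a = events_a / person_time_a
    hazard_b = events_b / person_time_b
    log_hr = math.log(hazard_a / hazard_b)
    se = math.sqrt(1 / events_a + 1 / events_b)
    return (math.exp(log_hr),
            math.exp(log_hr - z * se),
            math.exp(log_hr + z * se))


# Hypothetical cohorts: 30 flares over 600 patient-months in the active group,
# 10 flares over 1000 patient-months in the remission group
hr, lower, upper = exponential_hazard_ratio(30, 600, 10, 1000)
print(f"HR {hr:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
# prints HR 5.00 (95% CI, 2.44-10.23)
```

Unlike a Cox model, this approximation ignores how the hazard changes over follow-up, so it should be read only as an illustration of what the ratio and its interval mean.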

Discussion

This study from Iacucci and colleagues presents highly promising initial results for AI-enabled histological prediction in UC. Multicentre international collaboration enabled a large sample size and introduced heterogeneity into the input data. Biopsy forceps, stains, protocols and tissue orientation all vary between centres. Together with the decision not to exclude biopsies with artefacts or of lower quality, provided the pathologist had considered them sufficient, this variation reduces the risk of both overfitting and spurious correlation.

Overfitting occurs when a model learns not only the underlying patterns in the training data but also its noise and outliers. An overfitted model performs exceptionally well on the training dataset but poorly on new, unseen data. This typically happens when the model is too complex relative to the training data, so that it captures irrelevant details specific to that data rather than features that generalise to other datasets. Spurious correlation, in this context, occurs when a machine learning model relies on irrelevant or unintended features (for example, the mark left by biopsy forceps in the corner of an image) rather than the features that genuinely distinguish the categories it is supposed to classify. In this study, the model performed similarly on the external validation set, suggesting that its performance is consistent across datasets.

However, several points warrant further discussion. The CAD model did not take into account treatment given between biopsy sampling and the one-year clinical endpoint; this may be particularly relevant, and clinically significant, for predictions of remission. It is also important to note that a key tenet of medical decision making is the ability to consider more than one diagnosis at a time. Given that detecting dysplasia remains important in patients with UC, the CAD system will need to incorporate dysplasia detection in the future to be of most use to practising clinicians.
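The spurious-correlation risk described above can be made concrete with a deliberately tiny, hypothetical example that has nothing to do with the study's CNN. Each biopsy is reduced to two binary features: a genuine but imperfect inflammation signal and a forceps-mark artefact. At a single training centre the artefact happens to track the label perfectly, so a very simple learner (a simplified decision stump that keeps whichever single feature best matches the training labels) latches onto it and fails at an external centre; training on data pooled with a centre where the artefact is unrelated to the label avoids this. All names and data are invented for illustration.

```python
from typing import List, Tuple

# (inflammation_feature, forceps_mark, label); label 1 = active, 0 = remission
Sample = Tuple[int, int, int]
FEATURE_NAMES = ("inflammation", "forceps_mark")


def make_centre(n: int, artefact_tracks_label: bool) -> List[Sample]:
    """Build a hypothetical, deterministic set of n biopsies from one centre."""
    samples = []
    for i in range(n):
        label = i % 2
        # Genuine signal: agrees with the label except for every 10th biopsy
        inflammation = label if i % 10 != 0 else 1 - label
        # Artefact: mirrors the label at some centres, unrelated to it elsewhere
        forceps = label if artefact_tracks_label else int(i < n // 2)
        samples.append((inflammation, forceps, label))
    return samples


def accuracy(samples: List[Sample], feature: int) -> float:
    """Accuracy of predicting the label as equal to the chosen feature."""
    return sum(s[feature] == s[2] for s in samples) / len(samples)


def fit_stump(samples: List[Sample]) -> int:
    """Simplified decision stump: keep the single feature that best matches the labels."""
    return max(range(len(FEATURE_NAMES)), key=lambda f: accuracy(samples, f))


external = make_centre(20, artefact_tracks_label=False)

single_centre = make_centre(20, artefact_tracks_label=True)
f = fit_stump(single_centre)
print(FEATURE_NAMES[f], accuracy(single_centre, f), accuracy(external, f))
# prints forceps_mark 1.0 0.5

pooled = make_centre(20, True) + make_centre(20, False)
f = fit_stump(pooled)
print(FEATURE_NAMES[f], accuracy(pooled, f), accuracy(external, f))
# prints inflammation 0.9 0.9
```

The external set plays the same role as the study's external validation: a model that had learned the artefact shows a clear drop in performance there, whereas one relying on the genuine signal does not.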

Conclusion

Overall, the CAD system described is the first UC-focused example of an AI model integrating histology with endoscopic imaging for disease monitoring and outcome prediction. The promising findings in this study suggest a future role for such a model in standardising and enhancing histological assessment both for clinical trials and for clinical practice. 

Profile

Megha Bhandari is currently undertaking a clinical research fellowship at the University of Cambridge and Cambridge University Hospitals in the United Kingdom. She has developed an interest in artificial intelligence and its relevance to medicine, and has supplemented her clinical experience with programming skills gained through an internship at the European Bioinformatics Institute and a Clinical Informatics Fellowship at University Hospital Southampton. She is keen to pursue the application of machine learning to IBD and is undertaking a Master’s degree at the University of Cambridge to formalise this training.

Posted in ECCO News, Y-ECCO Literature Reviews, Volume 19, Issue 4, Committee News, Y-ECCO