Y-ECCO Literature Review: Toer Stevens

Toer Stevens

Development and validation of a deep neural network for accurate evaluation of endoscopic images from patients with ulcerative colitis

Takenaka K, Ohtsuka K, Fujii T, et al.

Gastroenterology. 2020;158: 2150–7.



Nowadays, IBD treatment not only targets symptomatic disease control but also aims to heal the intestinal mucosa [1]. In Ulcerative Colitis (UC) there is mounting evidence that histological healing of the intestinal mucosa is associated with incremental benefit compared to endoscopic healing alone [2–8]. In a very recent meta-analysis of ten studies including 757 UC patients with complete endoscopic remission (Mayo Score 0 or equivalent) and with a minimum follow-up of >12 months, patients with histological remission had a 63% lower risk of clinical relapse (RR 0.37, 95% CI 0.24–0.56) than patients with ongoing microscopic inflammation [9].
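
For readers less familiar with the statistic, the link between a relative risk (RR) of 0.37 and a "63% lower risk" can be sketched as follows. This is a minimal illustration only: the function names and the 2x2 counts are hypothetical, and the meta-analysis itself pools study-level estimates rather than computing a single table like this.

```python
import math


def relative_risk(events_a, total_a, events_b, total_b, z=1.96):
    """Relative risk of group A versus group B with a log-scale (Katz) CI.

    All four counts are hypothetical inputs; they are not taken from the
    meta-analysis, which pools study-level estimates instead.
    """
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    rr = risk_a / risk_b
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


def relative_risk_reduction(rr):
    """Convert a relative risk into the fraction by which risk is lowered."""
    return 1.0 - rr


# The reported RR of 0.37 corresponds to the quoted 63% lower risk.
print(round(relative_risk_reduction(0.37) * 100))
```

An interval computed this way excludes 1 only when the difference between groups is statistically significant at the chosen level, which is how the reported 95% CI of 0.24–0.56 should be read.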

Nevertheless, the adoption of this target remains controversial. Further evaluation is warranted to investigate the feasibility and cost-effectiveness of achieving this target with the limited number of available treatment options. Furthermore, biopsy procurement and analysis are invasive, costly and time-intensive. Finally, a high variability in reported histological disease activity scores is observed when comparing general pathologists with expert gastrointestinal pathologists [10]. These drawbacks limit widespread implementation, in both daily practice and clinical trials. Takenaka et al. address some of these hurdles by employing a deep neural network to enable computer-aided diagnosis of endoscopic and histological remission in patients with UC.

Key Findings

In their paper “Development and validation of a deep neural network for accurate evaluation of endoscopic images from patients with Ulcerative Colitis”, Takenaka et al. used Inception-v3, a neural network model pre-trained on ImageNet [11]. They subsequently re-trained this deep learning algorithm (transfer learning) using a retrospective dataset containing 40,758 labelled endoscopy images and 6,885 labelled biopsy images of 2,012 UC patients. Histological data were linked to three consecutive endoscopy images: before, at the time of and just after the biopsy was taken (a total of 20,655 images). Subsequently, the performance and generalisability of the model were evaluated in a validation dataset, prospectively including 875 UC patients. Endoscopic remission was defined by an Ulcerative Colitis Endoscopic Index of Severity (UCEIS) score of 0. The UCEIS score was assigned to the most severely inflamed image of each bowel segment and evaluated by three experienced IBD endoscopists. Histological remission was defined by the absence of neutrophils in the epithelium (Geboes <3.1). Three general pathologists assigned the scores and in the event of discrepancies a final score was assigned by a board-certified IBD pathologist. All readers were blinded to clinical information. The machine learning algorithm showed a diagnostic accuracy for endoscopic remission of 90.1% (95% CI 89.2–90.9) and for histological remission of 92.9% (95% CI 92.1–93.7).
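
To make the reported accuracy figures more concrete, the sketch below shows one common way to derive an accuracy estimate and its 95% confidence interval from a count of correct classifications. It is an illustration under stated assumptions: the Wilson score method is chosen here for convenience (the interval method used by the authors is not described in this review), the function name is hypothetical, and the counts are invented for the example.

```python
import math


def accuracy_with_wilson_ci(n_correct, n_total, z=1.96):
    """Accuracy and its Wilson score interval (95% when z=1.96)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    p = n_correct / n_total
    denom = 1 + z ** 2 / n_total
    centre = (p + z ** 2 / (2 * n_total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / n_total + z ** 2 / (4 * n_total ** 2)) / denom
    )
    return p, centre - half_width, centre + half_width


# Hypothetical counts, chosen only to illustrate the calculation;
# they are not the counts from the study.
acc, low, high = accuracy_with_wilson_ci(n_correct=901, n_total=1000)
print(f"accuracy {acc:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The interval narrows as the number of evaluated images grows, which is why the intervals reported for a validation set of this size are less than two percentage points wide.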

Through this work, the authors have enabled computer-aided diagnosis of endoscopic remission and prediction of histological remission. This study is in alignment with, and an important addition to, the existing studies on computer-aided diagnosis of endoscopic and histological outcomes in Ulcerative Colitis [12–14]. Several important implications can be envisioned. First, the model would decrease the need for invasive and costly biopsy procurement and subsequent processing, digitalisation and analysis. Second, it could complement central reading in clinical trials. Nowadays, a central reading process is in place to reduce bias and overcome the problem of high interobserver variability for the different endoscopic and histological disease activity indices. However, central reading is not without limitations, including considerable time commitment, costs, reader variability, bias, motivation and fatigue [15]. A computer, on the other hand, is fast, never grows tired and scores every image consistently. The current results should therefore rightfully elicit some excitement, as an accurate computer-aided diagnostic tool for endoscopic and histological remission would address these problems. Considering the benefits of histological healing, accurate computer-aided prediction of this target – without the need for biopsies – would be a breakthrough. Finally, considering the Young ECCO Community, this type of algorithm could potentially aid in the training of future gastroenterologists in scoring endoscopic disease severity, as has been suggested previously [16].

Despite the huge potential of the field, it is probably fair to say that we are still far away from implementation in daily practice. There are a few points to consider. First, deep learning strategies rely heavily on the dataset used for training. Here, a large annotated dataset (images labelled with UCEIS and Geboes scores) was used for training (supervised learning). As discussed previously, there is high inter- and intraobserver variability, which will at least in part be transferred to the neural network. Furthermore, both the training and the validation datasets are derived from a single expert centre. Consequently, results are not necessarily generalisable to images of UC patients obtained in other centres by different gastroenterologists or with different endoscopy devices. Second, in the validation phase a manual pre-selection was made of the optimal (most severely inflamed) images to test the algorithm, which may have benefited the model’s performance and introduced selection bias. The authors are planning another study to adapt the algorithm to videocolonoscopies. In theory, videos contain many images (frames) with a larger variation in quality, increasing the robustness of the algorithm [17, 18]. In practice, however, video input may considerably decrease the accuracy due to the noise of suboptimal data (e.g. inadequate insufflation, faecal material and blurring) [19]. Third, in the design of machine learning algorithms, normally the training set is used to train the model and optimise its hyperparameters. Thereafter, when testing the performance in an independent test (validation) set, it is important not to further adjust the model, as this lets information from the test set leak into the model (‘data leakage’) and may lead to overfitting [19]. Surprisingly, however, the authors state that the final version of the neural network was constructed in April 2019, at the end of the prospective validation phase. It is unclear whether additional modifications to the model were made at that time.
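
The third point can be illustrated with a small sketch of how a test set is typically kept independent. This is a generic pattern and not the authors' pipeline (their validation set was a separate prospective cohort): patients are divided once into training, tuning and test groups, hyperparameters are chosen on the tuning group only, and the test group is evaluated a single time at the very end. The function name, the record layout and the fractions are all assumptions made for the example.

```python
import random


def split_by_patient(records, tune_fraction=0.15, test_fraction=0.15, seed=0):
    """Split (patient_id, image_id) records so no patient spans two groups.

    Splitting at patient level, rather than image level, prevents images of
    the same patient from appearing in both the training and the test data.
    """
    patients = sorted({patient_id for patient_id, _ in records})
    random.Random(seed).shuffle(patients)  # fixed seed keeps the split reproducible

    n_test = max(1, round(len(patients) * test_fraction))
    n_tune = max(1, round(len(patients) * tune_fraction))
    test_ids = set(patients[:n_test])
    tune_ids = set(patients[n_test:n_test + n_tune])

    splits = {"train": [], "tune": [], "test": []}
    for patient_id, image_id in records:
        if patient_id in test_ids:
            splits["test"].append((patient_id, image_id))
        elif patient_id in tune_ids:
            splits["tune"].append((patient_id, image_id))
        else:
            splits["train"].append((patient_id, image_id))
    return splits


# Hypothetical data: 20 patients with 5 images each.
records = [(p, f"patient{p}-image{i}") for p in range(20) for i in range(5)]
splits = split_by_patient(records)
print({name: len(items) for name, items in splits.items()})
```

Once the split is fixed, any change made to the model after looking at test-set results means the test set is no longer an unbiased estimate of performance, which is the concern raised above.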


In conclusion, Takenaka et al. have developed a computer model that can accurately diagnose endoscopic remission and predict – without the need for biopsies – histological remission based on still endoscopy images. I look forward to seeing the performance of the model in different populations and centres, and seeing the potential of this technique for evaluating these outcomes in real-time using videocolonoscopies.


  1. Peyrin-Biroulet L, Sandborn W, Sands BE, et al. Selecting Therapeutic Targets in Inflammatory Bowel Disease (STRIDE): determining therapeutic goals for treat-to-target. Am J Gastroenterol. 2015;110:1324–38.
  2. Ponte A, Pinho R, Fernandes S, et al. Impact of histological and endoscopic remissions on clinical recurrence and recurrence-free time in ulcerative colitis. Inflamm Bowel Dis. 2017;23:2238–44.
  3. Frieri G, Galletti B, Di Ruscio M, et al. The prognostic value of histology in ulcerative colitis in clinical remission with mesalazine. Therap Adv Gastroenterol. 2017;10:749–59.
  4. Zenlea T, Yee EU, Rosenberg L, et al. Histology grade is independently associated with relapse risk in patients with ulcerative colitis in clinical remission: a prospective study. Am J Gastroenterol. 2016;111:685–90.
  5. Park S, Abdi T, Gentry M, Laine L. Histological disease activity as a predictor of clinical relapse among patients with ulcerative colitis: systematic review and meta-analysis. Am J Gastroenterol. 2016;111:1692–701.
  6. Bryant RV, Winer S, Travis SPL, Riddell RH. Systematic review: histological remission in inflammatory bowel disease. Is 'complete' remission the new treatment paradigm? An IOIBD initiative. J Crohns Colitis. 2014;8:1582–97.
  7. Gordon IO, Agrawal N, Willis E, et al. Fibrosis in ulcerative colitis is directly linked to severity and chronicity of mucosal inflammation. Aliment Pharmacol Ther. 2018;47:922–39.
  8. Lobaton T, Bessissow T, Ruiz-Cerulla A, et al. Prognostic value of histological activity in patients with ulcerative colitis in deep remission: a prospective multicenter study. United European Gastroenterol J. 2018;6:765–72.
  9. Yoon H, Jangi S, Dulai PS, et al. Incremental benefit of achieving endoscopic and histologic remission in patients with ulcerative colitis: a systematic review and meta-analysis. Gastroenterology. 2020;159:1262–75.e7.
  10. Romkens TEH, Kranenburg P, Van Tilburg A, et al. Assessment of histological remission in ulcerative colitis: discrepancies between daily practice and expert opinion. J Crohns Colitis. 2018;12:425–31.
  11. Deng J, Dong W, Socher R. ImageNet: a large-scale hierarchical image database. IEEE Conf Comput Vis Pattern Recognit. 2009: 2–9.
  12. Ozawa T, Ishihara S, Fujishiro M, et al. Novel computer-assisted diagnosis system for endoscopic disease activity in patients with ulcerative colitis. Gastrointest Endosc. 2019;89:416–21.e1.
  13. Stidham RW, Liu W, Bishu S, et al. Performance of a deep learning model vs human reviewers in grading endoscopic disease severity of patients with ulcerative colitis. JAMA Netw Open. 2019;2:e193963.
  14. Bossuyt P, Nakase H, Vermeire S, et al. Automatic, computer-aided determination of endoscopic and histological inflammation in patients with mild to moderate ulcerative colitis based on red density. Gut. 2020;69:1778–86.
  15. Gottlieb K, Daperno M, Usiskin K, et al. Endoscopy and central reading in inflammatory bowel disease clinical trials: achievements, challenges and future developments. Gut. 2020 Jul 22; doi: 10.1136/gutjnl-2020-320690. [Epub ahead of print].
  16. Holmer AK, Dulai PS. Using artificial intelligence to identify patients with ulcerative colitis in endoscopic and histologic remission. Gastroenterology. 2020;158:2045–7.
  17. Urban G, Tripathi P, Alkayali T, et al. Deep learning localizes and identifies polyps in real time with 96% accuracy in screening colonoscopy. Gastroenterology. 2018;155:1069–78.e8.
  18. Byrne MF, Chapados N, Soudan F, et al. Real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps during analysis of unaltered videos of standard colonoscopy using a deep learning model. Gut. 2019;68:94–100.
  19. van der Sommen F, de Groof J, Struyvenberg M, et al. Machine learning in GI endoscopy: practical guidance in how to interpret a novel field. Gut. 2020;69:2035–45.


Toer Stevens – Short biography

Toer W. Stevens is currently finalising his PhD thesis “Treatment optimization in Inflammatory Bowel Disease” and is on the verge of starting a residency in gastroenterology and hepatology at the Amsterdam UMC, University of Amsterdam. His thesis focusses on the potential role of surgery as an early intervention in Crohn’s Disease and on biomarkers to predict treatment response and monitor disease activity.

Posted in ECCO News, Y-ECCO Literature Reviews, Committee News, Y-ECCO, Volume 15, Issue 4