OP25 Prevalence of IBD in the Netherlands: development and validation of machine learning models for administrative data
Van Linschoten, R.(1,2,3)*; van Leeuwen, N.(2); Hazelzet, J.A.(2); van der Woude, C.J.(3); van Noord, D.(4); West, R.L.(4)
(1)Franciscus Gasthuis & Vlietland, Gastroenterology & Hepatology, Rotterdam, The Netherlands; (2)Erasmus MC, Department of Public Health, Rotterdam, The Netherlands; (3)Erasmus MC, Department of Gastroenterology & Hepatology, Rotterdam, The Netherlands; (4)Franciscus Gasthuis & Vlietland, Department of Gastroenterology & Hepatology, Rotterdam, The Netherlands
Background
Treatment of IBD has improved with the introduction of biologics and small molecules, but this has come with a considerable increase in healthcare costs. Given this growing cost burden, and with the prevalence of IBD forecast to double between 2010 and 2030, reliable nationwide epidemiological data on IBD prevalence are needed to inform health policy makers. We aimed to develop a model for identifying prevalent IBD cases in administrative data and to determine the prevalence of IBD in the Netherlands.
Methods
Data on hospital care were obtained from the Dutch National Hospital Care Basic Registration (Landelijke Basisregistratie Ziekenhuiszorg). This database contains data from all hospitals in the Netherlands on hospital admissions (since 1991), outpatient clinic visits (since 2017), and dispensations of biologics and small molecules (since 2015). Data on pathology reports were retrieved from the nationwide network and registry of histo- and cytopathology in the Netherlands (PALGA); this database contains coded pathology reports from all Dutch hospitals since 1991. These datasets were combined with a reference cohort in which the IBD diagnosis (yes/no) had been verified and demographic data were available for all patients. Models were trained to optimise the F-score and evaluated using five-times repeated ten-fold cross-validation. The best-performing model in cross-validation was then applied to estimate IBD prevalence in the Netherlands.
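The evaluation scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not state the modelling software, whether folds were stratified, or how the F-score entered training, so the sketch assumes plain shuffled (unstratified) folds and scores a user-supplied `fit_predict` function (a hypothetical interface) on each held-out fold. `f_score` computes the F-beta score from true positives, false positives and false negatives; with `beta=1` it is the usual F1 score, the harmonic mean of precision and recall. All function names and the toy data are hypothetical.

```python
import random
import statistics
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical interface: train on (X_train, y_train), return 0/1 predictions for X_test.
FitPredict = Callable[[List[Sequence[float]], List[int], List[Sequence[float]]], List[int]]


def f_score(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 1.0) -> float:
    """F-beta score for binary labels (1 = IBD, 0 = no IBD); beta=1 gives F1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Convention (an assumption): return 0.0 when there are no true or predicted positives.
    return (1 + b2) * tp / denom if denom else 0.0


def repeated_kfold(n: int, k: int = 10, repeats: int = 5,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists: `repeats` independent shuffles, each cut into k folds."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        for fold in range(k):
            test = order[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


def cross_validated_f_scores(X: Sequence[Sequence[float]], y: Sequence[int],
                             fit_predict: FitPredict, k: int = 10,
                             repeats: int = 5, seed: int = 0) -> List[float]:
    """Return one F-score per held-out fold (k * repeats scores in total)."""
    scores = []
    for train, test in repeated_kfold(len(y), k, repeats, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        scores.append(f_score([y[i] for i in test], preds))
    return scores


def threshold_rule(X_train, y_train, X_test):
    """Trivial stand-in 'model': predict IBD when the single feature is positive."""
    return [1 if row[0] > 0 else 0 for row in X_test]


if __name__ == "__main__":
    # Made-up toy data: one feature, e.g. a count of IBD-coded hospital contacts.
    X = [[i % 3] for i in range(40)]
    y = [1 if row[0] > 0 else 0 for row in X]
    scores = cross_validated_f_scores(X, y, threshold_rule)
    print(len(scores), round(statistics.mean(scores), 3))
```

With the defaults this yields 50 per-fold scores (five repeats of ten folds). Averaging them is one way to summarise a model; the abstract does not say how its reported F-scores were aggregated, so that choice is also an assumption. Stratifying folds by IBD status would be a natural refinement when the classes are imbalanced.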
Results
The reference cohort consisted of 10,155 patients, of whom 3,381 had been diagnosed with IBD. All models performed well in cross-validation, with F-scores of 0.870 or higher. More flexible models performed better, with gradient boosted trees performing best (Table 1). Applying the gradient boosted trees model to the general population yielded an IBD prevalence of 691 per 100,000 in the Netherlands on 31-12-2020. Cases are unevenly distributed across the Netherlands (Figure 1), with the highest prevalence in the south (Middle Limburg: 936 per 100,000 inhabitants) and the lowest in the northwest (Amsterdam: 545 per 100,000).
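The prevalence figures above are expressed per 100,000 inhabitants: the number of identified cases divided by the population at the reference date, scaled by 100,000. The sketch below assumes the model output has already been reduced to one 0/1 prediction per scored resident and that population counts come from a separate source; the function names, region names and all numbers in the example are hypothetical, not data from this study.

```python
from typing import Dict, Sequence, Tuple


def prevalence_per_100k(cases: int, population: int) -> float:
    """Crude prevalence per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so integer inputs give an exact numerator.
    return cases * 100_000 / population


def prevalence_from_predictions(predictions: Sequence[int], population: int) -> float:
    """Count predicted cases (1 = IBD) and express them per 100,000 inhabitants.

    Assumption: the population denominator also includes residents with no
    hospital records, who never appear in `predictions`.
    """
    return prevalence_per_100k(sum(predictions), population)


def regional_extremes(regions: Dict[str, Tuple[int, int]]) -> Tuple[str, str]:
    """Return (highest, lowest) region by prevalence; values are (cases, population)."""
    rates = {name: prevalence_per_100k(c, p) for name, (c, p) in regions.items()}
    return max(rates, key=rates.__getitem__), min(rates, key=rates.__getitem__)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    print(prevalence_per_100k(1_500, 200_000))
    print(regional_extremes({"Region A": (90, 10_000), "Region B": (50, 10_000)}))
```

A crude rate like this is not age-standardised, so regional differences may partly reflect differences in age structure; whether the study adjusted for this is not stated in the abstract.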
Conclusion
Prevalent IBD cases can be identified from administrative data using a gradient boosted trees model. Using this model, we have shown that the prevalence of IBD in the Netherlands is increasing. However, whereas prior studies predicted a 50% growth in prevalence over the last 10 years, we found that IBD prevalence increased by only 12% over that period. This lower-than-predicted increase may indicate a transition to the fourth epidemiological stage of IBD, known as Prevalence Equilibrium.
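The growth comparison in the conclusion is a relative change, (end - start) / start. Converting each ten-year total into a compound annual rate makes the gap between the predicted and observed trends easier to read. The sketch below uses index values (start = 100) rather than actual 2010 and 2020 prevalence figures, which the abstract does not report; the function names are hypothetical.

```python
def relative_change_pct(start: float, end: float) -> float:
    """Percentage change from start to end."""
    if start <= 0:
        raise ValueError("start must be positive")
    return (end - start) / start * 100


def annual_growth_pct(total_change_pct: float, years: float) -> float:
    """Compound annual growth rate implied by a total percentage change over `years`."""
    return ((1 + total_change_pct / 100) ** (1 / years) - 1) * 100


if __name__ == "__main__":
    # Index values (start = 100), not study data.
    observed = relative_change_pct(100, 112)   # 12% total increase
    predicted = relative_change_pct(100, 150)  # 50% total increase
    print(round(annual_growth_pct(observed, 10), 2),
          round(annual_growth_pct(predicted, 10), 2))
```

A 12% rise over ten years corresponds to roughly 1.1% per year, against roughly 4.1% per year for a 50% rise.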