Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy
Sharib Ali · Mariia Dmitrieva · Noha Ghatwary · Sophia Bano · Gorkem Polat · Alptekin Temizel · Adrian Krenzer · Amar Hekalo · Yun Bo Guo · Bogdan Matuszewski · Mourad Gridach · Irina Voiculescu · Vishnusai Yoganand · Arnav Chavan · Aryan Raj · Nhan T. Nguyen · Dat Q. Tran · Lê Duy Huỳnh · Nicolas Boutry · Shahadate Rezvy · Haijian Chen · Yoon Ho Choi · Anand Subramanian · Velmurugan Balasubramanian · Xiaohong W. Gao · Hongyu Hu · Yusheng Liao · Danail Stoyanov · Christian Daul · Stefano Realdon · Renato Cannizzaro · Dominique Lamarque · Terry Tran-Nguyen · Adam Bailey · Barbara Braden · James East · Jens Rittscher
The Endoscopy Computer Vision Challenge (EndoCV) is a crowd-sourcing initiative to address prominent problems in developing reliable computer-aided detection and diagnosis systems for endoscopy, and to suggest a pathway for the clinical translation of such technologies. Whilst endoscopy is a widely used diagnostic and treatment tool for hollow organs, endoscopists face several core challenges, mainly: 1) the presence of multi-class artefacts that hinder visual interpretation, and 2) the difficulty of identifying subtle precancerous precursors and cancerous abnormalities. Artefacts often degrade the robustness of deep learning methods applied to gastrointestinal tract organs, as they can be confused with tissue of interest. The EndoCV2020 challenges were designed to address research questions within these remits. In this paper, we present a summary of the methods developed by the top 17 teams and provide an objective comparison of state-of-the-art methods and the methods designed by the participants for two sub-challenges: i) artefact detection and segmentation (EAD2020), and ii) disease detection and segmentation (EDD2020). Multi-center, multi-organ, multi-class, and multi-modal clinical endoscopy datasets were compiled for both the EAD2020 and EDD2020 sub-challenges. The out-of-sample generalization ability of the detection algorithms was also evaluated. Whilst most teams focused on accuracy improvements, only a few methods are credible candidates for clinical use. The best-performing teams provided solutions to tackle class imbalance and variability in size, origin, modality, and occurrence by exploring data augmentation, data fusion, and optimal class thresholding techniques.
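The abstract's mention of optimal class thresholding can be illustrated with a minimal sketch: for each class, a confidence cut-off is chosen on held-out validation scores (here, by maximizing F1) and then applied to detector outputs at test time. This is not any participant's actual pipeline; the function names, class names, and synthetic data below are hypothetical and serve only to make the idea concrete.

```python
# Minimal, illustrative per-class threshold search (hypothetical, not a team's method).
import numpy as np


def best_threshold(scores: np.ndarray, labels: np.ndarray) -> float:
    """Pick the confidence cut-off that maximises F1 on validation data."""
    best_t, best_f1 = 0.5, -1.0
    for t in np.linspace(0.05, 0.95, 19):
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        f1 = 2 * tp / max(2 * tp + fp + fn, 1)  # guard against division by zero
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


rng = np.random.default_rng(0)
class_names = ["specularity", "bubbles", "saturation"]  # hypothetical artefact subset
thresholds = {}
for c in class_names:
    labels = rng.integers(0, 2, size=500)                      # synthetic validation labels
    scores = np.clip(labels * 0.6 + rng.normal(0.3, 0.2, 500), 0.0, 1.0)  # synthetic scores
    thresholds[c] = best_threshold(scores, labels)

print(thresholds)  # per-class cut-offs applied to detector confidences at test time
```

Keeping a separate cut-off per class rather than a single global threshold is one simple way to counteract class imbalance, since rare classes typically need a lower threshold to be detected at all.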