Defended thesis

Integration of epidemiological and environmental data to improve event-based surveillance systems

Author: El Bahdja Boudoua
Supervisors: Maguelonne Teisseire, Annelise Tran
Type: Doctoral thesis
Discipline(s): Computer science
Date: Defended on 14/10/2024
Institution(s): Université de Montpellier (2022-....)
Doctoral school(s): École doctorale Information, Structures, Systèmes (Montpellier ; 2015-....)
Research partner(s): Laboratory: Territoires, Environnement, Télédétection et Information Spatiale (Montpellier)
Jury: President: Isabelle Mougenot
Examiners: Benoît Durand
Reviewers: Emmanuelle Fromont, Sabine Loudcher Rabaseda

Abstract


Event-based surveillance (EBS) systems such as HealthMap, ProMED and PADI-web are used daily for the timely detection of outbreak events reported in web articles. Once the articles are collected, these systems rely on human moderation and supervised classification algorithms to classify articles according to their relevance. Applying such methods can be challenging, because epidemiological datasets have an imbalanced class distribution and because the annotation task, which is critical to the success of these models, can be expensive and time-consuming. Another important limitation of EBS systems is that the drivers of disease transmission (e.g. disease characteristics, environmental and epidemiological risk factors) are not always found in textual data and are therefore not taken into account.

In this context, we propose an unsupervised approach that relies on the spatio-temporal information of the reported epidemiological events to classify articles, while taking into account the environmental factors associated with disease onset through risk mapping. This method, called EpiDCA, is an adaptation of the Dendritic Cell Algorithm (DCA), which is inspired by danger theory. EpiDCA is characterized by expert-defined parameters, making it applicable to different diseases and environmental contexts. The proposed method was first tested and evaluated using a dataset related to avian influenza (AI) in Asia between 2018 and 2019, together with a suitability map for AI produced for the same area. To measure the accuracy of the model, we calculated the precision, recall and F-score. EpiDCA achieved an F-score of 0.82 on an imbalanced dataset and 0.90 on a balanced dataset. The results confirmed that considering disease risk factors is a good approach in event classification. EpiDCA was then compared with state-of-the-art supervised machine learning methods and appeared to be competitive.

After this initial application, we aimed to evaluate the robustness and genericity of the method in different geographical contexts and across various epidemiological systems, specifically a transboundary animal disease (African swine fever) and a vector-borne zoonotic disease (West Nile virus disease) in Europe. For this purpose, we constructed an original dataset from articles detected by PADI-web. We also developed a method and guidelines to annotate the articles. The consistent results confirmed the robustness of EpiDCA. We then extended the method by integrating additional covariates to further enhance its reactivity and accuracy. Future perspectives for EpiDCA include reducing the number of parameters and applying the model to other surveillance contexts that rely on the same sources, such as plant disease surveillance and food security surveillance.
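
For readers unfamiliar with the evaluation metrics named in the abstract, the following minimal sketch shows how precision, recall and F-score are conventionally computed for a binary relevant/irrelevant classification. It is an illustration only, not the thesis code: the function name `precision_recall_f1` and the toy labels are assumptions introduced here, and the F-score is taken to be the standard F1 (harmonic mean of precision and recall).

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = relevant)."""
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels (not data from the thesis): 3 relevant articles out of 8.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because the F-score ignores true negatives, it is less easily inflated by the majority class than plain accuracy, which is one reason it is commonly reported for imbalanced datasets such as those described above.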