High dimensional pattern learning applied to symbolic time-series

Amir Dib

The search engine for French theses

French title: Apprentissage de motifs en grande dimension appliqué aux séries temporelles

Author:	Amir Dib
Supervisors:	Nicolas Vayatis, Mathilde Mougeot
Type:	Doctoral thesis
Discipline:	Applied mathematics
Date:	Defended on 11 October 2021
Institution:	Université Paris-Saclay
Doctoral school:	École doctorale de mathématiques Hadamard (Orsay, Essonne; 2015-....)
Research partners:	Research team: Apprentissage statistique et données massives
	Referring institution: École normale supérieure Paris-Saclay (Gif-sur-Yvette, Essonne; 1912-....)
	Graduate school: Université Paris-Saclay. Graduate School Mathématiques (2020-....)
	Laboratory: Centre Borelli (Gif-sur-Yvette, Essonne; 2020-....)
Jury:	President: Laurent Oudre
	Examiners: Stéphane Gaïffas, Jean-Michel Loubès, Eli Upfal
	Reviewers (rapporteurs): Stéphane Gaïffas, Jean-Michel Loubès

Keywords

Controlled keywords

Anomaly detection (computer science)

Bayesian statistics

Time series

Supervised learning (artificial intelligence)

Buildings (group theory)

Symmetric spaces

Free keywords

Statistics

Machine learning

Bayesian learning

Anomaly detection

Predictive maintenance

Railway

Abstract

While the adoption of machine learning in many applied contexts has grown rapidly over the last decade, challenges remain in using it in certain industrial settings. The main reason is the clash between established historical procedures and the uncertainty and lack of transparency of a machine learning pipeline's decision process. Another reason is that the input required by a traditional machine learning model does not match the type or quality of the data available. Most industrial databases were not designed for statistical analysis but to comply with regulatory requirements and to perform administrative tasks. In particular, non-numerical or symbolic features are common, as they are a versatile way of recording events of interest. Examples of such data are textual documents, sequences of log events or DNA sequences. The exponential number of possible patterns typically dominates the complexity of learning relevant information from symbols.

The applied framework and primary motivation of this thesis is the design of efficient, human-readable and computationally tractable methods for predictive maintenance of the French train fleet. To that end, we propose to go beyond standard approaches by combining traditional machine learning algorithms with pattern mining techniques, so that human experts can understand and interact with the algorithmic layer of the predictive maintenance pipeline. The main objective of this thesis is to tackle these issues by proposing approaches that apply generally to symbolic data sequences, produce human-readable output and can be trained at a reasonable computational cost. As a first contribution, we build a complete machine learning pipeline for predictive maintenance on a large fleet of rail vehicles; it can be computed at a reasonable cost and provides valuable insight into the underlying symbolic dynamics of the degradation process.
As a second contribution, we propose a new method for symbolic data sets, based on a Bayesian generative model of patterns, that can increase scoring accuracy in an interpretable fashion on any symbolic data set. As a third contribution, we introduce a new progressive mining method based on local complexities to obtain sharper statistical bounds on pattern frequencies. Finally, we propose a new, general stochastic optimization method based on alternative sampling. This method applies in particular to Bayesian learning in the variational inference setting, for which we provide theoretical and empirical evidence of its advantage over state-of-the-art methods.
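As a rough illustration of why the number of possible patterns dominates the cost of learning from symbols, the sketch below contrasts the size of the candidate pattern space with the patterns actually observed in a log. It is a generic example, not the method developed in the thesis: for simplicity it only considers contiguous patterns (n-grams), whose count over an alphabet $\Sigma$ up to length $K$ is $\sum_{k=1}^{K} |\Sigma|^k$; allowing gaps would enlarge the space further. The function names, the alphabet of 50 event codes and the toy log are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def candidate_pattern_count(alphabet_size: int, max_len: int) -> int:
    """Number of distinct contiguous patterns of length 1..max_len over an alphabet.

    There are alphabet_size ** k strings of length k, so the total grows
    exponentially with max_len.
    """
    return sum(alphabet_size ** k for k in range(1, max_len + 1))


def observed_ngrams(sequence: Sequence[Hashable], n: int) -> Counter:
    """Count the contiguous length-n patterns that actually occur in one sequence."""
    return Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))


if __name__ == "__main__":
    # Hypothetical log vocabulary of 50 event codes, patterns up to length 4.
    print(candidate_pattern_count(50, 4))  # 6377550

    # Toy log over {A, B, C}: only 3 of the 3**2 = 9 possible bigrams occur.
    toy_log = ["A", "B", "A", "B", "C"]
    bigrams = observed_ngrams(toy_log, 2)
    print(len(bigrams), bigrams[("A", "B")])  # 3 2
```

Even with a modest vocabulary, enumerating every candidate quickly becomes impractical, which is why pattern mining methods typically restrict the search to patterns that are frequent in the data.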
