Causal Populations Identification through Hidden Distributions Estimation

Celine Beji

Thesis, year: 2021

Identification des populations causales par l'estimation de distributions cachées

Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision

Abstract

In a counterfactual framework, this thesis formalizes causal inference as a density estimation problem. The aim is to estimate the probability distribution of a mixture of four distinct populations, defined by the outcomes with and without treatment. The fundamental problem is that the two outcomes are never observable simultaneously. Two models are proposed, both leveraging causal constraints built from the partial information carried by the observed outcomes. The first approach, which is parametric, is based on an Expectation-Maximization algorithm: the parameters of the causal population distributions are estimated iteratively by maximizing the likelihood, with a prior applied to the posterior probabilities in an intermediate step. The second approach, which is non-parametric, uses an Auto-Encoder architecture enhanced by a prior, implemented as a mask introduced in the intermediate layer of the network. Experiments are conducted on synthetic and real-life datasets to demonstrate the effectiveness of these approaches, and some extensions are proposed.
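The causal constraint mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes a binary treatment `t` and a binary outcome `y`, and labels the four populations by their pair of potential outcomes `(y0, y1)`. The names `POPULATIONS`, `causal_mask` and `masked_posteriors` are hypothetical, and the way the per-population log-likelihoods are obtained is left open.

```python
import numpy as np

# Hypothetical labelling of the four causal populations by their potential
# outcomes (y0 = outcome without treatment, y1 = outcome with treatment).
POPULATIONS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def causal_mask(t, y):
    """Return a 0/1 vector marking the populations compatible with (t, y)."""
    # In arm t only the potential outcome y_t is observed, so a population
    # is compatible exactly when its outcome for arm t equals the observed y.
    return np.array([1.0 if pop[t] == y else 0.0 for pop in POPULATIONS])


def masked_posteriors(log_lik, t, y):
    """Mask and renormalise per-population posteriors.

    log_lik: array of shape (n, 4) holding, for each sample and population,
    the log of (mixing weight * density). Assumed to be given.
    """
    masks = np.stack([causal_mask(ti, yi) for ti, yi in zip(t, y)])
    # Incompatible populations get log-probability -inf, i.e. probability 0.
    masked = np.where(masks > 0, log_lik, -np.inf)
    # Subtract the row maximum before exponentiating for numerical stability.
    masked = masked - masked.max(axis=1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=1, keepdims=True)
```

Under these assumptions, a treated individual with a positive outcome can only belong to the populations `(0, 1)` or `(1, 1)`, so the mask removes the other two before the posteriors are renormalised. Each observed pair `(t, y)` always leaves exactly two candidate populations, which is the partial information the abstract refers to.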

Keywords

Causal inference; Counterfactual outcome; Individual Treatment Effect; Uplift; Causal constraint; Latent space; Auto-Encoder; Expectation-Maximization

Domains

Artificial Intelligence [cs.AI]

Main file

2021UPSLD004.pdf (2.06 MB)

Origin: Version validated by the jury (STAR)

https://theses.hal.science/tel-03545705

Submitted on: Thursday, January 27, 2022, 13:59:07

Last modified on: Wednesday, March 29, 2023, 03:49:48

Dates and versions

tel-03545705, version 1 (27-01-2022)

Identifiers

HAL Id: tel-03545705, version 1

Cite

Celine Beji. Causal Populations Identification through Hidden Distributions Estimation. Artificial Intelligence [cs.AI]. Université Paris sciences et lettres, 2021. English. ⟨NNT : 2021UPSLD004⟩. ⟨tel-03545705⟩

Collections

CNRS; UNIV-DAUPHINE; STAR; LAMSADE-DAUPHINE; PSL

721 views

299 downloads
