Thesis defended

Full-rank Gaussian modeling of convolutive audio mixtures applied to source separation

Author: Quang-Khan-Ngoc Duong
Supervisor: Rémi Gribonval
Type: Doctoral thesis
Discipline(s): Signal processing and telecommunications
Date: Defended in 2011
Institution(s): Rennes 1
Doctoral school(s): École doctorale Mathématiques, télécommunications, informatique, signal, systèmes, électronique (Rennes)
Research partner(s): other partner: Université européenne de Bretagne (2007-2016)

Abstract


We consider the task of reverberant audio source separation in both under-determined and determined settings, that is, the extraction of the signal of each source from a multichannel audio mixture. We propose a general Gaussian modeling framework in which the contribution of each source to all mixture channels in the time-frequency domain is modeled as a zero-mean Gaussian random variable whose covariance encodes both the spatial and the spectral characteristics of the source. To better account for the reverberant mixing process, we relax the conventional narrowband assumption, which results in a rank-1 spatial covariance, and compute the upper bound on the separation performance achievable with a full-rank spatial covariance. Experimental results indicate an improvement of up to 6 dB in Signal-to-Distortion Ratio (SDR) in moderately to highly reverberant conditions, which supports this generalization. We also consider the use of quadratic time-frequency representations and of the auditory-motivated equivalent rectangular bandwidth (ERB) frequency scale to increase the amount of exploitable information and decrease the overlap between the sources in the input representation. After this theoretical validation of the proposed framework, we focus on estimating the model parameters from a given mixture signal in a practical blind source separation scenario. We derive a family of Expectation-Maximization (EM) algorithms to estimate the parameters in either the maximum likelihood (ML) sense or the maximum a posteriori (MAP) sense. We propose a family of spatial location priors inspired by the theory of room acoustics, as well as a spatial continuity prior, and investigate the use of two spectral priors previously used in a single-channel or rank-1 multichannel context, namely spectral continuity and Nonnegative Matrix Factorization (NMF).
The source separation results given by the proposed approach are compared with several baseline and state-of-the-art algorithms on both simulated mixtures and real-world recordings in various scenarios.
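
As an illustration of the Gaussian model described above, the following is a minimal sketch, not the thesis's implementation. It assumes that the covariance of each source contribution at one time-frequency bin factors into a scalar spectral variance and a spatial covariance matrix; the names `v`, `R`, and `wiener_separate` are hypothetical and do not appear in the abstract. Under a zero-mean Gaussian model with known covariances, the minimum mean-square error estimate of each source contribution is given by multichannel Wiener filtering, which is what the function computes.

```python
import numpy as np

def wiener_separate(x, v, R):
    """Estimate each source's contribution to the mixture at one time-frequency bin.

    x : (I,) complex mixture coefficients over I channels
    v : (J,) nonnegative spectral variances, one per source (assumed notation)
    R : (J, I, I) Hermitian positive semi-definite spatial covariances (assumed notation)
    Returns a (J, I) array of estimated source contributions.
    """
    # Covariance of each source contribution: spectral variance times spatial covariance.
    sigma_c = v[:, None, None] * R
    # Under the model the sources are independent, so the mixture covariance is the sum.
    sigma_x = sigma_c.sum(axis=0)
    # Wiener filter: sigma_c[j] @ inv(sigma_x) @ x, computed with a linear solve.
    y = np.linalg.solve(sigma_x, x)
    return sigma_c @ y

# Toy example with random full-rank spatial covariances (synthetic values only).
rng = np.random.default_rng(0)
n_channels, n_sources = 2, 3
A = rng.standard_normal((n_sources, n_channels, n_channels)) \
    + 1j * rng.standard_normal((n_sources, n_channels, n_channels))
R = A @ np.conj(np.swapaxes(A, 1, 2)) + 1e-3 * np.eye(n_channels)
v = rng.uniform(0.5, 2.0, size=n_sources)
x = rng.standard_normal(n_channels) + 1j * rng.standard_normal(n_channels)

estimates = wiener_separate(x, v, R)
```

In this sketch the estimated contributions sum back to the mixture, and a rank-1 spatial covariance of the form `h @ h.conj().T` would correspond to the narrowband assumption mentioned in the abstract. Because the mixture covariance must be invertible for the linear solve, the toy example uses full-rank matrices.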