Author: | Mathieu Ramona |
Supervisors: | Gaël Richard, Bertrand David |
Type: | Doctoral thesis |
Discipline(s): | Signal and images |
Date: | Defended in 2010 |
Institution(s): | Paris, Télécom ParisTech |
Keywords
Abstract
We present a system for speech/music audio classification that relies on the excellent statistical properties of Support Vector Machines (SVMs). This problem raises three questions: how can the SVM, which is inherently discriminative, be used efficiently on a problem involving more than two classes; how can an audio signal be characterized in a relevant way; and how can the temporal dimension be addressed? We propose a hybrid system for multi-class classification, based on a combination of One-vs-One and dendrogram-based approaches, that allows the estimation of posterior probabilities. These probabilities are then used by post-processing methods that take into account the interdependencies between neighboring frames. We thus propose a classification scheme that applies Hidden Markov Models to the posterior probabilities, along with an approach based on change detection between segments with "homogeneous" acoustic content. Since audio signal characterization involves a large number of audio descriptors, we propose new feature selection algorithms based on the recent Kernel Alignment criterion. This criterion is also used for the kernel selection step of the classification process. The proposed algorithms are compared to the state of the art and constitute a relevant alternative in terms of computational cost and storage. The system built from these contributions took part in the ESTER 2 evaluation campaign, which we present along with our results.
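
As an illustration of the Kernel Alignment criterion mentioned in the abstract, the sketch below scores each feature by the alignment between a Gram matrix built on that feature alone and the ideal target kernel y yᵀ, using the standard definition A(K, yyᵀ) = ⟨K, yyᵀ⟩_F / (‖K‖_F ‖yyᵀ‖_F). It is a minimal sketch and not the selection algorithm proposed in the thesis: the Gaussian kernel, the `gamma` value, the per-feature ranking strategy, the function names, and the synthetic two-class data are all assumptions made for the example.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) Gram matrix computed on the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kernel_target_alignment(K, y):
    """Alignment between Gram matrix K and the ideal kernel y y^T.

    y is expected to hold labels in {-1, +1}.
    """
    Y = np.outer(y, y)
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))


def rank_features(X, y, gamma=1.0):
    """Rank features by the alignment of a single-feature kernel (hypothetical helper)."""
    scores = np.array([
        kernel_target_alignment(rbf_kernel(X[:, [j]], gamma), y)
        for j in range(X.shape[1])
    ])
    # Highest alignment first.
    return np.argsort(-scores), scores


# Synthetic data (assumption): feature 0 separates the classes, feature 1 is noise.
rng = np.random.default_rng(0)
y = np.repeat([-1.0, 1.0], 50)
X = np.column_stack([
    2.0 * y + rng.normal(scale=0.5, size=100),
    rng.normal(size=100),
])

order, scores = rank_features(X, y)
print("feature ranking:", order)
print("alignment scores:", scores)
```

The same `kernel_target_alignment` function could be used to compare candidate kernels (for instance several `gamma` values) on a fixed feature set, which is the kind of kernel selection the abstract refers to. Each evaluation only needs the Gram matrix and the labels, so no classifier has to be trained to score a candidate.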