Defended thesis

Author: Daniel Weinland
Supervisors: Radu Horaud, Edmond Boyer, Rémi Ronfard
Type: Doctoral thesis
Discipline(s): Imaging, vision, robotics
Date: Defended in 2008
Institution(s): Grenoble INPG

Abstract


Recognizing human actions is an important and challenging topic in computer vision, with many important applications including video surveillance, video indexing and the understanding of social interaction. From a computational perspective, actions can be defined as four-dimensional patterns, in space and in time. Such patterns can be modeled using several representations, which differ from each other with respect to, among others, the visual information used, e.g. shape or appearance, the representation of dynamics, e.g. implicit or explicit, and the amount of invariance that the representation exhibits, e.g. a viewpoint invariance allowing learning and recognition across different camera configurations. Our goal in this thesis is to develop a set of new techniques for action recognition.

In the first part we present "Motion History Volumes", a free-viewpoint representation for human actions based on 3D visual-hull reconstructions computed from multiple calibrated, background-subtracted video cameras. Results indicate that this representation can be used to learn and recognize basic human action classes independently of gender, body size and viewpoint.

In the second part we present an approach based on a 3D exemplar-based HMM, which addresses the problem of recognizing actions from arbitrary views, even from a single camera. A 3D reconstruction is thus no longer required during the recognition phase; instead, learned 3D models are used to produce 2D image information, which is compared to the observations.

In the third and last part, we present a compact and efficient exemplar-based representation which, in particular, does not attempt to encode the dynamics of an action through temporal dependencies. Experimental results demonstrate that such a representation can precisely recognize actions, even in cluttered and non-background-segmented sequences.
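For readers unfamiliar with motion-history representations, the following minimal Python sketch illustrates the per-voxel update rule that motion-history volumes inherit from Davis and Bobick's motion history images: voxels occupied by the current visual hull are set to a temporal window tau, and all other voxels decay by one step toward zero. The grid size, the value of tau, and the synthetic occupancy pattern below are illustrative assumptions, not settings taken from the thesis.

    import numpy as np

    def update_mhv(mhv, occupancy, tau):
        """One time step of a motion-history-volume update.

        mhv       : float array (X, Y, Z), current motion history values
        occupancy : bool array (X, Y, Z), visual-hull voxel occupancy at time t
        tau       : int, number of frames an occupied voxel stays "remembered"
        """
        # Occupied voxels are reset to the maximum value tau; all other
        # voxels decay by one step, clipped at zero.
        return np.where(occupancy, float(tau), np.maximum(mhv - 1.0, 0.0))

    # Toy usage: a 32^3 voxel grid observed over tau = 20 frames, with a
    # moving synthetic block standing in for a reconstructed visual hull.
    tau = 20
    mhv = np.zeros((32, 32, 32))
    for t in range(tau):
        occupancy = np.zeros((32, 32, 32), dtype=bool)
        occupancy[10 + t % 5, 10:20, 10:20] = True
        mhv = update_mhv(mhv, occupancy, tau)

After such a sequence of updates, the resulting volume encodes where motion occurred and how recently, which is what makes the representation usable for comparing actions independently of when exactly each pose was reached within the temporal window.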