Defended thesis

A supervised formulation of reinforcement learning with superlinear convergence

Author: Amit Parag
Supervisor: Nicolas Mansard
Type: Doctoral thesis
Discipline(s): Robotics and Computer Science
Date: Defended on 20/04/2023
Institution(s): Toulouse, INSA
Doctoral school(s): École Doctorale Systèmes (Toulouse)
Research partner(s): Laboratory: LAAS - Laboratoire d'Analyse et d'Architecture des Systèmes
Jury: President: Vincent Padois
Examiners: Olivier Stasse, Georgia Chalvatzaki, Olivier Sigaud
Reviewers: Vincent Padois, Manuel Lopes

Abstract


Deep reinforcement learning treats simulators as abstract oracles for interacting with the environment. For continuous multi-body robotic systems, differentiable simulators have recently been proposed, yet they remain underused even though we know how to make them produce richer information. Combined with the typically high computational cost of exploration and exploitation in high-dimensional state spaces, this can quickly make reinforcement learning algorithms less effective. In this thesis, we propose to combine learning and simulators so that the quality of both improves while the need to search the state space exhaustively decreases. We propose to learn the value function, together with state and control trajectories, from locally optimal runs of a trajectory optimizer. The learned value function and the estimates of the optimal state and control policies are then fed back into the trajectory optimizer: the value function estimate serves as a proxy that shortens the preview horizon, while the state and control approximations guide the optimizer's policy search. The proposed approach yields a closer symbiosis between learning and simulators, with superlinear convergence, which is needed for end-to-end learning of complex polyarticulated systems.
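
The role of the value function as a proxy for a shorter preview horizon can be illustrated on a toy problem. The sketch below is not the method of the thesis: it assumes a scalar linear-quadratic problem (dynamics x' = a*x + b*u, stage cost q*x^2 + r*u^2) and uses the exact value function from a Riccati recursion where the thesis would use a learned estimate. All names and numerical values (riccati, a, b, q, r, LONG, SHORT) are hypothetical and chosen for illustration only.

```python
def riccati(a, b, q, r, p_terminal, horizon):
    """Backward Riccati recursion for a scalar LQR problem.

    Returns the value coefficients p[0..horizon] (value at stage k is p[k] * x**2)
    and the feedback gains gains[0..horizon-1] (control is u = -gains[k] * x).
    """
    p = [0.0] * (horizon + 1)
    gains = [0.0] * horizon
    p[horizon] = p_terminal
    for k in range(horizon - 1, -1, -1):
        p_next = p[k + 1]
        denom = r + b * b * p_next
        gains[k] = a * b * p_next / denom
        p[k] = q + a * a * p_next - (a * b * p_next) ** 2 / denom
    return p, gains


# Illustrative (assumed) problem data: an unstable scalar system.
a, b, q, r = 1.1, 0.5, 1.0, 0.1
LONG, SHORT = 50, 2

# Reference: long preview horizon with no terminal cost.
p_long, k_long = riccati(a, b, q, r, 0.0, LONG)

# Short horizon, with the value function at stage SHORT as terminal cost.
_, k_short = riccati(a, b, q, r, p_long[SHORT], SHORT)

# Short horizon without any terminal value model.
_, k_naive = riccati(a, b, q, r, 0.0, SHORT)

gap_with_value = abs(k_short[0] - k_long[0])
gap_without_value = abs(k_naive[0] - k_long[0])
print("first-gain gap with value terminal cost:   ", gap_with_value)
print("first-gain gap without value terminal cost:", gap_without_value)
```

With the value function as terminal cost, the short-horizon problem recovers the first feedback gain of the long-horizon problem, whereas the same short horizon with no terminal cost does not. A learned approximation would reproduce this only up to its approximation error.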