Verification of tree-structured data transformations: tree transducers applied to a verification problem on shell scripts
Author: | Paul D. Gallot |
Supervisors: | Sylvain Salvati, Aurélien Lemay |
Type: | Doctoral thesis |
Discipline(s): | Computer science and applications |
Date: | Defended on 16/12/2021 |
Institution(s): | Université de Lille (2018-2021) |
Doctoral school(s): | École doctorale Mathématiques, sciences du numérique et de leurs interactions (Lille ; 2021-....) |
Research partner(s): | Laboratory: Centre de Recherche en Informatique, Signal et Automatique de Lille |
Jury: | President: Sophie Tison |
| Examiners: Patrick Baillot, Sebastian Maneth |
| Reviewers: Pierre-Alain Reynier, Sylvain Schmitz |
Abstract
This thesis studies models for representing tree transformations, develops verification algorithms for these models, and applies these tools to the transformations that scripts perform on file hierarchies. We focus in particular on techniques and models from the tree transducer literature, as opposed to formalisms based on logical formulas.

In the Debian GNU/Linux distribution in particular, Shell scripts are a crucial part of the installation, update and removal of software packages. Besides other tools, the Shell scripting language provides access to Unix commands that modify file systems.

We model Unix file systems as feature trees, and we represent the actions of Unix commands on a file system with a model we call tree pattern transducers. This model uses tree patterns to represent modifications of feature trees, and a system of constraints to represent the domains of tree transformations. We translate Unix commands into this model, and we then give an algorithm for computing the composition of tree pattern transducers.

Our final goal is to use this representation of commands to detect the configurations of the file system in which a given Shell script can fail. Instead of computing the transducers corresponding to the Unix commands and composing them, we choose to compute only inverse images: viewing commands as functions from trees to trees, we compute successive inverse images through the script's Unix commands, starting from the set of all feature trees. We examine the pros and cons of this algorithm and implement it. To represent inverse images, we use a variant of the system of constraints designed to model the domains of tree pattern transducers. The implementation is then tested on a corpus of Debian package scripts.
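As a rough illustration of what "configurations in which a script fails" means, the sketch below encodes feature trees as nested Python dictionaries (a regular file is the string `"file"`), models simplified `mkdir` and `rm` commands as partial functions (`None` meaning the command fails), and finds failing initial trees by brute-force enumeration of small trees. This is not the constraint-based inverse-image algorithm of the thesis: the tree encoding, the helper names (`update`, `mkdir`, `rm`, `run_script`, `trees`) and the command semantics (no options, no permissions, no symbolic links) are all assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Optional, Union

# Hypothetical encoding: a directory is a dict from feature names (file names)
# to subtrees; a regular file is the string "file" (contents abstracted away).
Tree = Union[str, dict]
Command = Callable[[Tree], Optional[Tree]]  # None means "the command fails"


def update(t: Tree, path: tuple, f: Callable[[Tree], Optional[Tree]]) -> Optional[Tree]:
    """Rebuild t with f applied at `path`; None if the path is not a chain of directories."""
    if not path:
        return f(t)
    if not isinstance(t, dict) or path[0] not in t:
        return None
    sub = update(t[path[0]], path[1:], f)
    return None if sub is None else {**t, path[0]: sub}


def mkdir(*path: str) -> Command:
    """Simplified `mkdir` without -p: the parent must be a directory, the target must not exist."""
    parent, name = path[:-1], path[-1]

    def at_parent(d: Tree) -> Optional[Tree]:
        if not isinstance(d, dict) or name in d:
            return None
        return {**d, name: {}}

    return lambda t: update(t, parent, at_parent)


def rm(*path: str) -> Command:
    """Simplified `rm` without -r or -f: the target must exist and be a regular file."""
    parent, name = path[:-1], path[-1]

    def at_parent(d: Tree) -> Optional[Tree]:
        if not isinstance(d, dict) or d.get(name) != "file":
            return None
        return {k: v for k, v in d.items() if k != name}

    return lambda t: update(t, parent, at_parent)


def run_script(script: list, t: Tree) -> Optional[Tree]:
    """Run commands in sequence; the script fails (None) as soon as one command fails."""
    for cmd in script:
        t = cmd(t)
        if t is None:
            return None
    return t


def trees(names: list, depth: int) -> list:
    """All directory trees of height <= depth whose features are drawn from `names`."""
    if depth == 0:
        return [{}]
    subtrees = ["file"] + trees(names, depth - 1)
    return [
        {n: s for n, s in zip(names, choice) if s is not None}
        for choice in product([None] + subtrees, repeat=len(names))
    ]


# Script: rm /a/f ; mkdir /a/f
script = [rm("a", "f"), mkdir("a", "f")]
failing = [t for t in trees(["a", "f"], 2) if run_script(script, t) is None]
print(len(failing), "of", len(trees(["a", "f"], 2)), "small trees make the script fail")
```

Enumeration only explores a finite set of small trees, so it can miss failures that need deeper hierarchies; this is why a symbolic representation of (generally infinite) sets of trees, such as the constraint systems mentioned above, is needed in practice.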
To better inform the discussion of the complexity of this inverse-image algorithm, we prove that the problem it solves is NP-hard, even on very restricted sets of scripts.

In a more theoretical direction, we use techniques from functional programming to shed new light on known models of transducers. We contribute a new class of transducers, which we call High-Order Deterministic tree Transducers (HODT), that generalizes several known models of tree transducers. HODT are defined similarly to deterministic top-down tree transducers (DTOP), except that the outputs of rules are simply-typed lambda-terms. We show how constraining these terms yields different known classes of transducers: restricting to terms of order 0 yields the class of DTOP, while restricting to terms of order at most 1 yields the class of Macro Tree Transducers (MTT). We give a procedure for computing the composition of two HODT, and we show that the order of the composition is the sum of the orders of the composed transducers. This gives an interesting explanation of why DTOP (HODT of order 0) are closed under composition while MTT (HODT of order 1) are not: 0 + 0 = 0, whereas 1 + 1 = 2.

In particular, we study the restriction of HODT to linear terms, to which we add an inspection of the input by a regular bottom-up automaton. We show that this model represents the same tree-to-tree functions as other known classes of transducers, notably transductions defined by Monadic Second-Order logic (MSOT). We then prove a similar result for the restriction to almost linear terms and an extension of MSOT called Monadic Second-Order logic tree Transductions with Sharing of subtrees (MSOTS). Finally, we give a specialized composition procedure for linear HODT that preserves linearity. This procedure relies on linear logic and coherence spaces.
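The classification of transducers above rests on the order of simple types. A minimal sketch, assuming the standard definition (the base type o of trees has order 0, and a function type A → B has order max(order(A) + 1, order(B))): a DTOP state produces a tree, of type o and order 0, while an MTT state with k tree parameters can be typed o → ⋯ → o → o, which has order 1. The names `Base`, `Arrow`, `order` and `arrows` are hypothetical, and the sketch only computes type orders; it does not implement HODT or their composition.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Base:
    """The base type o of output trees."""


@dataclass(frozen=True)
class Arrow:
    """Function type arg -> res."""
    arg: "SimpleType"
    res: "SimpleType"


SimpleType = Union[Base, Arrow]


def order(t: SimpleType) -> int:
    """Standard order of a simple type: 0 for o, max(order(A) + 1, order(B)) for A -> B."""
    if isinstance(t, Base):
        return 0
    return max(order(t.arg) + 1, order(t.res))


def arrows(*ts: SimpleType) -> SimpleType:
    """Right-associated arrow type t1 -> t2 -> ... -> tn."""
    result = ts[-1]
    for t in reversed(ts[:-1]):
        result = Arrow(t, result)
    return result


o = Base()
print(order(o))                        # DTOP-like state producing a tree: 0
print(order(arrows(o, o, o)))          # MTT-like state with two tree parameters: 1
print(order(arrows(arrows(o, o), o)))  # a parameter that is itself a function: 2
```

Adding a tree parameter never raises the order beyond 1; only parameters that are themselves functions push a term to order 2 or more, which is the kind of term that composing two order-1 transducers can produce.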
Because the time complexity of the composition procedure for linear HODT depends heavily on the order of the transducers, we give a procedure that reduces the order of linear and almost linear transducers.

Finally, we prove that the word language MIX, defined as the commutative closure of the Dyck language, belongs neither to the class of EDT0L languages nor to the class of non-branching Multiple Context-Free Languages.
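The commutative closure in this definition can be made concrete: a word belongs to the commutative closure of a Dyck language if and only if, for every bracket pair, it contains as many opening as closing brackets, since any such letter count is already realized by a Dyck word of the form (a ā)^n (b b̄)^m ⋯. The sketch below checks this condition. The number of bracket pairs and their letters are a parameter here (uppercase letters standing for closing brackets is an assumption of the example), because the abstract does not fix them. Membership is therefore easy to decide; the results above concern which grammar formalisms can generate the language, not the cost of recognizing it.

```python
from collections import Counter


def in_commutative_dyck_closure(word: str, pairs: list) -> bool:
    """True iff `word` is a permutation of some Dyck word over the given bracket pairs.

    `pairs` is a list of (opening, closing) letters, e.g. [("a", "A"), ("b", "B")].
    """
    counts = Counter(word)
    letters = {c for pair in pairs for c in pair}
    if any(c not in letters for c in counts):
        return False  # letter outside the bracket alphabet
    # Parikh condition: each opening bracket is matched in number by its closing one.
    return all(counts[opening] == counts[closing] for opening, closing in pairs)


pairs = [("a", "A"), ("b", "B")]
print(in_commutative_dyck_closure("BAab", pairs))  # True
print(in_commutative_dyck_closure("aab", pairs))   # False
```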