Doctoral thesis in Computer Science
Supervised by Yves Kodratoff.
Defended in 1988
at Paris 11, in partnership with Université de Paris-Sud, Faculté des sciences d'Orsay (Essonne) (other partner).
No French-language abstract available.
Integration of numeric and symbolic techniques in learning
In this dissertation, we present a learning algorithm that uses a schema-based (frame) representation language. From a set of training examples, the system produces either a decision tree or two sets of concept descriptions: a most general description (minimally discriminant rules) and a most specific description (maximally specific rules). These bear some similarities to the S and G sets of version space theory. However, the target concept may be disjunctive (we discuss the limits of Mitchell's candidate elimination algorithm with respect to learning disjunctive concepts). To generate the decision tree, the learning system uses a top-down, hill-climbing search strategy. As in the ID3 system, the heuristic preference criterion is based on information theory and on the entropy measure (numeric learning). The training examples are represented using an object-based formalism, and the learning system can use sophisticated general knowledge about the domain (symbolic learning). Frames are used to perform inheritance (top-down propagation of information) as well as generalization (bottom-up propagation), which led us to develop our own frame language. Because the training examples now carry so much information, the search space grows drastically. When building the decision tree, a static table of features can no longer be used to generate candidate nodes and to compute the information gain for each remaining feature, as ID3 does. The system therefore generates candidate nodes dynamically while building the decision tree. Two procedures for generating these candidate nodes have been implemented. The first one stays as close as possible to ID3 and traverses the hierarchy of frames (background knowledge) in a general-to-specific manner. The second one is based on the seed-driven strategy of AQ. A comparison of these two methods is given using the same training set (in previous attempts to compare ID3 and AQ11, two different example databases were used).
The system has been applied to the generation of a knowledge base in tomato plant pathology (350 examples described by 213 frames). It has been equipped with several techniques (numeric and symbolic) to detect and treat noise in the training examples. The learning system is efficient, uses background knowledge, and can learn several concepts (classes) in a single learning cycle. This work demonstrates that inductive learning techniques have now reached a state of maturity where they can be used as a knowledge acquisition front end for the most popular "object-based" expert-system shells.
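
As an illustration of the first procedure described in the abstract, the following minimal sketch ranks dynamically generated candidate tests by entropy-based information gain, enumerating concepts of an is-a hierarchy from general to specific. It is not the thesis's implementation: the hierarchy `PARENT`, the toy data `EXAMPLES`, and every function name are hypothetical, and each example is reduced to a single symptom value rather than a full frame.

```python
import math
from collections import Counter

# Hypothetical is-a hierarchy (child -> parent); not taken from the thesis.
PARENT = {
    "yellow_spot": "spot",
    "brown_spot": "spot",
    "spot": "symptom",
    "wilting": "symptom",
}

# Hypothetical training examples: (observed symptom, class label).
EXAMPLES = [
    ("yellow_spot", "disease_a"),
    ("brown_spot", "disease_a"),
    ("wilting", "disease_b"),
    ("wilting", "disease_b"),
]


def ancestors(value):
    """Return the value followed by its ancestors (bottom-up generalization)."""
    chain = [value]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(examples, concept):
    """Gain of the binary test 'is the observed symptom a kind of concept?'."""
    labels = [label for _, label in examples]
    yes = [label for value, label in examples if concept in ancestors(value)]
    no = [label for value, label in examples if concept not in ancestors(value)]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (yes, no) if part)
    return entropy(labels) - remainder


def candidate_concepts(examples):
    """Collect every concept covering some example, most general first."""
    seen = {}
    for value, _ in examples:
        for concept in ancestors(value):
            seen.setdefault(concept, len(ancestors(concept)) - 1)  # depth
    return sorted(seen, key=seen.get)


def best_test(examples):
    """Pick the candidate with the highest gain (first one wins on ties)."""
    return max(candidate_concepts(examples),
               key=lambda concept: information_gain(examples, concept))


if __name__ == "__main__":
    for concept in candidate_concepts(EXAMPLES):
        print(concept, round(information_gain(EXAMPLES, concept), 3))
    print("selected test:", best_test(EXAMPLES))
```

On this toy data the root concept `symptom` covers every example and therefore yields no gain, whereas the intermediate concept `spot` separates the two classes perfectly. This is the kind of test that a static table of leaf-level feature values would not propose.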