Thesis defended

Modeling and Mining of Web Discussions (original title: Modélisation et fouille de discussions de Web)

Author: Anna Stavrianou
Supervisors: Julien Velcin, Jean-Hugues Chauchat
Type: Doctoral thesis
Discipline(s): Computer science
Date: Defended on 01/02/2010
Institution(s): Lyon 2
Doctoral school(s): École doctorale InfoMaths (Lyon ; 2009-....)
Research partner(s): Laboratory: Equipe de Recherche en Ingénierie des Connaissances
Jury: President: Stefan Trausan-Matu
Examiners: Pascal Poncelet, Jean-Gabriel Ganascia, Marc El-Bèze

Abstract


The development of Web 2.0 has generated a vast number of online discussions. Mining and extracting quality knowledge from them matters to the industrial and marketing sectors, as well as to e-commerce applications. Because such discussions capture people's interests and beliefs, there is great interest in acquiring and developing tools for online discussion analysis.

The objective of this thesis is to define a model that represents online discussions and facilitates their analysis. We propose a graph-oriented model. The vertices of the graph represent postings. Each posting encapsulates information such as the content of the message, its author, its opinion polarity, and the time at which it was posted. The edges between postings denote a "reply-to" relation: they show which posting replies to which, as given by the structure of the online discussion.

The proposed model is accompanied by a number of measures that facilitate mining the discussion and extracting knowledge from it. Structure-oriented measures are derived from the structure of the discussion and the way the postings are linked to each other. Opinion-oriented measures deal with the evolution of opinion within a discussion. Time-oriented measures exploit the temporal dimension of the model, while topic-oriented measures quantify the presence of topics within a discussion. The user's presence in online discussions can be exploited either through social network techniques or through the new model, which encapsulates knowledge about the author of each posting.

Representing an online discussion in this way allows a user to "zoom" inside the discussion. Messages are recommended to the user to enable more efficient participation in the discussion. Additionally, a prototype system has been implemented that allows the user to mine online discussions by selecting a subset of postings and browsing through them efficiently.
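
As an illustration of the graph-oriented model described in the abstract, the following Python sketch stores postings as vertices and "reply-to" links as edges. It is a minimal sketch under stated assumptions, not the thesis's implementation: the names `Posting`, `DiscussionGraph`, `reply_count`, `depth`, `mean_polarity_by_author`, and `select` are hypothetical, the polarity encoding (-1, 0, +1) is assumed, and the three measures are simple stand-ins for the structure-, opinion-, and author-oriented families the abstract mentions, whose actual definitions are not given here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Posting:
    """One vertex of the discussion graph (field names are illustrative)."""
    post_id: str
    author: str
    content: str
    polarity: int                    # assumed encoding: -1, 0, or +1
    posted_at: datetime
    reply_to: Optional[str] = None   # id of the posting being replied to


class DiscussionGraph:
    """Postings as vertices, "reply-to" relations as directed edges."""

    def __init__(self) -> None:
        self.postings: Dict[str, Posting] = {}
        self.replies: Dict[str, List[str]] = {}  # parent id -> reply ids

    def add(self, posting: Posting) -> None:
        if posting.reply_to is not None and posting.reply_to not in self.postings:
            raise KeyError(f"unknown parent posting: {posting.reply_to}")
        self.postings[posting.post_id] = posting
        self.replies[posting.post_id] = []
        if posting.reply_to is not None:
            self.replies[posting.reply_to].append(posting.post_id)

    def reply_count(self, post_id: str) -> int:
        """Illustrative structural measure: number of direct replies."""
        return len(self.replies[post_id])

    def depth(self, post_id: str) -> int:
        """Illustrative structural measure: reply-to hops up to the root."""
        hops = 0
        current = self.postings[post_id]
        while current.reply_to is not None:
            current = self.postings[current.reply_to]
            hops += 1
        return hops

    def mean_polarity_by_author(self) -> Dict[str, float]:
        """Illustrative opinion measure: average polarity per author."""
        totals: Dict[str, List[int]] = {}
        for p in self.postings.values():
            totals.setdefault(p.author, []).append(p.polarity)
        return {author: sum(v) / len(v) for author, v in totals.items()}

    def select(self, author: Optional[str] = None,
               since: Optional[datetime] = None) -> List[Posting]:
        """Select a subset of postings, in insertion order."""
        return [
            p for p in self.postings.values()
            if (author is None or p.author == author)
            and (since is None or p.posted_at >= since)
        ]


# A made-up four-posting discussion, used only to exercise the sketch.
g = DiscussionGraph()
g.add(Posting("p1", "alice", "I like it.", +1, datetime(2010, 1, 1, 10, 0)))
g.add(Posting("p2", "bob", "I disagree.", -1, datetime(2010, 1, 1, 10, 5), "p1"))
g.add(Posting("p3", "alice", "Why?", 0, datetime(2010, 1, 1, 10, 10), "p2"))
g.add(Posting("p4", "carol", "Me too.", +1, datetime(2010, 1, 1, 10, 20), "p1"))
```

Keeping author, polarity, and timestamp on each vertex is what lets a single structure serve several kinds of measure; `select` hints at how a user might narrow down to a subset of postings before browsing them.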