The PhD thesis of Dr. Paolo Casoto on Sentiment Analysis. The work presented in this thesis makes several contributions to Sentiment Analysis, specifically as applied to product reviews written in Italian. The main contributions are the following:
• a generic framework, SENT-IT, for defining, training and testing automatic Sentiment Analysis tools based on supervised classifiers has been designed and implemented. The SENT-IT framework provides a complete set of integrated tools for linguistic analysis and machine learning, which can be used to generate new automatic tools for sentiment classification and to evaluate their performance experimentally. A comprehensive description of the SENT-IT framework and its modules is provided in Chapter 3. The SENT-IT framework is based on open-source solutions and will be freely released soon for research purposes.
• a set of automatically annotated corpora of product reviews written in Italian, grouped by product domain (e.g. movies, cars, cell phones, etc.), has been collected and shared with other researchers. Each product review consists of a short text, a set of optional additional information, such as date, author name and age, and an overall polarity rating that represents the polarity expressed by the author in the review. The corpora, which were developed to evaluate the proposed Sentiment Analysis methodologies, could be used in the future by other researchers as a Gold Standard, a resource that was not available for Italian when this thesis work began. The review corpora were publicly released in 2008 in XML format and are available on the author's site.
• a document feature representation schema suitable for Sentiment Analysis applied to Italian has been proposed and experimentally evaluated. The set of selected features, described in detail in Chapter 3, consists of representation features that the literature describes as suitable for English, together with features defined ad hoc according to the particularities of the Italian language.
• a domain-independent meta-classifier for Sentiment Analysis has been implemented by applying a stacking approach to previously trained domain-dependent classifiers. The stacking approach has been investigated in order to improve the effectiveness of the ensemble classifier on both previously unseen and already known domains.
• a lexical resource of polarity-oriented terms for the Italian language has been developed, by proposing a shortest path algorithm based on a graph representation of the input terms. Semantic relations connecting terms, such as synonymy, antonymy and similarity, have been used to generate the graph representation.
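
To make the last contribution more concrete, the following is a minimal sketch of how polarity could be propagated over a term graph along shortest paths. It is an illustration under stated assumptions, not the algorithm defined in the thesis: the function name `propagate_polarity`, the relation labels, the seed-term approach, the sign flip on antonymy edges and the example Italian terms are all hypothetical choices made for this example.

```python
from collections import deque


def propagate_polarity(edges, seeds):
    """Label terms by breadth-first search from seed terms (illustrative only).

    edges: iterable of (term_a, term_b, relation), where relation is one of
           "synonym", "similar" or "antonym" (assumed labels).
    seeds: dict mapping a seed term to +1 (positive) or -1 (negative).
    Returns a dict mapping each reachable term to a tuple
    (polarity, distance_from_nearest_seed).
    """
    # Build an undirected graph; antonymy edges carry a sign of -1.
    graph = {}
    for a, b, relation in edges:
        sign = -1 if relation == "antonym" else 1
        graph.setdefault(a, []).append((b, sign))
        graph.setdefault(b, []).append((a, sign))

    # Multi-source BFS: on an unweighted graph the first visit to a term
    # happens along a shortest path from the nearest seed.
    labels = {term: (polarity, 0) for term, polarity in seeds.items()}
    queue = deque(seeds)
    while queue:
        term = queue.popleft()
        polarity, distance = labels[term]
        for neighbour, sign in graph.get(term, []):
            if neighbour not in labels:
                # Assumption: polarity is kept across synonymy and
                # similarity edges and flipped across antonymy edges.
                labels[neighbour] = (polarity * sign, distance + 1)
                queue.append(neighbour)
    return labels


# Hypothetical toy graph of Italian adjectives.
EDGES = [
    ("buono", "ottimo", "synonym"),
    ("ottimo", "eccellente", "similar"),
    ("buono", "cattivo", "antonym"),
    ("cattivo", "pessimo", "synonym"),
]
LEXICON = propagate_polarity(EDGES, {"buono": 1})
```

In this sketch the distance from the nearest seed is returned alongside the polarity, so that it could be used as a rough confidence indicator: terms far from any seed are labelled through longer chains of relations and are therefore less reliable.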