Sentiment Analysis for the Italian Language - Presentation


The PhD thesis of Dr. Paolo Casoto on Sentiment Analysis. The work presented in this thesis provides several contributions to the task of Sentiment Analysis applied to product reviews written in Italian. In particular, the following contributions are proposed:
• A generic framework for defining, training and testing automatic Sentiment Analysis tools based on supervised classifiers has been designed and implemented. The SENT-IT framework provides a complete set of integrated tools for linguistic analysis and machine learning, which can be applied to easily generate new automatic tools for sentiment classification and to evaluate their performance experimentally. A comprehensive description of the SENT-IT framework and its modules is provided in Chapter 3. The SENT-IT framework is based on open-source solutions and will be freely released soon for research purposes.
• A set of automatically annotated corpora of product reviews written in Italian, grouped by product domain (e.g. movies, cars, cell phones), has been collected and shared with other researchers. Each product review consists of a short text, a set of additional and optional information, such as date, author name and age, and an overall polarity rating, which represents the polarity expressed by the author within the review. The corpora, which were developed to evaluate the proposed Sentiment Analysis methodologies, could be used in the future by other researchers as a Gold Standard, which was not available for the Italian language when this thesis began. The review corpora were publicly released in 2008 in XML format and are available at the author's site.
• A document feature representation schema suitable for Sentiment Analysis applied to the Italian language has been proposed and experimentally evaluated. The set of selected features, described in detail in Chapter 3, consists of representation features described in the literature as suitable for the English language, together with ad-hoc features defined according to the specific characteristics of the Italian language.
• A domain independent meta-classifier for Sentiment Analysis has been implemented by applying a stacking approach to previously trained domain-dependent classifiers. The stacking approach has been investigated in order to improve the effectiveness of the ensemble classifier on both unknown and already known domains.
• A lexical resource of polarity oriented terms for the Italian language has been developed, by proposing a shortest path algorithm based on a graph representation of the input terms. Semantic relations connecting terms, such as synonymy, antonymy and similarity, have been used to generate the graph representation.

Presentation Transcript

  • Sentiment Analysis for the Italian Language
    dott. Paolo Casoto
    Supervisor: professor Carlo Tasso
    Department of Mathematics and Computer Science - University of Udine
    3rd March 2011
  • Summary
    Introduction
    A Supervised Approach to Overall Opinion Polarity Analysis
    Domain Independent Sentiment Analysis
    Automatic Generation of Lexical Resources for Sentiment Analysis
    Conclusions
  • Subjectivity and Sentiment in written texts
    Subjectivity
    Set of opinions, emotions, thoughts, etc. expressed in a textual document;
    Overview of the writer's internal private state;
    From a User Modeling point of view: a way the user implicitly shows his/her personal traits.
    Sentiment
    A type of subjectivity;
    Set of positive and negative emotions, evaluations, and judgments;
    Attention focused on the polarity of the expressed subjectivity.
  • Sentiment Analysis
    Often referred to as Opinion Polarity Analysis;
    Aims at identifying the polarity expressed by an author in a given document;
    May work at several different textual levels:
    1. document;
    2. phrases;
    3. sentences.
    When applied to the overall polarity expressed by a document, it is referred to as Overall Opinion Polarity Analysis (OvOPA).
  • Limitations of Sentiment Analysis
    Harder than topic classification or generalization:
    Polarity cannot be effectively captured just by looking at keywords;
    Sentiment tends to be expressed in more subtle ways (e.g. irony, negation, use of adverbs).
    Example: Is "not very good" a clue of positive, negative or neutral polarity? And is its polarity equal to that of "a bit bad"?
    Domain dependency: some terms and sentences may change their polarity depending on the context in which they are used.
    Example: Is the sentence "this is a real tragedy" a clue of positive, negative or neutral polarity when referring to the movie The Godfather? And is its polarity the same when referring to a text on cars' brakes?
  • Limitations of Sentiment Analysis / 2
    Time dependency: some terms and sentences may change their polarity over time.
    Example: Is the term "Dual Band" a clue of positive, negative or neutral polarity when referring to a brand-new mobile phone? Does its polarity change with respect to a mobile phone built in 1998?
    The "parts as the whole" problem, related to many particular domains, such as the movie domain;
    Difficulties in ranking the positive and negative polarity of the analyzed texts: moving from classification (two labels) towards regression (numerical values).
  • Fields of Application
    Business Intelligence: analysis of the customers' opinions allows extracting information useful for strategic marketing, brand monitoring and advertising.
    Example: Car producer A is interested in tracking the opinions expressed by its customers with respect to a specific model. In particular, A wants to know how customers react to its brand-new brake system.
    Political Campaigns;
    Financial Investments Planning;
    Terrorism prevention;
    Oriented Search Engines: retrieve documents relevant with respect to a query and a specific polarity expressed by the user.
  • A Supervised Approach to Overall Opinion Polarity Analysis
    Goal
    To define a supervised approach for overall opinion polarity (OvOP) analysis of documents written in Italian.
    The following activities have been performed:
    a novel set of linguistic features has been proposed;
    a flexible and reusable software platform for implementing and evaluating the proposed methodologies has been designed and developed: the SENT-IT framework;
    a corpus of labelled product reviews has been collected;
    more than 300 experimental evaluations have been performed.
  • SENT-IT framework / Modules
    Figure: Overall architecture of the SENT-IT framework.
  • Baseline evaluation
    Documents are represented with respect to the following features:
    1. UF: the frequency of the 2122 most frequent unigrams;
    2. U: the occurrence of the 2122 most frequent unigrams;
    3. B: the occurrence of the 2122 most frequent bigrams;
    4. UB: the occurrence of the 2122 most frequent bigrams merged with the 2122 most frequent unigrams;
    5. POS: the occurrence of the bigrams composed of one of the 2122 most frequent unigrams and the POS tag associated with that unigram;
    6. Adj: the occurrence evaluated only on the unigrams tagged as adjectives by the POS tagger;
    7. U1000: the occurrence evaluated on the 1000 most frequent unigrams appearing in the training set.
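The difference between the occurrence (U, B) and frequency (UF) representations listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SENT-IT implementation: the function names, the pre-tokenized input and the toy vocabulary size are hypothetical (the slides use the 2122 most frequent unigrams).

```python
from collections import Counter


def build_vocabulary(tokenized_docs, size):
    """Return the `size` most frequent tokens of the training documents."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    return [tok for tok, _ in counts.most_common(size)]


def occurrence_vector(doc, vocabulary):
    """1 if the vocabulary token appears in the document, 0 otherwise (U)."""
    present = set(doc)
    return [1 if tok in present else 0 for tok in vocabulary]


def frequency_vector(doc, vocabulary):
    """Number of times each vocabulary token appears in the document (UF)."""
    counts = Counter(doc)
    return [counts[tok] for tok in vocabulary]


def bigrams(doc):
    """Adjacent token pairs, usable as tokens for the B and UB feature sets."""
    return [f"{a} {b}" for a, b in zip(doc, doc[1:])]


# Toy training set of already tokenized (and possibly stemmed) reviews.
train = [["film", "bello", "film"], ["film", "brutto", "bello"]]
vocab = build_vocabulary(train, size=2)
review = ["film", "film", "noioso"]
u = occurrence_vector(review, vocab)
uf = frequency_vector(review, vocab)
```

The same two vector builders apply unchanged to bigram tokens produced by `bigrams`, which is one possible way to obtain the merged UB representation.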
  • Results
    Table: Average accuracy of the built classifiers.
    Features   Acc. (no stem.)   Acc. (stem.)   Acc. Pang
    UF         75.5              76.4           78.7
    U          81                82.1           81
    B          68.6              70.1           77.3
    UB         79.2              79.9           80.6
    POS        77.1              x              81.5
    Adj        72.7              76.1           77
    U1000      80.3              81.6           80.3 (U2633)
  • Introduction of new features
    Documents are represented with respect to the following features:
    1. U3: the occurrence of the unigrams occurring at least 3 times in the training set;
    2. UB3: the occurrence of the unigrams and bigrams occurring at least 3 times in the training set;
    3. UBT3: the occurrence of the unigrams, bigrams and trigrams occurring at least 3 times in the training set.
    In addition to the 3 proposed feature models, the following features are extracted from each document of the training set:
    1. number of question and exclamation marks;
    2. number of sentences;
    3. number of long words (7 characters or more);
    4. average sentence length;
    5. average word length.
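The five additional per-document features listed above are simple surface statistics. A minimal sketch follows; the regular expressions used to split sentences and words, and the choice to measure sentence length in words, are assumptions made for this example, since the slides do not say how the framework tokenizes text.

```python
import re


def surface_features(text):
    """Compute the five surface features for a single review text."""
    # Assumption: a sentence ends at one or more of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a maximal run of word characters.
    words = re.findall(r"\w+", text)
    return {
        "marks": sum(text.count(ch) for ch in "?!"),
        "sentences": len(sentences),
        "long_words": sum(1 for w in words if len(w) >= 7),
        # Sentence length is measured in words here (an assumption).
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
    }


features = surface_features("Film bellissimo! Attori bravi. Da vedere?")
```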
  • Results
    Table: Average accuracy of the six classifiers.
                  U3     UB3    UBT3
    Naïve Bayes   82.2   82.4   82.5
    SVM           84.9   84.4   84.4
    Table: Average accuracy+ and accuracy- of the six classifiers.
           U3            UB3           UBT3
           acc+   acc-   acc+   acc-   acc+   acc-
    NB     85.4   79.0   86.2   78.6   86.2   78.8
    SVM    85.2   84.6   83.8   85.0   83.6   85.2
  • Feature selection
    Feature selection aims at identifying the subset of features that are most useful in assigning a set of documents to a group of classes.
    It reduces the noise introduced by the sparsity of data, increasing performance and reducing computational costs.
    The Information Gain (IG) feature selection metric is defined, with respect to the input feature t, as:
    \[ IG(t) = -\sum_{i=1}^{m} \Pr(c_i) \log \Pr(c_i) + \Pr(t) \sum_{i=1}^{m} \Pr(c_i \mid t) \log \Pr(c_i \mid t) + \Pr(\bar{t}) \sum_{i=1}^{m} \Pr(c_i \mid \bar{t}) \log \Pr(c_i \mid \bar{t}) \qquad (1) \]
    with \{c_i\}_{i=1}^{m} the set of available classes.
    Features t are ranked according to their IG(t) value. Only the n best features are used.
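Equation (1) can be evaluated directly from document counts. The sketch below is a minimal illustration, assuming that each document is represented as a set of tokens, that probabilities are estimated as relative frequencies, and that logarithms are taken in base 2; the function name and the toy data are hypothetical.

```python
import math


def information_gain(docs, labels, term):
    """Information Gain of `term`, with docs given as sets of tokens."""
    n = len(docs)
    classes = sorted(set(labels))

    def plogp(p):
        # Convention: 0 * log(0) = 0.
        return p * math.log2(p) if p > 0 else 0.0

    # Labels of the documents that contain / do not contain the term.
    with_t = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_t = [lab for doc, lab in zip(docs, labels) if term not in doc]

    # First term of (1): entropy of the class distribution.
    ig = -sum(plogp(labels.count(c) / n) for c in classes)
    # Second and third terms of (1): weighted by Pr(t) and Pr(not t).
    for subset in (with_t, without_t):
        if subset:
            weight = len(subset) / n
            ig += weight * sum(plogp(subset.count(c) / len(subset)) for c in classes)
    return ig


docs = [{"ottim", "film"}, {"ottim"}, {"pessim", "film"}, {"pessim"}]
labels = ["pos", "pos", "neg", "neg"]
ranked = sorted({"ottim", "pessim", "film"},
                key=lambda t: information_gain(docs, labels, t),
                reverse=True)
```

In this toy set "ottim" separates the two classes perfectly, while "film" appears equally often in both and therefore carries no information, so it is ranked last.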
  • Results
    Table: Average accuracy of the U3 and UBT3 based classifiers after feature selection.
    U3
    Features      50     100    250    500    1K     2K
    Naïve Bayes   81.0   83.8   83.8   85.6   86.7   85.4
    SVM           83.3   86.2   85.5   86.8   87.5   85.7
    UBT3
    Features      100    250    500    1K     2K     3K
    Naïve Bayes   84.4   84.5   85.4   86.6   86.8   85.5
    SVM           85.5   85.9   87.2   87.9   89.0   87.8
  • Results
    Table: Top 50 features extracted from the training set with the highest IG value.
    bellissim, bravissim, splendid, bel, eccezional, brutt, pò, attor, favol, depp, bell, piac, pir, harry, grindhous, jack, noios, grand, johnny, noi, pessim, ridicol, perfett, colonn, stup, ottim, interpret, sparrow, inutil, insuls, fantast, bast, sonor, straordinar, 'ottim film', evit, simpson, orrend, 'colonn sonor', stupid, peggior, butt, will, 'film bellissim', jones, delusion, 'jack sparrow', schifezz, brav, molt
  • Analysis of the results
    Occurrence of unigrams leads to the best performance;
    SVM classifiers clearly outperform NB classifiers;
    NB classifiers tend to classify positive reviews better than negative ones, while SVM classifiers tend to be more balanced;
    Feature selection improves accuracy by between 2% and 4.6%.
    The accuracy achieved by the classifiers trained on the selected set of features is comparable with the best results described in the literature for the English language (90.2%, Aue and Gamon).
  • Domain Independent Sentiment Analysis
    The issue
    How does the proposed approach work on documents that do not concern the specific domain used in training?
    Several approaches have been proposed in the literature to deal with the problem of training a domain independent OvOP classifier, with no significant improvements.
    The following activities have been performed:
    1. four domain dependent classifiers are trained, each on a different domain D using the UBT3 feature set, and cross-evaluated;
    2. a general purpose classifier, trained on multiple domains, has been created and evaluated;
    3. different domain dependent OvOP classifiers are combined in an ensemble to generate a meta-classifier.
  • Results
    Table: Average three-fold cross-validation accuracies for each domain dependent OvOP classifier trained according to the UBT3 feature set.
           Movie   Cell    Car     Book
    NB     86.8    77.79   77      75.69
    SVM    89.0    88.84   82.11   80.49
    The poor performance of the OvOP classification process in the book domain is related to the length and the content of the reviews: most of the content describes the plot of the book and acts as noise in the document representation.
  • Results
    Table: Top 10 features extracted from each training set with the highest IG value.
    Cell                 Car      Book
    ottim                diesel   pirandell
    gioc                 ford     fu
    impost               daewo    sconsigl
    pretes               comod    padron
    fotograf             nuov     sess
    consent              750      costitu
    film                 ottim    don
    'macchin fotograf'   graz     quegl
    vocal                buon     giusepp
    mal                  ril      social
  • Results
    Table: Average three-fold cross-validation accuracies for each domain dependent OvOP classifier applied to different domains.
    NB      Movie   Cell    Car     Book
    Movie   86.8    75      60.68   63.66
    Cell    71.5    77.79   57.87   58.08
    Car     69.6    71.92   77      59.87
    Book    68.7    71.06   58.1    75.69
    SVM     Movie   Cell    Car     Book
    Movie   89      74.32   63.69   63.44
    Cell    70.9    88.84   62.39   57.74
    Car     67.7    76.92   82.11   59.31
    Book    70.8    75.29   64.39   80.49
  • Results
    Table: Classification accuracy of a classifier trained on three domains and tested on the fourth domain.
    Training            Testing   NB      SVM
    Cell, Car, Book     Movie     72.8    71
    Movie, Car, Book    Cell      76.82   76.53
    Movie, Cell, Book   Car       54.46   57.67
    Cell, Car, Movie    Book      60.75   60.31
  • Results
    Table: Classification accuracy of a classifier trained on the four domains.
    Testing   NB      SVM
    Movie     78.7    76.9
    Cell      71.92   79.42
    Car       65.6    69.3
    Book      72.46   69.45
  • The meta-classification OvOP process / 1
    The last experimental activity concerns the training and testing of a meta-classifier.
    The meta-classifier is able to perform OvOP classification on new domains, based on a small amount of labelled data from the target domain.
    Four different meta-classifiers have been trained, in order to deal with the different combinations of training and target domains.
    250, 500 and 1000 labelled documents from the target domain have been used for training.
    Domain dependent classifiers have been trained using NB as the learning method, while the meta-classifier is based on SVM.
  • The meta-classification OvOP process / 2
    Figure: The meta-classification OvOP process.
  • Results
    Table: Classification accuracy of a meta-classifier evaluated on the four domains, by number of labelled target-domain documents used for training.
    Target   250     500     1000
    Movie    84.0    81.4    80.0
    Cell     80.63   80.63   83.23
    Car      74.85   75.04   74.85
    Book     75.85   75.45   75.05
  • Analysis of the Results
    The results clearly show that the OvOP meta-classifier performs better than the general purpose classifier previously described on each analyzed domain, providing an accuracy ranging from 75.04% for car reviews to 84.0% for movie reviews.
    Comparing the results obtained with the OvOP meta-classifier against the baseline, it clearly emerges that the improvement in accuracy is similar across different domains.
    The meta-classification process, based on domain dependent classifiers and the UBT3 feature set, performs relatively well even on inter-domain OvOP classification.
  • Automatic Generation of Lexical Resources for Sentiment Analysis
    Idea
    Define a novel approach for determining the orientation of a set of adjectives of the Italian language, starting from a set of manually classified seeds and a lexical resource (e.g. dictionary, lexicon) with a graph-like structure.
    Shortest Path
    For each node, the set of shortest paths connecting it with a list of seed nodes is evaluated.
    Improvement and refinement of the ideas exploited by Matteo Borsari;
    Several kinds of relationships connecting terms (synonymy, antonymy, weak synonymy and weak antonymy);
    A decay of the contribution provided by each seed term is introduced.
  • Determining the polarity orientation
    The function decay(i, j) is used to calculate the orientation and the strength of the contribution of the edge v ∈ t_s.
    \[ decay(i, j) = \begin{cases} +1 & \text{if } i \text{ and } j \text{ are connected by a synonymy relation} \\ -1 & \text{if } i \text{ and } j \text{ are connected by an antonymy relation} \\ +0.8 & \text{if } i \text{ and } j \text{ are connected by a weak synonymy relation} \\ -0.8 & \text{if } i \text{ and } j \text{ are connected by a weak antonymy relation} \end{cases} \qquad (2) \]
    The function d(t_s) evaluates the overall decay over the path t_s = \{t, \ldots, s\} connecting term t with seed s.
    \[ d(t_s) = \prod_{i=1}^{s} decay(i, i + 1) \qquad (3) \]
    d(t_s) = 0 when there is no path connecting term t with the seed s.
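As a worked illustration of the per-edge decay and the overall path decay, the sketch below multiplies the decay values of the edges along a path. It assumes that the overall decay combines the edge values multiplicatively, so that every antonymy edge flips the sign; the relation labels and the `path_decay` helper are hypothetical names introduced for this example.

```python
# Per-edge decay values, as listed in equation (2).
DECAY = {
    "synonymy": 1.0,
    "antonymy": -1.0,
    "weak_synonymy": 0.8,
    "weak_antonymy": -0.8,
}


def path_decay(relations):
    """Overall decay over a path given as the list of its edge relations.

    `None` stands for "no path between term and seed" and yields 0.
    Assumption: edge values are multiplied along the path.
    """
    if relations is None:
        return 0.0
    value = 1.0
    for relation in relations:
        value *= DECAY[relation]
    return value


# A term reached from the seed through one synonym and one antonym edge
# ends up with the opposite orientation of the seed.
d_opposite = path_decay(["synonymy", "antonymy"])
d_weak = path_decay(["weak_synonymy", "weak_antonymy"])
```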
  • Determining the polarity orientation / 2
    Semantic similarity between a term t and the list of positive seed terms is defined as:
    \[ m^{+}(t) = \frac{\sum_{s \in S^{+}} O(s) \times d(t_s)}{\sum_{s \in S^{+}} O(s)} \qquad (4) \]
    where O(s) represents the polarity manually assigned to each term constituting the seed list S^{+}.
    Semantic similarity between a term t and the list of negative seed terms is defined, similarly, as:
    \[ m^{-}(t) = \frac{\sum_{s \in S^{-}} O(s) \times d(t_s)}{\sum_{s \in S^{-}} O(s)} \qquad (5) \]
    The orientation of a term t with respect to the lists of positive and negative seed terms S^{+} and S^{-} can be calculated as follows:
    \[ O(t) = m^{+}(t) + m^{-}(t) \qquad (6) \]
    A term t is classified as positive if O(t) > 0; otherwise it is classified as negative.
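The orientation score can be sketched as follows, given the path decays d(t_s) already computed for a term. One assumption is made loudly here: the normalizer sums the absolute values |O(s)| rather than O(s) itself, so that closeness to a negative seed pulls the score below zero. The slides do not state this, and the helper names and toy seed lists are hypothetical.

```python
def seed_similarity(decays, seed_polarity):
    """Weighted average of the path decays towards one seed list.

    decays: seed -> d(t_s); seeds without a path may be omitted (d = 0).
    seed_polarity: seed -> O(s), +1 for positive and -1 for negative seeds.
    Assumption: normalization by the sum of |O(s)| (not stated in the slides).
    """
    norm = sum(abs(o) for o in seed_polarity.values())
    if norm == 0:
        return 0.0
    return sum(o * decays.get(s, 0.0) for s, o in seed_polarity.items()) / norm


def orientation(decays, positive_seeds, negative_seeds):
    """O(t) = m+(t) + m-(t); the term is positive when the result is > 0."""
    return (seed_similarity(decays, positive_seeds)
            + seed_similarity(decays, negative_seeds))


positive = {"buono": 1, "bello": 1}
negative = {"cattivo": -1, "brutto": -1}
# Toy decays: synonym of "bello", antonym of "brutto".
o_pos = orientation({"bello": 1.0, "brutto": -1.0}, positive, negative)
# Toy decays: synonym of "brutto", antonym of "bello".
o_neg = orientation({"brutto": 1.0, "bello": -1.0}, positive, negative)
```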
  • Determining the polarity orientation / 3
    The same decay value is assigned to each relationship: the contribution provided by the seed s to the term t is reduced by the same amount, independently of the kind of edges constituting the path t_s.
    O_{variant}(t) is defined as:
    \[ O_{variant}(t) = \frac{\sum_{s \in S^{+}} O(s) \times sign(d(t_s)) \times k^{|t_s| - 1}}{\sum_{s \in S^{+}} O(s)} + \frac{\sum_{s \in S^{-}} O(s) \times sign(d(t_s)) \times k^{|t_s| - 1}}{\sum_{s \in S^{-}} O(s)} \qquad (7) \]
    The function sign(d(t_s)) controls switches in polarity due to edges connecting oppositely oriented terms.
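The variant score can be sketched in the same style. The value of the constant k is not given in the slides, so it is a parameter here and the value used in the example is purely illustrative. As further assumptions, |t_s| is taken to be the number of nodes on the path, and the normalizer again sums |O(s)|; all names are hypothetical.

```python
def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def _side(paths, seeds, k):
    # Assumption: normalization by the sum of |O(s)|.
    norm = sum(abs(o) for o in seeds.values())
    if norm == 0:
        return 0.0
    total = 0.0
    for seed, polarity in seeds.items():
        if seed in paths:  # seeds without a path contribute nothing
            d, n_nodes = paths[seed]
            total += polarity * sign(d) * k ** (n_nodes - 1)
    return total / norm


def variant_orientation(paths, positive_seeds, negative_seeds, k):
    """O_variant(t): paths maps seed -> (d(t_s), number of nodes on t_s)."""
    return _side(paths, positive_seeds, k) + _side(paths, negative_seeds, k)


positive = {"buono": 1, "bello": 1}
negative = {"cattivo": -1, "brutto": -1}
# Two edges away from "bello" through same-orientation relations.
near_bello = variant_orientation({"bello": (0.8, 3)}, positive, negative, k=0.5)
# One opposite-orientation edge away from "brutto".
anti_brutto = variant_orientation({"brutto": (-0.8, 2)}, positive, negative, k=0.5)
```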
  • Lexical resources
    The OpenOffice dictionary (D1) for the Italian language: 25372 lemmas / 8941 adjectives, connected by synonymy;
    The SinonimiMaster dictionary (D2): 53949 lemmas / 8888 adjectives, connected by synonymy, antonymy, weak synonymy, and weak antonymy.
    S+ = {buono, bello, eccellente, positivo, fortunato, corretto, superiore}
    S- = {cattivo, brutto, povero, negativo, sfortunato, errato, inferiore}
  • Test set
    A collection (L1) of 248 orientation-bearing adjectives (118 positive, 130 negative) has been manually collected. Eight users have been involved; 53 adjectives (L1+) (28 positive and 25 negative) have been provided by two or more annotators.
    A collection (L2) of 280 orientation-bearing adjectives, randomly extracted from D1 and D2, has been generated and manually annotated by four different annotators with an inter-annotator agreement of up to 82.7%.
    The prior orientation O(s) of each seed is set to 1 ∀s ∈ S+ and to -1 ∀s ∈ S-.
    A test set (T) consisting of 7856 pre-classified documents from four different domains, equally distributed between positives and negatives.
  • Results: OpenOffice dictionary
    Table: Positive and negative adjectives with the highest orientation value O(t) generated from the OpenOffice dictionary.
    Adjective+      Polarity   Adjective-    Polarity
    inappuntabile   0.734      inopportuno   -0.69651132
    favorevole      0.727      inadatto      -0.692204375
    esatto          0.707      inabile       -0.691089402
    dabbene         0.705      disgustoso    -0.687652757
    eccellente      0.687      nauseabondo   -0.665446446
    delizioso       0.686      inadeguato    -0.662615889
    benevole        0.684      incapace      -0.662291163
    giusto          0.678      inidoneo      -0.661069463
    costruttivo     0.677      malaugurato   -0.65419271
    vantaggioso     0.675      turbolento    -0.65419271
  • Results: SinonimiMaster dictionary
    Table: Positive and negative adjectives with the highest orientation value O(t) generated from the SinonimiMaster dictionary.
    Adjective+   Polarity      Adjective-     Polarity
    stimabile    0.629307968   schifoso       -0.57693726
    salutare     0.610773366   sfavorevole    -0.573704163
    splendido    0.610773366   storto         -0.557707135
    squisito     0.610773366   nauseabondo    -0.545980246
    eccellente   0.610773366   inidoneo       -0.545980246
    valido       0.606472043   inabile        -0.545980246
    divertente   0.589220222   inefficiente   -0.545980246
    benfatto     0.589220222   sbagliato      -0.543614796
    salubre      0.58510436    maldestro      -0.533311282
    composto     0.58510436    incompetente   -0.533311282
  • Results
    Table: Coverage and accuracy of both generated sentiment-classified lexical resources with respect to test sets L1 and L2.
    Dictionary       Coverage L1+   Accuracy L1+
    OpenOffice       98.11%         94.34%
    SinonimiMaster   100%           92.45%
    Dictionary       Coverage L1    Accuracy L1
    OpenOffice       96.37%         83.87%
    SinonimiMaster   96.37%         80.24%
    Dictionary                        Accuracy L2
    O(t) on OpenOffice                89.29%
    Ovariant(t) on OpenOffice         88.92%
    O(t) on SinonimiMaster            91.07%
    Ovariant(t) on SinonimiMaster     90.86%
  • OvOP analysis based on sentiment oriented terms
    Idea
    The OvOP of a document d is the algebraic sum of the prior subjectivity status of each adjective appearing in the document.
    Sentences are extracted from a given document by means of a set of extraction rules based on Part-Of-Speech tagging;
    Valence Shifters (adverbs and negations) are used to contextualize the prior polarity of a term;
    For each sentence s the Opinion Polarity score is calculated as:
    \[ OP(s) = O(adj_s) \times Int(s) \times Neg(s) \qquad (8) \]
    The OvOP score is calculated as the sum of the OP scores of the extracted sentences s constituting the document:
    \[ OvOP(d) = \sum_{s \in S_d} OP(s) \qquad (9) \]
    where S_d is the set of sentences extracted from document d.
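Equations (8) and (9) amount to a product per sentence followed by a sum per document. The sketch below assumes that Neg(s) is -1 for a negated sentence and +1 otherwise, and that Int(s) is a multiplicative weight supplied by the caller. Neither value is specified in the slides, so both are illustrative assumptions, as are the function names and the sample numbers.

```python
def opinion_polarity(adjective_orientation, intensity=1.0, negated=False):
    """OP(s) = O(adj_s) * Int(s) * Neg(s) for one extracted sentence.

    Assumptions: `intensity` plays the role of Int(s) (1.0 when no adverb
    modifies the adjective) and negation simply flips the sign.
    """
    neg = -1.0 if negated else 1.0
    return adjective_orientation * intensity * neg


def overall_polarity(sentences):
    """OvOP(d): sum of OP(s) over the sentences extracted from a document.

    Each sentence is a tuple (O(adj_s), Int(s), negated).
    """
    return sum(opinion_polarity(*sentence) for sentence in sentences)


# Hypothetical document: an intensified positive adjective and a negated
# positive adjective, with made-up orientation values.
document = [(0.5, 2.0, False), (0.25, 1.0, True)]
score = overall_polarity(document)
label = "positive" if score > 0 else "negative"
```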
  • Results
    Table: Accuracy of lexical resource based OvOP analysis (OO: OpenOffice, SM: SinonimiMaster).
                     ADJ      NEG+ADJ   ADV+ADJ   NEG+ADV+ADJ
    O(t)/OO          62.88%   64.71%    64.9%     64.8%
    Ovariant(t)/OO   62.9%    64.23%    64.47%    63.9%
    O(t)/SM          63.37%   65.63%    65.25%    65.22%
    Ovariant(t)/SM   63.52%   66.94%    65.34%    66.21%
  • Conclusions / 1
    A supervised algorithm and a set of representation features for determining the Overall Opinion Polarity of a product review have been proposed and evaluated, providing an accuracy, in the best case, of 89%;
    The domain dependency of OvOP Analysis has been addressed by implementing and experimenting with a meta-classifier based on ensemble methodologies;
    High accuracy domain dependent classifiers have been used to train four different domain dependent OvOP classifiers, each of them achieving an accuracy higher than 80%;
    Four different corpora of product reviews on different domains have been collected; they represent the Gold Standard for OvOP Analysis for the Italian language.
  • Conclusions / 2
    Two lexical resources, represented as graphs, have been used to generate, in an unsupervised way, four different sets of opinionated adjectives;
    A novel domain independent OvOP analysis methodology based on the prior subjectivity status of the automatically classified adjectives has been tested. The results show that its accuracy is significantly lower than that of both the domain dependent and the domain independent classifiers based on machine learning.
  • List of publications / 1
    P. Casoto, C. Tasso. An Hybrid Approach for Improving Word Sense Disambiguation and Text Clustering. In Proceedings of the 2nd Italian Research Conference on Digital Library Management Systems, Padua, Italy, 29-30 January 2007, pp. 105-110.
    P. Casoto, A. Dattolo, P. Omero, N. Pudota, C. Tasso. A New Machine Learning Based Approach for Sentiment Classification of Italian documents. In Proceedings of the 3rd Italian Research Conference on Digital Library Management Systems, Padua, Italy, 24-25 January 2008, pp. 77-82.
    N. Pudota, P. Casoto, A. Dattolo, P. Omero, C. Tasso. Towards Bridging the Gap between Personalization and Information Extraction. In Proceedings of the 3rd Italian Research Conference on Digital Library Management Systems, Padua, Italy, 24-25 January 2008, pp. 33-40.
  • List of publications / 2
    P. Casoto, A. Dattolo, F. Ferrara, P. Omero, N. Pudota, C. Tasso. Generating and sharing personal information spaces. In Proceedings of Adaptive Hypermedia and Adaptive Web-Based Systems: Adaptation for the Social Web Workshop, Hannover, Germany, 2008, pp. 14-23.
    P. Casoto, A. Dattolo, C. Tasso. Sentiment Classification for the Italian Language: a Case Study on Movie Reviews. In Journal of Internet Technology, Vol 9(4), ISSN 1607-9264.
    A. Baruzzo, P. Casoto. A Flexible Service-Oriented Digital Platform for e-Content Management in Cultural Heritage. In Proceedings of IABC Workshop - Intelligenza Artificiale nei Beni Culturali, Cagliari, Italy, 11 September 2008, pp. 38-45.
    A. Baruzzo, P. Casoto, P. Challapalli, A. Dattolo. An Intelligent Service Oriented Approach for Improving Information Access in Cultural Heritage. In Proceedings of Information Access in Cultural Heritage Workshop - ECDL 2008, Aarhus, Denmark, 18 September 2008, ISBN 978-90-813489-1-1.
  • List of publications / 3
    A. Baruzzo, P. Casoto, A. Dattolo, C. Tasso. Handling Evolution in Digital Libraries. In Proceedings of the 5th Italian Research Conference on Digital Library Management Systems, Padua, Italy, 29-30 January 2009, pp. 34-50.
    A. Baruzzo, P. Casoto, A. Dattolo. A Conceptual Model for Digital Libraries Evolution. In Proceedings of the 5th Web Information Systems and technologies WEBIST 2009, Lisboa, Portugal, 23-26 March 2009, pp. 299-304, ISBN 978-989-8111-81-4.
    A. Baruzzo, P. Casoto, P. Challapalli, A. Dattolo, N. Pudota, C. Tasso. Toward Semantic Digital Libraries: Exploiting Web 2.0 and Semantic Services in Cultural Heritage. In Journal of Digital Information, Vol 10(6), ISSN 1368-7506.
  • List of publications / 4
    P. Casoto, A. Dattolo, P. Omero, N. Pudota, C. Tasso. Accessing, Analyzing, and Extracting Information from User Generated Contents. In Handbook of Research on Web 2.0, 3.0, and X.0: Technologies, Business, and Social Applications, edited by San Murugesan, IGI Global (Information Science Reference), USA, 2010, ISBN 978-160-5663-84-5, ISBN10 1605663840.
  • Question Time
    Thank you very much