SlideShare a Scribd company logo
1 of 29
Download to read offline
Discovering and monitoring product features 
and the opinions on them with the 
OPINSTREAM framework 
Max Zimmermann1, Eirini Ntoutsi2, 
Myra Spiliopoulou1 
Hellenic Data Management Symposioum 
July 2014 
1 2 Institute for Informatics, Ludwig‐ 
Maximilians‐Universität (LMU) 
München, Germany
o Opinions from TripAdvisor on Vienna Marriott Hotel 
2/5/2014: Great hotel, very nice rooms, perfect location, very nice staffs except for a mid‐aged 
female receptionist who tried to charge me extra for wifi fees when checking out. It was waived 
at the desk when I checked‐in. And she started treating me with an attitude after she found out 
that I got a great deal through priceline.com. …. 
26/1/2014: Spent a long weekend here. Rooms clean and functional without being spectacular 
and a nice pool etc. Staff in pool weren't Good and I found them actually quite rood. Executive 
lounge was ok and not busy but selection of wine and beer wasn't great. The reception has 
many shops and a bar at the end which kind of males it feel like a shopping centre. Overall great 
for business travel but not sure id come again for leisure. 
7/5/2013: The Vienna Marriott has all you expect; no frills, but solid service and they get all the 
basic stuff done right. 
It's in a fine location, maybe 10 minute walk from the major city attractions while being in a 
quiet area. Breakfast buffet exceptional and good fitness center. Very helpful and happy staff. 
Lobby lounge just okay. Not a good wine selection and the Sinatra‐like singer adds nothing. 
Maybe just a little more expensive than it should be, too. 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 2
• Opinionated streams 
• Opinion stream mining & the OPINSTREAM framework 
• Extracting an initial opinionated hierarchy of product (sub)features 
• Online (sub)feature hierarchy maintenance 
• Online opinion classifier maintenance 
• Experiments 
• Summary 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 3
o In a static setting, i.e., given is a collection of opinionated documents: 
• How to extract the different, ad hoc aspects discussed in the collection? 
• How to associate a dominant polarity to each aspect? 
o When the set becomes a stream 
• How to extract the ad hoc aspects over time? 
• How to update their sentiment over time? 
o The OPINSTREAM framework 
• Feature discovery  adaptive unsupervised learning (clustering) 
 Features are defined as cluster centroids 
 Features evolve over time 
• Polarity learning  adaptive semi‐supervised learning (classification) 
 Majority voting within the cluster for the polarity 
 Learning under concept drift 
 Learn with a small amount of labeled reviews 
Polarized aspect 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 
4
o Reviews arrive at distinct timepoints t0, t1, …., ti, …. in batches of fixed size. 
o We don’t receive class labels from the stream; the only sentiment‐related 
information we assume is an initial seed set S of labeled reviews. 
o Reviews are subject to ageing according to the exponential ageing function 
• Recent reviews are more important for learning 
• Gradual decrease of the weight of a review over time 
• Ageing affects both clustering (feature extraction) and classification (sentiment learning) 
o The importance of a review r with respect to a set of reviews R, is defined as the 
number of reviews in R that have r among their kNN, also considering age. 
• For the kNN, cosine similarity is employed 
o Important reviews Rβ: those exceeding the review importance threshold β. 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 5
• Opinionated streams 
• Opinion stream mining & the OPINSTREAM framework 
• Extracting an initial opinionated hierarchy of product (sub)features 
• Online (sub)feature hierarchy maintenance 
• Online opinion classifier maintenance 
• Experiments 
• Summary 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 6
Initial seed set S (labeled) 
Extract clusters 
battery, 
weight, 
life, … 
picture, 
quality, 
size, … 
Global/ 1st level clusters 
Clustering process 
• Extract important reviews R from S. 
• Define dimensions F as all nouns in R 
• Vectorize S over F through TFIDF. 
• Apply fuzzy c‐means to derive the 1st level 
Label k 
Cluster 1: 
battery 
Cluster 2: 
picture 
Cluster k: 
… 
clusters. 
… 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 
7
Initial seed set S 
(labeled) 
Extract clusters 
battery, 
weight, 
life, … 
picture, 
quality, 
size, … 
Cluster 1: 
battery 
Cluster 2: 
picture 
… 
Extract subclusters 
weight life quality size 
Local/ 2nd level clusters 
Similar to 1st level clustering, 
the dataset though is the 
population of each cluster 
… … 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 
subCluster 11: 
battery weight 
Clustering process 
• Extract important reviews R from 
cluster c. 
• Define dimensions F as all nouns in R 
• Vectorize S over F through TFIDF. 
• Apply fuzzy c‐means to derive the 
subclusters of c. 
subCluster 12: 
battery life 
subCluster 21: 
picture quality 
subCluster 22: 
picture size 
8
Definition: Polarized feature 
The polarized feature represented by a cluster c defined in a feature space FR consists 
of: 
• The centroid vector <w1, …., w|FR|>, wi is the avg TF‐IDF of word i in c. 
• The polarity label cpolarity, is the majority class label of reviews in c. 
Topic‐specific classifiers 
o Within each (sub)cluster, a topic‐specific classifier is trained based on the 
corresponding instances from the initial (labeled) seed set S. 
o We focus on adverbs, adjectives as sentimental words. 
o Multinomial Naïve Bayes (MNB) sentiment learner 
• Laplace correction factor for unknown words 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 
9 
Laplace correction for 
unknown words
Here is how the hierarchy looks like (ignore the containers for the moment) 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 
10
• Opinionated streams 
• Opinion stream mining & the OPINSTREAM framework 
• Extracting an initial opinionated hierarchy of product (sub)features 
• Online (sub)feature hierarchy maintenance 
• Online opinion classifier maintenance 
• Experiments 
• Summary 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 11
o So far, we discussed the initialization of the polarized hierarchy. 
o How do we place a new coming review r (unlabeled) in the hierarchy? 
No Yes 
• Assign it to its closest 
(sub)cluster w.r.t. δg, δl 
• Update centroid 
• Learn its polarity via its topic 
specific classifier. 
Is r novel? 
• Difficult to decide whether its an outlier or the 
beginning of a new feature. 
• Wait and accumulate evidence till a decision can be 
made 
 Novel reviews are maintained in containers 
Definition: Review novelty 
A review r is novel w.r.t. a set of clusters Θ if ist cosine similarity to the closest cluster 
centroid in Θ is less than δ (0 ≤ δ ≤1). 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 
12
Global container: 
accumulates reviews that are 
novel w.r.t. global clusters 
Local container (1 per global cluster): 
accumulates reviews that are close to 
it cluster centroid but far away from 
all centroids of its subclusters 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 13
o Adapt the hierarchy when enough novelty has been accumulated 
• Questions 1: How to define adequate novelty? 
• Questions 2: How to incorporate novel reviews in the hierarchy? 
o Always reclustering approach: if the size of a container exceeds a threshold, 
reorganize from scratch its corresponding local/global clusters (reclustering) 
• Results in a lot of local/global reclusterings 
• Lot of effort for the end user to re‐comprehend the hierarchy 
Such drastic changes should 
be avoided, unless necessary. 
o Merge or reclustering approach: instead of a drastic change like reclustering, try 
first to merge containers to their corresponding clusters. 
• Rationale: Clusters and containers might “start moving towards each other” due to 
ageing. 
• Merge option 1: Merge b and c w.r.t. the feature space of c 
• Merge option 2: Merge b and c w.r.t. a new feature space derived from b U C 
? 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 14
o Check model quality before and after the merge 
• Cluster description length as quality indicator 
o The cluster description length (CDL) of a cluster c defined over dimensions Dc: 
o Merge strategy 1: Do we gain in quality if we merge c with b, while retaining the 
feature space of c? 
o Merge strategy 2: Do we gain in quality if we merge c with b, using a new set of 
dimensions derived from b and c? 
This results actually in a new cluster. 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 15
o We should keep the number of rebuilds low to minimize the effort of the end user 
• The mental effort is modeled by fatigue. 
Definition: Fatigue 
Let M(t) be the hierarchy at t and n the number of reviews contained in its clusters. 
Fatigue is defined as the percentage of reviews involved in rebuilt clusters: 
o At the end of each batch and for each global cluster 
• Try Merge Strategy I 
• If not satisfied, try Merge Strategy II 
Sets of clusters involved in rebuilds 
o Compute fatigue (based on all global clusters for which Merge Strategy II is true) 
 If fatigue < γ, go on with local reclusterings 
 otherwise, rebuild the whole hierarchy from scratch. 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 
16
• Opinionated streams 
• Opinion stream mining & the OPINSTREAM framework 
• Extracting an initial opinionated hierarchy of product (sub)features 
• Online (sub)feature hierarchy maintenance 
• Online opinion classifier maintenance 
• Experiments 
• Summary 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 17
o Static Multinomial Naïve Bayes 
o Probability of class c for word wi, estimation based on seed set S: 
Laplace correction for 
unknown words 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 
18
o Backward adaptation version 
• Old reviews have gradually less effect on the classifier 
o Effect of age in the word counts 
o Effect of age in the probability estimates 
Correction factor 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 
19
o Forward adaptation version 
• Labels are expensive, can we extend the initial seed set S? 
• Yes, by adding useful reviews to S 
o Definition: Usefulness of a review 
Let d be a new review, to which Δ(S) assigns the label c. The usefulness of d is: 
o A document is useful if its usefulness exceeds a threshold α in (‐1,0] 
• ~0 values promote smooth adaptation (d should “agree” with existing model) 
• ~‐1 values promote diversity 
VS ={perfect, friendly} 
d ={expensive, noisy} 
VS ={perfect, friendly, expensive, noisy} 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 20
• Opinionated streams 
• The OPINSTREAM framework 
• Extracting an initial opinionated hierarchy of product (sub)features 
• Online hierarchy maintenance 
• Online opinion classifier maintenance 
• Experiments 
• Summary 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 21
o Yu et al dataset: 12,825 reviews on 327 product features 
# features varies strongly from one 
batch to the next making adaptation 
challenging 
High entropy values indicating that 
there is no clear sentiment per batch 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 
22
o Kg=9, δg=0.1, σg=500; Kl=9, δl=0.2, σl=150; 
o BaselineClu: is the container size‐based approach 
 Smooth adaptation leads to clusters of better quality 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 
23
o Kg=6, δg=0.2, σg=500; Kl=6, δl=0.3, σl=150; 
o BaselineClas: non‐adaptive semi‐supervised approach 
o Silva: the classification rule learner by Silva et al. 
 Performance oscillates for all methods, OPINSTREAM and BaselineCLAS perform 
similarly and better than Silva. 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 
24
o We presented OPINSTREAM, a framework for the extraction of implicit features 
and their associated polarities from a stream of reviews. 
o Feature extraction  unsupervised task (stream clustering) 
• Accumulate novelty in containers 
• Cluster decscription length as an indicator of clusters quality 
• Fatigue for quantifying the mental effor caused by rebuilds 
o Sentiment learning ‐> supervised/semi‐supervised task (stream classification) 
• Learn with only an initial seed of labeled documents 
• Backward adaptation to count for ageing 
• Forward adaptation to expand the seed set by incorporating useful documents 
o Ongoing/future work 
• Simplifying the framework, parameter tuning, new datasets/ domains 
• Semi‐supervised sentiment learning 
• Evaluation of backward propagation in full supervised learning 
• Evaluation of polarity and aspect learning simultaneously 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 25
Questions? 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 26
Original class distribution 
Twitter sentiment dataset, 1.6M tweets 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 27
o Product reviews dataset: 540 reviews on 9 products, where each review refers to 1 
out of 38 (implicit) product feature. 
Label & Polarity evolution per cluster: polarity is captured as a shade of gray, where dark stands for positive 
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 28
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 29

More Related Content

Viewers also liked

The Culture Vulture --- Watching Intently So Workplaces Don't Start to Suck
 The Culture Vulture --- Watching Intently So Workplaces Don't Start to Suck The Culture Vulture --- Watching Intently So Workplaces Don't Start to Suck
The Culture Vulture --- Watching Intently So Workplaces Don't Start to SuckEverything DiSC: A Wiley Brand
 
Not Interested in ICT? A Case Study to Explore How a Meaningful m-Learning Ac...
Not Interested in ICT? A Case Study to Explore How a Meaningful m-Learning Ac...Not Interested in ICT? A Case Study to Explore How a Meaningful m-Learning Ac...
Not Interested in ICT? A Case Study to Explore How a Meaningful m-Learning Ac...Patricia Santos
 
Udvikling af salget
Udvikling af salgetUdvikling af salget
Udvikling af salgetOBuch
 
Портал НООС
Портал НООСПортал НООС
Портал НООСsdoword
 
4Q 2008 EARNINGS SLIDESFINAL
4Q 2008 EARNINGS SLIDESFINAL4Q 2008 EARNINGS SLIDESFINAL
4Q 2008 EARNINGS SLIDESFINALfinance22
 
EL AMOOOOOOOOOOOOOOOOOOOOOR
EL AMOOOOOOOOOOOOOOOOOOOOOREL AMOOOOOOOOOOOOOOOOOOOOOR
EL AMOOOOOOOOOOOOOOOOOOOOORanalili
 
ppg industries2004AnnualReport
ppg industries2004AnnualReportppg industries2004AnnualReport
ppg industries2004AnnualReportfinance22
 
ppg industries2005ANNUALREPORT
ppg industries2005ANNUALREPORTppg industries2005ANNUALREPORT
ppg industries2005ANNUALREPORTfinance22
 
Trizivir P&T
Trizivir P&TTrizivir P&T
Trizivir P&TJaymax13
 

Viewers also liked (16)

The Culture Vulture --- Watching Intently So Workplaces Don't Start to Suck
 The Culture Vulture --- Watching Intently So Workplaces Don't Start to Suck The Culture Vulture --- Watching Intently So Workplaces Don't Start to Suck
The Culture Vulture --- Watching Intently So Workplaces Don't Start to Suck
 
NeeMo@OpenCoffeeXX
NeeMo@OpenCoffeeXXNeeMo@OpenCoffeeXX
NeeMo@OpenCoffeeXX
 
Misa Crismal
Misa CrismalMisa Crismal
Misa Crismal
 
Not Interested in ICT? A Case Study to Explore How a Meaningful m-Learning Ac...
Not Interested in ICT? A Case Study to Explore How a Meaningful m-Learning Ac...Not Interested in ICT? A Case Study to Explore How a Meaningful m-Learning Ac...
Not Interested in ICT? A Case Study to Explore How a Meaningful m-Learning Ac...
 
Presentation1
Presentation1Presentation1
Presentation1
 
Udvikling af salget
Udvikling af salgetUdvikling af salget
Udvikling af salget
 
ECP Company Profile
ECP Company ProfileECP Company Profile
ECP Company Profile
 
Портал НООС
Портал НООСПортал НООС
Портал НООС
 
4Q 2008 EARNINGS SLIDESFINAL
4Q 2008 EARNINGS SLIDESFINAL4Q 2008 EARNINGS SLIDESFINAL
4Q 2008 EARNINGS SLIDESFINAL
 
EL AMOOOOOOOOOOOOOOOOOOOOOR
EL AMOOOOOOOOOOOOOOOOOOOOOREL AMOOOOOOOOOOOOOOOOOOOOOR
EL AMOOOOOOOOOOOOOOOOOOOOOR
 
ppg industries2004AnnualReport
ppg industries2004AnnualReportppg industries2004AnnualReport
ppg industries2004AnnualReport
 
Adviento 2009
Adviento 2009Adviento 2009
Adviento 2009
 
Legacy Writing
Legacy WritingLegacy Writing
Legacy Writing
 
Felins Africa
Felins AfricaFelins Africa
Felins Africa
 
ppg industries2005ANNUALREPORT
ppg industries2005ANNUALREPORTppg industries2005ANNUALREPORT
ppg industries2005ANNUALREPORT
 
Trizivir P&T
Trizivir P&TTrizivir P&T
Trizivir P&T
 

Similar to Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM

Deep learning Tutorial - Part II
Deep learning Tutorial - Part IIDeep learning Tutorial - Part II
Deep learning Tutorial - Part IIQuantUniversity
 
Udacity webinar on Recommendation Systems
Udacity webinar on Recommendation SystemsUdacity webinar on Recommendation Systems
Udacity webinar on Recommendation SystemsAxel de Romblay
 
Post mortemanalysis rex session
Post mortemanalysis rex sessionPost mortemanalysis rex session
Post mortemanalysis rex sessionLaDrilhz
 
Recsys 2014 Tutorial - The Recommender Problem Revisited
Recsys 2014 Tutorial - The Recommender Problem RevisitedRecsys 2014 Tutorial - The Recommender Problem Revisited
Recsys 2014 Tutorial - The Recommender Problem RevisitedXavier Amatriain
 
[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation SystemsAxel de Romblay
 
IntroductionRecommenderSystems_Petroni.pdf
IntroductionRecommenderSystems_Petroni.pdfIntroductionRecommenderSystems_Petroni.pdf
IntroductionRecommenderSystems_Petroni.pdfAlphaIssaghaDiallo
 
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
[CIKM 2014] Deviation-Based Contextual SLIM RecommendersYONG ZHENG
 
[WI 2014]Context Recommendation Using Multi-label Classification
[WI 2014]Context Recommendation Using Multi-label Classification[WI 2014]Context Recommendation Using Multi-label Classification
[WI 2014]Context Recommendation Using Multi-label ClassificationYONG ZHENG
 
Detecting Food and Activities in Lifelogging Images
Detecting Food and Activities in Lifelogging ImagesDetecting Food and Activities in Lifelogging Images
Detecting Food and Activities in Lifelogging ImagesRami Albatal
 
Context-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewContext-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewYONG ZHENG
 
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...HostedbyConfluent
 
Inside Lean Kanban (#lkuk14 keynote)
Inside Lean Kanban (#lkuk14 keynote)Inside Lean Kanban (#lkuk14 keynote)
Inside Lean Kanban (#lkuk14 keynote)Mike Burrows
 
How to gain a foothold in the world of classification
How to gain a foothold in the world of classificationHow to gain a foothold in the world of classification
How to gain a foothold in the world of classificationTorsten Schön
 
Modern Perspectives on Recommender Systems and their Applications in Mendeley
Modern Perspectives on Recommender Systems and their Applications in MendeleyModern Perspectives on Recommender Systems and their Applications in Mendeley
Modern Perspectives on Recommender Systems and their Applications in MendeleyMaya Hristakeva
 
Recommandation systems -
Recommandation systems - Recommandation systems -
Recommandation systems - Yousef Fadila
 
5 analytic hierarchy_process
5 analytic hierarchy_process5 analytic hierarchy_process
5 analytic hierarchy_processFEG
 
analytic hierarchy_process
analytic hierarchy_processanalytic hierarchy_process
analytic hierarchy_processFEG
 
Continuous Learning Systems: Building ML systems that learn from their mistakes
Continuous Learning Systems: Building ML systems that learn from their mistakesContinuous Learning Systems: Building ML systems that learn from their mistakes
Continuous Learning Systems: Building ML systems that learn from their mistakesAnuj Gupta
 

Similar to Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM (20)

Deep learning Tutorial - Part II
Deep learning Tutorial - Part IIDeep learning Tutorial - Part II
Deep learning Tutorial - Part II
 
Udacity webinar on Recommendation Systems
Udacity webinar on Recommendation SystemsUdacity webinar on Recommendation Systems
Udacity webinar on Recommendation Systems
 
Post mortemanalysis rex session
Post mortemanalysis rex sessionPost mortemanalysis rex session
Post mortemanalysis rex session
 
Recsys 2014 Tutorial - The Recommender Problem Revisited
Recsys 2014 Tutorial - The Recommender Problem RevisitedRecsys 2014 Tutorial - The Recommender Problem Revisited
Recsys 2014 Tutorial - The Recommender Problem Revisited
 
[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems
 
IntroductionRecommenderSystems_Petroni.pdf
IntroductionRecommenderSystems_Petroni.pdfIntroductionRecommenderSystems_Petroni.pdf
IntroductionRecommenderSystems_Petroni.pdf
 
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
 
[WI 2014]Context Recommendation Using Multi-label Classification
[WI 2014]Context Recommendation Using Multi-label Classification[WI 2014]Context Recommendation Using Multi-label Classification
[WI 2014]Context Recommendation Using Multi-label Classification
 
Detecting Food and Activities in Lifelogging Images
Detecting Food and Activities in Lifelogging ImagesDetecting Food and Activities in Lifelogging Images
Detecting Food and Activities in Lifelogging Images
 
Context-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewContext-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick View
 
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
 
Inside Lean Kanban (#lkuk14 keynote)
Inside Lean Kanban (#lkuk14 keynote)Inside Lean Kanban (#lkuk14 keynote)
Inside Lean Kanban (#lkuk14 keynote)
 
How to gain a foothold in the world of classification
How to gain a foothold in the world of classificationHow to gain a foothold in the world of classification
How to gain a foothold in the world of classification
 
Modern Perspectives on Recommender Systems and their Applications in Mendeley
Modern Perspectives on Recommender Systems and their Applications in MendeleyModern Perspectives on Recommender Systems and their Applications in Mendeley
Modern Perspectives on Recommender Systems and their Applications in Mendeley
 
TED 453 Session 8
TED 453 Session 8TED 453 Session 8
TED 453 Session 8
 
Recommandation systems -
Recommandation systems - Recommandation systems -
Recommandation systems -
 
5 analytic hierarchy_process
5 analytic hierarchy_process5 analytic hierarchy_process
5 analytic hierarchy_process
 
analytic hierarchy_process
analytic hierarchy_processanalytic hierarchy_process
analytic hierarchy_process
 
Unit 2
Unit 2Unit 2
Unit 2
 
Continuous Learning Systems: Building ML systems that learn from their mistakes
Continuous Learning Systems: Building ML systems that learn from their mistakesContinuous Learning Systems: Building ML systems that learn from their mistakes
Continuous Learning Systems: Building ML systems that learn from their mistakes
 

More from Eirini Ntoutsi

AAISI AI Colloquium 30/3/2021: Bias in AI systems
AAISI AI Colloquium 30/3/2021: Bias in AI systemsAAISI AI Colloquium 30/3/2021: Bias in AI systems
AAISI AI Colloquium 30/3/2021: Bias in AI systemsEirini Ntoutsi
 
Fairness-aware learning: From single models to sequential ensemble learning a...
Fairness-aware learning: From single models to sequential ensemble learning a...Fairness-aware learning: From single models to sequential ensemble learning a...
Fairness-aware learning: From single models to sequential ensemble learning a...Eirini Ntoutsi
 
Bias in AI-systems: A multi-step approach
Bias in AI-systems: A multi-step approachBias in AI-systems: A multi-step approach
Bias in AI-systems: A multi-step approachEirini Ntoutsi
 
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...Eirini Ntoutsi
 
A Machine Learning Primer,
A Machine Learning Primer,A Machine Learning Primer,
A Machine Learning Primer,Eirini Ntoutsi
 
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...Eirini Ntoutsi
 
Redundancies in Data and their E ect on the Evaluation of Recommendation Syst...
Redundancies in Data and their Eect on the Evaluation of Recommendation Syst...Redundancies in Data and their Eect on the Evaluation of Recommendation Syst...
Redundancies in Data and their E ect on the Evaluation of Recommendation Syst...Eirini Ntoutsi
 
Density-based Projected Clustering over High Dimensional Data Streams
Density-based Projected Clustering over High Dimensional Data StreamsDensity-based Projected Clustering over High Dimensional Data Streams
Density-based Projected Clustering over High Dimensional Data StreamsEirini Ntoutsi
 
Density Based Subspace Clustering Over Dynamic Data
Density Based Subspace Clustering Over Dynamic DataDensity Based Subspace Clustering Over Dynamic Data
Density Based Subspace Clustering Over Dynamic DataEirini Ntoutsi
 
Summarizing Cluster Evolution in Dynamic Environments
Summarizing Cluster Evolution in Dynamic EnvironmentsSummarizing Cluster Evolution in Dynamic Environments
Summarizing Cluster Evolution in Dynamic EnvironmentsEirini Ntoutsi
 
Cluster formation over huge volatile robotic data
Cluster formation over huge volatile robotic data Cluster formation over huge volatile robotic data
Cluster formation over huge volatile robotic data Eirini Ntoutsi
 

More from Eirini Ntoutsi (11)

AAISI AI Colloquium 30/3/2021: Bias in AI systems
AAISI AI Colloquium 30/3/2021: Bias in AI systemsAAISI AI Colloquium 30/3/2021: Bias in AI systems
AAISI AI Colloquium 30/3/2021: Bias in AI systems
 
Fairness-aware learning: From single models to sequential ensemble learning a...
Fairness-aware learning: From single models to sequential ensemble learning a...Fairness-aware learning: From single models to sequential ensemble learning a...
Fairness-aware learning: From single models to sequential ensemble learning a...
 
Bias in AI-systems: A multi-step approach
Bias in AI-systems: A multi-step approachBias in AI-systems: A multi-step approach
Bias in AI-systems: A multi-step approach
 
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
 
A Machine Learning Primer,
A Machine Learning Primer,A Machine Learning Primer,
A Machine Learning Primer,
 
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...
 
Redundancies in Data and their E ect on the Evaluation of Recommendation Syst...
Redundancies in Data and their Eect on the Evaluation of Recommendation Syst...Redundancies in Data and their Eect on the Evaluation of Recommendation Syst...
Redundancies in Data and their E ect on the Evaluation of Recommendation Syst...
 
Density-based Projected Clustering over High Dimensional Data Streams
Density-based Projected Clustering over High Dimensional Data StreamsDensity-based Projected Clustering over High Dimensional Data Streams
Density-based Projected Clustering over High Dimensional Data Streams
 
Density Based Subspace Clustering Over Dynamic Data
Density Based Subspace Clustering Over Dynamic DataDensity Based Subspace Clustering Over Dynamic Data
Density Based Subspace Clustering Over Dynamic Data
 
Summarizing Cluster Evolution in Dynamic Environments
Summarizing Cluster Evolution in Dynamic EnvironmentsSummarizing Cluster Evolution in Dynamic Environments
Summarizing Cluster Evolution in Dynamic Environments
 
Cluster formation over huge volatile robotic data
Cluster formation over huge volatile robotic data Cluster formation over huge volatile robotic data
Cluster formation over huge volatile robotic data
 

Recently uploaded

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 

Recently uploaded (20)

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 

Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM

  • 1. Discovering and monitoring product features and the opinions on them with the OPINSTREAM framework Max Zimmermann1, Eirini Ntoutsi2, Myra Spiliopoulou1 Hellenic Data Management Symposioum July 2014 1 2 Institute for Informatics, Ludwig‐ Maximilians‐Universität (LMU) München, Germany
  • 2. o Opinions from TripAdvisor on Vienna Marriott Hotel 2/5/2014: Great hotel, very nice rooms, perfect location, very nice staffs except for a mid‐aged female receptionist who tried to charge me extra for wifi fees when checking out. It was waived at the desk when I checked‐in. And she started treating me with an attitude after she found out that I got a great deal through priceline.com. …. 26/1/2014: Spent a long weekend here. Rooms clean and functional without being spectacular and a nice pool etc. Staff in pool weren't Good and I found them actually quite rood. Executive lounge was ok and not busy but selection of wine and beer wasn't great. The reception has many shops and a bar at the end which kind of males it feel like a shopping centre. Overall great for business travel but not sure id come again for leisure. 7/5/2013: The Vienna Marriott has all you expect; no frills, but solid service and they get all the basic stuff done right. It's in a fine location, maybe 10 minute walk from the major city attractions while being in a quiet area. Breakfast buffet exceptional and good fitness center. Very helpful and happy staff. Lobby lounge just okay. Not a good wine selection and the Sinatra‐like singer adds nothing. Maybe just a little more expensive than it should be, too. Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 2
  • 3. • Opinionated streams • Opinion stream mining & the OPINSTREAM framework • Extracting an initial opinionated hierarchy of product (sub)features • Online (sub)feature hierarchy maintenance • Online opinion classifier maintenance • Experiments • Summary Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 3
  • 4. o In a static setting, i.e., given is a collection of opinionated documents: • How to extract the different, ad hoc aspects discussed in the collection? • How to associate a dominant polarity to each aspect? o When the set becomes a stream • How to extract the ad hoc aspects over time? • How to update their sentiment over time? o The OPINSTREAM framework • Feature discovery  adaptive unsupervised learning (clustering)  Features are defined as cluster centroids  Features evolve over time • Polarity learning  adaptive semi‐supervised learning (classification)  Majority voting within the cluster for the polarity  Learning under concept drift  Learn with a small amount of labeled reviews Polarized aspect Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 4
  • 5. o Reviews arrive at distinct timepoints t0, t1, …., ti, …. in batches of fixed size. o We don’t receive class labels from the stream; the only sentiment‐related information we assume is an initial seed set S of labeled reviews. o Reviews are subject to ageing according to the exponential ageing function • Recent reviews are more important for learning • Gradual decrease of the weight of a review over time • Ageing affects both clustering (feature extraction) and classification (sentiment learning) o The importance of a review r with respect to a set of reviews R, is defined as the number of reviews in R that have r among their kNN, also considering age. • For the kNN, cosine similarity is employed o Important reviews Rβ: those exceeding the review importance threshold β. Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 5
  • 6. • Opinionated streams • Opinion stream mining & the OPINSTREAM framework • Extracting an initial opinionated hierarchy of product (sub)features • Online (sub)feature hierarchy maintenance • Online opinion classifier maintenance • Experiments • Summary Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 6
  • 7. Initial seed set S (labeled) Extract clusters battery, weight, life, … picture, quality, size, … Global/ 1st level clusters Clustering process • Extract important reviews R from S. • Define dimensions F as all nouns in R • Vectorize S over F through TFIDF. • Apply fuzzy c‐means to derive the 1st level Label k Cluster 1: battery Cluster 2: picture Cluster k: … clusters. … Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 7
  • 8. Initial seed set S (labeled) Extract clusters battery, weight, life, … picture, quality, size, … Cluster 1: battery Cluster 2: picture … Extract subclusters weight life quality size Local/ 2nd level clusters Similar to 1st level clustering, the dataset though is the population of each cluster … … Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM subCluster 11: battery weight Clustering process • Extract important reviews R from cluster c. • Define dimensions F as all nouns in R • Vectorize S over F through TFIDF. • Apply fuzzy c‐means to derive the subclusters of c. subCluster 12: battery life subCluster 21: picture quality subCluster 22: picture size 8
  • 9. Definition: Polarized feature The polarized feature represented by a cluster c defined in a feature space FR consists of: • The centroid vector <w1, …., w|FR|>, wi is the avg TF‐IDF of word i in c. • The polarity label cpolarity, is the majority class label of reviews in c. Topic‐specific classifiers o Within each (sub)cluster, a topic‐specific classifier is trained based on the corresponding instances from the initial (labeled) seed set S. o We focus on adverbs, adjectives as sentimental words. o Multinomial Naïve Bayes (MNB) sentiment learner • Laplace correction factor for unknown words Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 9 Laplace correction for unknown words
  • 10. Here is how the hierarchy looks like (ignore the containers for the moment) Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 10
  • 11. • Opinionated streams • Opinion stream mining & the OPINSTREAM framework • Extracting an initial opinionated hierarchy of product (sub)features • Online (sub)feature hierarchy maintenance • Online opinion classifier maintenance • Experiments • Summary Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 11
  • 12. o So far, we discussed the initialization of the polarized hierarchy. o How do we place a new coming review r (unlabeled) in the hierarchy? No Yes • Assign it to its closest (sub)cluster w.r.t. δg, δl • Update centroid • Learn its polarity via its topic specific classifier. Is r novel? • Difficult to decide whether its an outlier or the beginning of a new feature. • Wait and accumulate evidence till a decision can be made  Novel reviews are maintained in containers Definition: Review novelty A review r is novel w.r.t. a set of clusters Θ if ist cosine similarity to the closest cluster centroid in Θ is less than δ (0 ≤ δ ≤1). Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 12
  • 13. Global container: accumulates reviews that are novel w.r.t. global clusters Local container (1 per global cluster): accumulates reviews that are close to it cluster centroid but far away from all centroids of its subclusters Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 13
  • 14. o Adapt the hierarchy when enough novelty has been accumulated • Questions 1: How to define adequate novelty? • Questions 2: How to incorporate novel reviews in the hierarchy? o Always reclustering approach: if the size of a container exceeds a threshold, reorganize from scratch its corresponding local/global clusters (reclustering) • Results in a lot of local/global reclusterings • Lot of effort for the end user to re‐comprehend the hierarchy Such drastic changes should be avoided, unless necessary. o Merge or reclustering approach: instead of a drastic change like reclustering, try first to merge containers to their corresponding clusters. • Rationale: Clusters and containers might “start moving towards each other” due to ageing. • Merge option 1: Merge b and c w.r.t. the feature space of c • Merge option 2: Merge b and c w.r.t. a new feature space derived from b U C ? Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 14
  • 15. o Check model quality before and after the merge • Cluster description length as quality indicator o The cluster description length (CDL) of a cluster c defined over dimensions Dc: o Merge strategy 1: Do we gain in quality if we merge c with b, while retaining the feature space of c? o Merge strategy 2: Do we gain in quality if we merge c with b, using a new set of dimensions derived from b and c? This results actually in a new cluster. Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 15
  • 16. o We should keep the number of rebuilds low to minimize the effort of the end user • The mental effort is modeled by fatigue. Definition: Fatigue Let M(t) be the hierarchy at t and n the number of reviews contained in its clusters. Fatigue is defined as the percentage of reviews involved in rebuilt clusters: o At the end of each batch and for each global cluster • Try Merge Strategy I • If not satisfied, try Merge Strategy II Sets of clusters involved in rebuilds o Compute fatigue (based on all global clusters for which Merge Strategy II is true)  If fatigue < γ, go on with local reclusterings  otherwise, rebuild the whole hierarchy from scratch. Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 16
  • 17. • Opinionated streams • Opinion stream mining & the OPINSTREAM framework • Extracting an initial opinionated hierarchy of product (sub)features • Online (sub)feature hierarchy maintenance • Online opinion classifier maintenance • Experiments • Summary Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 17
  • 18. o Static Multinomial Naïve Bayes o Probability of class c for word wi, estimation based on seed set S: Laplace correction for unknown words Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 18
  • 19. o Backward adaptation version • Old reviews have gradually less effect on the classifier o Effect of age in the word counts o Effect of age in the probability estimates Correction factor Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 19
  • 20. o Forward adaptation version • Labels are expensive, can we extend the initial seed set S? • Yes, by adding useful reviews to S o Definition: Usefulness of a review Let d be a new review, to which Δ(S) assigns the label c. The usefulness of d is: o A document is useful if its usefulness exceeds a threshold α in (‐1,0] • ~0 values promote smooth adaptation (d should “agree” with existing model) • ~‐1 values promote diversity VS ={perfect, friendly} d ={expensive, noisy} VS ={perfect, friendly, expensive, noisy} Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 20
  • 21. • Opinionated streams • The OPINSTREAM framework • Extracting an initial opinionated hierarchy of product (sub)features • Online hierarchy maintenance • Online opinion classifier maintenance • Experiments • Summary Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 21
  • 22. o Yu et al dataset: 12,825 reviews on 327 product features # features varies strongly from one batch to the next making adaptation challenging High entropy values indicating that there is no clear sentiment per batch Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 22
  • 23. o Kg=9, δg=0.1, σg=500; Kl=9, δl=0.2, σl=150; o BaselineClu: is the container size‐based approach  Smooth adaptation leads to clusters of better quality Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 23
  • 24. o Kg=6, δg=0.2, σg=500; Kl=6, δl=0.3, σl=150; o BaselineClas: non‐adaptive semi‐supervised approach o Silva: the classification rule learner by Silva et al.  Performance oscillates for all methods, OPINSTREAM and BaselineCLAS perform similarly and better than Silva. Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 24
  • 25. o We presented OPINSTREAM, a framework for the extraction of implicit features and their associated polarities from a stream of reviews. o Feature extraction  unsupervised task (stream clustering) • Accumulate novelty in containers • Cluster decscription length as an indicator of clusters quality • Fatigue for quantifying the mental effor caused by rebuilds o Sentiment learning ‐> supervised/semi‐supervised task (stream classification) • Learn with only an initial seed of labeled documents • Backward adaptation to count for ageing • Forward adaptation to expand the seed set by incorporating useful documents o Ongoing/future work • Simplifying the framework, parameter tuning, new datasets/ domains • Semi‐supervised sentiment learning • Evaluation of backward propagation in full supervised learning • Evaluation of polarity and aspect learning simultaneously Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 25
  • 26. Questions? Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 26
  • 27. Original class distribution Twitter sentiment dataset, 1.6M tweets Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 27
  • 28. o Product reviews dataset: 540 reviews on 9 products, where each review refers to 1 out of 38 (implicit) product feature. Label & Polarity evolution per cluster: polarity is captured as a shade of gray, where dark stands for positive Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 28
  • 29. Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM 29