Sentiment analysis – extracting
decision-relevant knowledge from UGC
Sergej Schmunk (a), Wolfram Höpken (a), Matthias Fuchs (b), Maria Lexhagen (b)
(a) University of Applied Sciences Ravensburg-Weingarten
Weingarten, Germany
{name.surname}@hs-weingarten.de
(b) Mid-Sweden University
Östersund, Sweden
{name.surname}@miun.se

ENTER 2014 Research Track

Content
• Introduction
• Sentiment analysis
• Methodology and implementation
• Evaluation

• Extracted knowledge as input to decision support
• Conclusion

Motivation
• User generated content (UGC)
– Huge potential to reduce information asymmetries
• >65% of users use review sites for travel decisions
• >95% of users consider review sites credible

– Valuable knowledge base for tourism suppliers to enhance
service quality

• Challenge for tourism managers
– Find relevant reviews and analyse them efficiently
– Automatic extraction of decision-relevant knowledge
– Customer feedback on the level of product properties
Objective
• Automatic information extraction from textual
customer reviews of online review platforms
– Identifying the polarity of customer opinions
– Assigning opinions to product properties

• Evaluation
– Compare different data mining techniques (dictionary-based and
machine learning approaches) concerning the quality of extracted information
– Evaluate decision support in context of a destination MIS

Sentiment analysis
• Sentiment analysis / opinion mining
– Identification of subjective statements and contained
opinions and sentiments within natural texts

• Approaches
– Machine learning, dictionary-based, statistical and
semantic approaches
Sentiment analysis
• Related work
– Ye et al. (2009) apply supervised learning algorithms
(Support Vector Machines, Naïve Bayes and n-gram based
language models) to complete customer reviews
– Kasper and Vela (2011) make use of machine learning and
a semantic approach, based on rules to detect linguistic
parts of a sentence
– Grabner et al. (2012) extract a domain-specific lexicon of
semantically relevant words together with their POS tags
– García et al. (2012) present a dictionary-based approach,
using a dictionary with 6,000 positive and negative words
Process of sentiment analysis

Document selection
• Collect relevant pages with a web crawler
• Fetch HTML pages and follow contained links based on
(manually defined) regular expressions
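
The link-following rule above can be sketched as follows. This is a minimal illustration, not the crawler used by the authors: the URL pattern, the example page and the names `LinkCollector` and `select_links` are assumptions, and no pages are actually downloaded.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def select_links(html, base_url, pattern):
    """Return absolute URLs of all links that match the manually defined pattern."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    absolute = (urljoin(base_url, href) for href in collector.links)
    return [url for url in absolute if pattern.search(url)]


# Hypothetical rule: only follow links that lead to hotel review pages.
REVIEW_LINK = re.compile(r"/reviews/hotel-\d+")

# Hypothetical page content standing in for a fetched HTML page.
page = """
<a href="/reviews/hotel-17?page=2">More reviews</a>
<a href="/about">About us</a>
<a href="https://example.com/reviews/hotel-42">Another hotel</a>
"""
to_visit = select_links(page, "https://example.com/", REVIEW_LINK)
```

A real crawler would fetch each URL in `to_visit`, apply the same function to the result and keep a set of already visited pages.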

Document processing
• Extraction of opinion texts from HTML code
– Removal of HTML tags, headers/footers, etc. by regular expressions and XPath
• Removal of empty reviews
• Filtering of English texts
– Based on text classification
• Generation of single sentences/statements

(for hotels in Åre, Sweden)
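
A rough sketch of these processing steps, assuming each review arrives as an HTML fragment. The sample fragments and helper names are hypothetical, the sentence splitter is a simple punctuation rule, and the language filter is left out because the slides do not describe the classifier behind it.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps visible text and skips the content of script/style elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def review_text(html):
    """Strip the markup and collapse whitespace."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return re.sub(r"\s+", " ", " ".join(extractor.parts)).strip()


def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


# Hypothetical review fragments; the second one contains no opinion text.
raw_reviews = [
    "<div class='review'><p>Good rooms and nicely clean. Breakfast was great!</p></div>",
    "<div class='review'><script>track()</script>  </div>",
]
texts = [review_text(r) for r in raw_reviews]
texts = [t for t in texts if t]  # removal of empty reviews
statements = [s for t in texts for s in split_sentences(t)]
```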

Mining
• Machine learning methods
– Manually labeling training data
– Preprocessing
• Tokenizing
• Stop word removal
• Stemming
• TF-IDF word vector creation
• POS tagging (part-of-speech)
• N-gram creation
– Classification into property, subjectivity and sentiment
• Support vector machines (SVM)
• Naïve Bayes
• K-nearest neighbour (k-NN)
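
To make the pipeline concrete, here is a small self-contained sketch of tokenizing, stop word removal, bigram creation, TF-IDF weighting and a k-NN vote. It is an illustration under assumptions, not the authors' setup: the stop word list and training statements are made up, stemming and POS tagging are omitted, and k-NN is shown only because it is the easiest of the three classifiers to write without a library.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption, not the list used in the study).
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "were", "very", "of", "to"}


def tokenize(text):
    """Lower-case, split into word tokens and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]


def with_bigrams(tokens):
    """Append word bigrams to the unigram tokens (N-gram creation)."""
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def fit_idf(docs):
    """Inverse document frequency per term, learned from the training documents."""
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(len(docs) / count) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector (term -> weight); terms unseen in training are ignored."""
    tf = Counter(t for t in doc if t in idf)
    return {term: freq / len(doc) * idf[term] for term, freq in tf.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def knn_predict(query, vectors, labels, k=3):
    """Majority label among the k training vectors most similar to the query."""
    ranked = sorted(zip(vectors, labels), key=lambda p: cosine(query, p[0]), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


# Hypothetical, manually labelled training statements.
train = [
    ("Good rooms and nicely clean", "positive"),
    ("Very nice breakfast room", "positive"),
    ("Friendly staff and good location", "positive"),
    ("Dirty rooms and rude staff", "negative"),
    ("Terrible breakfast and noisy rooms", "negative"),
    ("Bad smell of mould", "negative"),
]
docs = [with_bigrams(tokenize(text)) for text, _ in train]
labels = [label for _, label in train]
idf = fit_idf(docs)
vectors = [tfidf(doc, idf) for doc in docs]

query = tfidf(with_bigrams(tokenize("Nice clean rooms and good breakfast")), idf)
prediction = knn_predict(query, vectors, labels, k=3)
```

The same vectors could be passed to an SVM or Naïve Bayes implementation from a machine learning library instead of `knn_predict`.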
Mining

• Dictionary-based method
– Manual creation of word list (dictionary) for each class
(i.e. property, subjectivity and sentiment)
• Word list with 6,800 positive and negative words
– Classification based on majority of contained words
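
The majority rule can be sketched in a few lines. The word lists below are tiny hypothetical stand-ins for the manually created dictionaries, and the function name `classify` is an assumption. A statement with no dictionary hits, or with a tie between classes, falls back to a default class.

```python
import re

# Hypothetical miniature word lists, one set of words per class.
SENTIMENT = {
    "positive": {"good", "nice", "clean", "comfy", "friendly", "recommend"},
    "negative": {"bad", "dirty", "old", "mould", "noisy", "rude"},
}
PROPERTY = {
    "room": {"room", "rooms", "bed"},
    "food": {"breakfast", "restaurant", "dinner"},
}


def classify(statement, word_lists, fallback):
    """Assign the class whose word list matches most tokens of the statement."""
    tokens = re.findall(r"[a-z']+", statement.lower())
    counts = {label: sum(t in words for t in tokens) for label, words in word_lists.items()}
    best = max(counts.values())
    winners = [label for label, count in counts.items() if count == best]
    if best == 0 or len(winners) > 1:
        return fallback  # no hit at all, or no clear majority
    return winners[0]


sentiment = classify("Good rooms and nicely clean", SENTIMENT, "neutral")
prop = classify("Good rooms and nicely clean", PROPERTY, "uncategorized")
tie = classify("Clean but old", SENTIMENT, "neutral")
```

Because the rule only counts words, a statement such as "I would recommend hotel diplomat instead" is labelled positive even though it criticises the reviewed hotel.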

Evaluation of classification methods
Method                              Accuracy

Property recognition
  SVM (with POS tagging)            72.36% (1)
  Naïve Bayes (with POS tagging)    49.72% (1)
  k-NN (with k = 8)                 57.08% (1)
  Dictionary-based                  71.28% (2)
Subjectivity recognition
  SVM                               65.50% (1)
  Naïve Bayes                       60.67% (1)
  k-NN (with k = 5)                 55.50% (1)
  Dictionary-based                  82.63% (2)
Sentiment recognition
  SVM (with bigrams)                76.80% (1)
  Naïve Bayes (with trigrams)       69.80% (1)
  k-NN (with k = 8)                 69.60% (1)
  Dictionary-based                  71.28% (2)

(1) Machine learning models evaluated by a 10-fold cross-validation
(2) Dictionary-based method evaluated by comparing results with pre-classified test data
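
For readers unfamiliar with 10-fold cross-validation, the sketch below shows one common way to compute such an accuracy. The helper names, the fixed random seed and the majority-class baseline are assumptions for illustration. The slides do not say how folds were formed or averaged, so this version simply pools the correct predictions over all folds.

```python
import random
from collections import Counter


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle the item indices and deal them into k disjoint folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(items, labels, train_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, pool the correct predictions."""
    correct = 0
    for test_idx in k_fold_indices(len(items), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(items)) if i not in held_out]
        predict = train_fn([items[i] for i in train_idx], [labels[i] for i in train_idx])
        correct += sum(predict(items[i]) == labels[i] for i in test_idx)
    return correct / len(items)


def train_majority(train_items, train_labels):
    """Baseline 'classifier' that always predicts the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda item: majority


# Hypothetical data set: 100 labelled statements, 70 positive and 30 negative.
items = [f"statement {i}" for i in range(100)]
labels = ["positive"] * 70 + ["negative"] * 30
accuracy = cross_val_accuracy(items, labels, train_majority)
```

Any classifier can be plugged in through `train_fn`, as long as it returns a function that maps one statement to a predicted label.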

• SVM is the best machine learning technique for property recognition
– Although based on a limited training data set size (100)

• Dictionary-based method achieved competitive results for property recognition
– Most misclassifications are caused by the class “Uncategorized”, as only
the most prominent words have been included in the word lists

• Dictionary-based approach achieved the best results for subjectivity recognition
– Possibly caused by the huge word list (6,800 words) compared to the fairly small
training data set size (300 per class) of the machine learning methods

Examples of subjectivity recognition
Statement | Recognized class | Real class | Remark
Hmmm must be a hospital because of that sweet smell of mould and or dead old lady | Subjective | Subjective |
Would not recommend unless you have children | Subjective | Subjective |
Skiing and staying in Sweden is so different to other European resorts | Factual | Factual |
The restaurant is high standard very original and lots of local products | Factual | Subjective | Mixture of different opinions
This can be a cost saver for families with children | Factual | Subjective | Ambiguous statement
• SVM method reached the best result for sentiment recognition
– Dictionary-based approach suffers from the additional class "neutral",
assigned if positive and negative words are equally frequent

Examples of sentiment recognition
Statement | Recognized class | Real class | Remark
Parts of the hotel seems to be an old hospital | Negative | Negative |
All other guests I would recommend hotel diplomat instead | Positive | Negative | Misleading statement
The rooms aren’t too big but very clean and comfy | Negative | Positive | Mixture of different opinions
Good rooms and nicely clean | Positive | Positive |
Very nice breakfast room good selection for breakfast | Positive | Positive |

Core feedback data
Core information extracted from review sites

Benchmarking
Average sentiment per accommodation provider

Benchmarking
Average sentiment per product property and
accommodation provider
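
One possible way to compute such benchmarks from classified statements is sketched below. The providers, the rows and the numeric coding of sentiment classes (+1, 0, -1) are assumptions; the slides do not state how the average sentiment is calculated.

```python
from collections import defaultdict

# Hypothetical classified statements: (provider, product property, sentiment class).
classified = [
    ("Hotel A", "room", "positive"),
    ("Hotel A", "room", "negative"),
    ("Hotel A", "food", "positive"),
    ("Hotel B", "room", "positive"),
    ("Hotel B", "food", "negative"),
    ("Hotel B", "food", "negative"),
]

# Assumed numeric coding of the sentiment classes.
SCORE = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def average_sentiment(rows, key):
    """Average sentiment score per group, where key(row) selects the group."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [score sum, statement count]
    for row in rows:
        bucket = totals[key(row)]
        bucket[0] += SCORE[row[2]]
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Average sentiment per accommodation provider.
per_provider = average_sentiment(classified, key=lambda r: r[0])
# Average sentiment per product property and accommodation provider.
per_property = average_sentiment(classified, key=lambda r: (r[0], r[1]))
```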

Conclusion
• Automatically extracting and analyzing customer
reviews from tourism review sites
– SVM best machine learning method
– POS tagging and N-grams can
significantly improve results
– Dictionary-based approaches
achieve competitive (property) or
even superior results (subjectivity)

• Extracted knowledge constitutes valuable input to
decision support
ENTER 2014 Research Track

Slide Number 28
Content
• Introduction
• Sentiment analysis
• Methodology and implementation
• Evaluation

• Extracted knowledge as input to decision support
• Conclusion

ENTER 2014 Research Track

Slide Number 29

More Related Content

Viewers also liked

Tourism Innovation and Tourism Cluster Programme, Finland
Tourism Innovation and Tourism Cluster Programme, FinlandTourism Innovation and Tourism Cluster Programme, Finland
Tourism Innovation and Tourism Cluster Programme, Finland
Matkailufoorumi
 

Viewers also liked (19)

Tourism destination perspective. Best practices of Zermatt - Matterhorn.
Tourism destination perspective. Best practices of Zermatt - Matterhorn.Tourism destination perspective. Best practices of Zermatt - Matterhorn.
Tourism destination perspective. Best practices of Zermatt - Matterhorn.
 
Open Strategy: The use of Open Innovation in Co-creating Vienna's Tourism Str...
Open Strategy: The use of Open Innovation in Co-creating Vienna's Tourism Str...Open Strategy: The use of Open Innovation in Co-creating Vienna's Tourism Str...
Open Strategy: The use of Open Innovation in Co-creating Vienna's Tourism Str...
 
Value co-creation and co-destruction in connected tourist experiences
Value co-creation and co-destruction in connected tourist experiencesValue co-creation and co-destruction in connected tourist experiences
Value co-creation and co-destruction in connected tourist experiences
 
DataTourism: designing an architecture to process tourism data
DataTourism: designing an architecture to process tourism dataDataTourism: designing an architecture to process tourism data
DataTourism: designing an architecture to process tourism data
 
Smart Tourism Destinations
Smart Tourism DestinationsSmart Tourism Destinations
Smart Tourism Destinations
 
Marketing the smart destination
Marketing the smart destinationMarketing the smart destination
Marketing the smart destination
 
Tourism, Innovation and Technology: Building the Future
Tourism, Innovation and Technology: Building the FutureTourism, Innovation and Technology: Building the Future
Tourism, Innovation and Technology: Building the Future
 
Smart Tourism Ecosystems 2
Smart Tourism Ecosystems 2Smart Tourism Ecosystems 2
Smart Tourism Ecosystems 2
 
Digimarketing for Tourism. Presented at University of the Sunshine Coast
Digimarketing for Tourism. Presented at University of the Sunshine CoastDigimarketing for Tourism. Presented at University of the Sunshine Coast
Digimarketing for Tourism. Presented at University of the Sunshine Coast
 
Re-visiting Tourism Information Search Process: From Smartphone Users’ Perspe...
Re-visiting Tourism Information Search Process: From Smartphone Users’ Perspe...Re-visiting Tourism Information Search Process: From Smartphone Users’ Perspe...
Re-visiting Tourism Information Search Process: From Smartphone Users’ Perspe...
 
Smart and Connected Tourism Technologies
Smart and Connected Tourism TechnologiesSmart and Connected Tourism Technologies
Smart and Connected Tourism Technologies
 
Tourism Innovation and Tourism Cluster Programme, Finland
Tourism Innovation and Tourism Cluster Programme, FinlandTourism Innovation and Tourism Cluster Programme, Finland
Tourism Innovation and Tourism Cluster Programme, Finland
 
Smart Tourism Destinations: Smartness as Competitive Advantage
Smart Tourism Destinations: Smartness as Competitive AdvantageSmart Tourism Destinations: Smartness as Competitive Advantage
Smart Tourism Destinations: Smartness as Competitive Advantage
 
Conceptualising Smart Tourism Destination Dimensions
Conceptualising Smart Tourism Destination DimensionsConceptualising Smart Tourism Destination Dimensions
Conceptualising Smart Tourism Destination Dimensions
 
Student Preferences for Social Media Source Characteristics
Student Preferences for Social Media Source CharacteristicsStudent Preferences for Social Media Source Characteristics
Student Preferences for Social Media Source Characteristics
 
The role of information quality, visual appeal and information facilitation i...
The role of information quality, visual appeal and information facilitation i...The role of information quality, visual appeal and information facilitation i...
The role of information quality, visual appeal and information facilitation i...
 
The impact of sharing economy on the diversification of tourism products imp...
The impact of sharing economy on the diversification of tourism products  imp...The impact of sharing economy on the diversification of tourism products  imp...
The impact of sharing economy on the diversification of tourism products imp...
 
Examining the role of social media within the destination marketing framework
Examining the role of social media within the destination marketing frameworkExamining the role of social media within the destination marketing framework
Examining the role of social media within the destination marketing framework
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI Explainer
 

Similar to Sentiment Analysis – Extracting Decision-Relevant Knowledge from UGC

Analysis, design and implementation of a Multi-Criteria Recommender System ba...
Analysis, design and implementation of a Multi-Criteria Recommender System ba...Analysis, design and implementation of a Multi-Criteria Recommender System ba...
Analysis, design and implementation of a Multi-Criteria Recommender System ba...
Davide Giannico
 
Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...
Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...
Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...
Paolo Missier
 
Differentiating Quantitative and Qualitative Research Design
Differentiating Quantitative and Qualitative Research DesignDifferentiating Quantitative and Qualitative Research Design
Differentiating Quantitative and Qualitative Research Design
Dino Andrey
 

Similar to Sentiment Analysis – Extracting Decision-Relevant Knowledge from UGC (20)

ppt research method 1.ppt
ppt research method 1.pptppt research method 1.ppt
ppt research method 1.ppt
 
Invited talk @Roma La Sapienza, April '07
Invited talk @Roma La Sapienza, April '07Invited talk @Roma La Sapienza, April '07
Invited talk @Roma La Sapienza, April '07
 
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles
A Scalable Approach for Efficiently Generating Structured Dataset Topic ProfilesA Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles
 
Ai
AiAi
Ai
 
Ai
AiAi
Ai
 
Ai
AiAi
Ai
 
Artificial Intelligence Certification
Artificial Intelligence CertificationArtificial Intelligence Certification
Artificial Intelligence Certification
 
Ai
AiAi
Ai
 
Analysis, design and implementation of a Multi-Criteria Recommender System ba...
Analysis, design and implementation of a Multi-Criteria Recommender System ba...Analysis, design and implementation of a Multi-Criteria Recommender System ba...
Analysis, design and implementation of a Multi-Criteria Recommender System ba...
 
Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...
Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...
Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...
 
DSD-INT 2021 The choice - A workshop for modelers
DSD-INT 2021 The choice - A workshop for modelersDSD-INT 2021 The choice - A workshop for modelers
DSD-INT 2021 The choice - A workshop for modelers
 
Enriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationEnriching Word Vectors with Subword Information
Enriching Word Vectors with Subword Information
 
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary StudyOn the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study
 
Learning from meaningful, purposive interaction
Learning from meaningful, purposive interactionLearning from meaningful, purposive interaction
Learning from meaningful, purposive interaction
 
Differentiating Quantitative and Qualitative Research Design
Differentiating Quantitative and Qualitative Research DesignDifferentiating Quantitative and Qualitative Research Design
Differentiating Quantitative and Qualitative Research Design
 
Aspects&opinions identification_opinion mining complete ppt
Aspects&opinions identification_opinion mining complete pptAspects&opinions identification_opinion mining complete ppt
Aspects&opinions identification_opinion mining complete ppt
 
Workshop on Quantitative Analytics Using Interactive On-line Tool
Workshop on Quantitative Analytics Using Interactive On-line ToolWorkshop on Quantitative Analytics Using Interactive On-line Tool
Workshop on Quantitative Analytics Using Interactive On-line Tool
 
1WR RapiTests for Sensory
1WR RapiTests for Sensory1WR RapiTests for Sensory
1WR RapiTests for Sensory
 
Multi-method Evaluation in Scientific Paper Recommender Systems
Multi-method Evaluation in Scientific Paper Recommender SystemsMulti-method Evaluation in Scientific Paper Recommender Systems
Multi-method Evaluation in Scientific Paper Recommender Systems
 
Caa2013 9 10-july2013
Caa2013 9 10-july2013Caa2013 9 10-july2013
Caa2013 9 10-july2013
 

Recently uploaded

Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 

Recently uploaded (20)

Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 

Sentiment Analysis – Extracting Decision-Relevant Knowledge from UGC

  • 1. Sentiment analysis – extracting decision-relevant knowledge from UGC Sergej Schmunka Wolfram Höpkena Matthias Fuchsb Maria Lexhagenb a University of Applied Sciences Ravensburg-Weingarten Weingarten, Germany {name.surname}@hs-weingarten.de b Mid-Sweden University Östersund, Sweden {name.surname}@miun.se ENTER 2014 Research Track Slide Number 1
  • 2. Content • Introduction • Sentiment analysis • Methodology and implementation • Evaluation • Extracted knowledge as input to decision support • Conclusion ENTER 2014 Research Track Slide Number 2
  • 3. Content • Introduction • Sentiment analysis • Methodology and implementation • Evaluation • Extracted knowledge as input to decision support • Conclusion ENTER 2014 Research Track Slide Number 3
  • 4. Motivation • User generated content (UGC) – Huge potential to reduce information asymmetries • >65% of users use review sites for travel decision • >95% of users consider review sites as credible – Valuable knowledge base for tourism suppliers to enhance service quality • Challenge for tourism managers – Find relevant reviews and analyse them efficiently – Automatic extraction of decision-relevant knowledge – Customer feedback on the level of product properties ENTER 2014 Research Track Slide Number 4
  • 5. Objective • Automatic information extraction from textual customer reviews of online review platforms – Identifying the polarity of customer opinions – Assigning opinions to product properties • Evaluation – Compare different data mining techniques (dictionarybased and machine learning approaches) concerning the quality of extracted information – Evaluate decision support in context of a destination MIS ENTER 2014 Research Track Slide Number 5
  • 6. Content • Introduction • Sentiment analysis • Methodology and implementation • Evaluation • Extracted knowledge as input to decision support • Conclusion ENTER 2014 Research Track Slide Number 6
  • 7. Sentiment analysis • Sentiment analysis / opinion mining – Identification of subjective statements and contained opinions and sentiments within natural texts • Approaches – Machine learning, dictionary-based, statistical and semantic approaches ENTER 2014 Research Track Slide Number 7
  • 8. Sentiment analysis • Related work – Ye et al. (2009) apply supervised learning algorithms (Support Vector Machines, Naïve Bayes and n-gram based language models) to complete customer reviews – Kasper and Vela (2011) make use of machine learning and a semantic approach, based on rules to detect linguistic parts of a sentence – Grabner et al. (2012) extract a domain-specific lexicon of semantically relevant words together with their POS tags – García et al. (2012) present a dictionary-based approach, using a dictionary with 6,000 positive and negative words ENTER 2014 Research Track Slide Number 8
  • 9. Content • Introduction • Sentiment analysis • Methodology and implementation • Evaluation • Extracted knowledge as input to decision support • Conclusion ENTER 2014 Research Track Slide Number 9
  • 10. Process of sentiment analysis ENTER 2014 Research Track Slide Number 10
  • 11. Document selection • Collect revelant pages by a web crawler • Fetch html pages and follow contained links based on regular expressions (manually defined) ENTER 2014 Research Track Slide Number 11
  • 12. Document processing • Extraction of opinion texts from HTML code • Remove htmltags, headers/footers, etc. by regular expressions and Xpath • Removal of empty reviews • Filtering of English texts • Based on text classification • Generation of single sentences/statements (for hotels in Are, Sweden) ENTER 2014 Research Track Slide Number 12
  • 13. Mining • Machine learning methods • Manually labeling training data • Preprocessing • • • • • • Tokenizing Stop word removal Stemming TF-IDF word vector creation POS tagging (part-of-speech) N-gram creation • Classification into property, subjectivity and sentiment • Support vector machines (SVM) • Naïve Bayes • K-nearest neighbour (k-NN) ENTER 2014 Research Track Slide Number 13
  • 14. Mining • Dictionary-based method • Manual creation of word list (dictionary) for each class (i.e. property, subjectivity and sentiment) • Word list with 6,800 positive and negative words • Classification based on majority of contained words ENTER 2014 Research Track Slide Number 14
  • 15. Content • Introduction • Sentiment analysis • Methodology and implementation • Evaluation • Extracted knowledge as input to decision support • Conclusion ENTER 2014 Research Track Slide Number 15
  • 16. Evaluation of classification methods Method Accuracy Property recognition SVM (with POS tagging) Naïve Bayes (with POS tagging) k-NN (with k = 8) Dictionary-based Subjectivity recognition SVM Naïve Bayes k-NN (with k = 5) Dictionary-based Sentiment recognition SVM (with bigrams) Naïve Bayes (with trigrams) k-NN (with k = 8) Dictionary-based 1 72.36%1 49.72%1 57.08%1 71.28%2 65.50%1 60.67%1 55.50%1 82.63%2 76.80%1 69.80%1 69.60%1 71.28%2 Machine learning models evaluated by a 10-fold cross-validation method evaluated by comparing results with pre-classified test data 2 Dictionary-based ENTER 2014 Research Track Slide Number 16
  • 17. Evaluation of classification methods Method Accuracy Property recognition SVM (with POS tagging) Naïve Bayes (with POS tagging) k-NN (with k = 8) Dictionary-based Subjectivity recognition SVM Naïve Bayes k-NN (with k = 5) Dictionary-based Sentiment recognition SVM (with bigrams) Naïve Bayes (with trigrams) k-NN (with k = 8) Dictionary-based 1 72.36%1 49.72%1 57.08%1 71.28%2 • SVM best machine learning technique for property recognition • Although based on limited training data set size (100) 65.50%1 60.67%1 55.50%1 82.63%2 76.80%1 69.80%1 69.60%1 71.28%2 Machine learning models evaluated by a 10-fold cross-validation method evaluated by comparing results with pre-classified test data 2 Dictionary-based ENTER 2014 Research Track Slide Number 17
  • 18. Evaluation of classification methods Method Accuracy Property recognition SVM (with POS tagging) Naïve Bayes (with POS tagging) k-NN (with k = 8) Dictionary-based Subjectivity recognition SVM Naïve Bayes k-NN (with k = 5) Dictionary-based Sentiment recognition SVM (with bigrams) Naïve Bayes (with trigrams) k-NN (with k = 8) Dictionary-based 1 72.36%1 49.72%1 57.08%1 71.28%2 65.50%1 60.67%1 55.50%1 82.63%2 76.80%1 69.80%1 69.60%1 71.28%2 • SVM best machine learning technique for property recognition • Although based on limited training data set size (100) • Dictionary-based method achieved competitive results • Most misclassifications are caused by class “Uncategorized” as only most prominent words have been included in word lists Machine learning models evaluated by a 10-fold cross-validation method evaluated by comparing results with pre-classified test data 2 Dictionary-based ENTER 2014 Research Track Slide Number 18
  • 19. Evaluation of classification methods: subjectivity recognition
      • Dictionary-based approach achieved best results (82.63%)
      • Possibly caused by huge word list (6,800 words) compared to fairly small training data set size (300 per class) of machine learning methods
  • 20. Examples of subjectivity recognition
      Statement | Recognized class | Real class
      "Hmmm must be a hospital because of that sweet smell of mould and or dead old lady" | Subjective | Subjective
      "Would not recommend unless you have children" | Subjective | Subjective
      "Skiing and staying in Sweden is so different to other European resorts" | Factual | Factual
      "The restaurant is high standard very original and lots of local products" | Factual | Subjective (mixture of different opinions)
      "This can be a cost saver for families with children" | Factual | Subjective (ambiguous statement)
  • 21. Evaluation of classification methods: sentiment recognition
      • SVM method reached best result (76.80%)
      • Dictionary-based approach (71.28%) suffers from additional class "neutral" if positive and negative words are equally frequent
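The "neutral" case described above can be made concrete with a minimal dictionary-based sketch. The two word lists are tiny invented stand-ins, not the dictionary used in the study, and the tie rule is the one stated on the slide: equal counts of positive and negative words yield "Neutral".

```python
import re

# Hypothetical mini word lists, for illustration only.
POSITIVE = {"good", "nice", "clean", "comfy", "recommend"}
NEGATIVE = {"old", "bad", "dirty", "mould"}


def classify_sentiment(statement):
    """Count positive and negative words; equal counts give 'Neutral'."""
    tokens = re.findall(r"[a-z]+", statement.lower())
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    if pos > neg:
        return "Positive"
    if neg > pos:
        return "Negative"
    return "Neutral"
```

Under these assumed lists, a mixed statement such as "Nice room but old bathroom" lands in "Neutral", and a statement with no listed words does too. Since a pure word count ignores context, "All other guests I would recommend hotel diplomat instead" would be labelled "Positive" by this sketch.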
  • 22. Examples of sentiment recognition
      Statement | Recognized class | Real class
      "Parts of the hotel seems to be an old hospital" | Negative | Negative
      "All other guests I would recommend hotel diplomat instead" | Positive | Negative (misleading statement)
      "The rooms aren’t too big but very clean and comfy" | Negative | Positive (mixture of different opinions)
      "Good rooms and nicely clean" | Positive | Positive
      "Very nice breakfast room good selection for breakfast" | Positive | Positive
  • 24. Core feedback data: core information extracted from review sites
  • 25. Benchmarking: average sentiment per accommodation provider
  • 26. Benchmarking: average sentiment per product property and accommodation provider
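The benchmarking figures on slides 25 and 26 rest on averaging sentiment per accommodation provider and product property. A rough sketch of such an aggregation follows. The numeric coding of the classes (+1, 0, -1) and the record layout are assumptions made for illustration, since the slides do not state how the averages were computed.

```python
from collections import defaultdict

# Assumed numeric coding of sentiment classes (not taken from the slides).
SCORE = {"Positive": 1, "Neutral": 0, "Negative": -1}


def average_sentiment(records):
    """records: iterable of (provider, product_property, sentiment_label) tuples."""
    totals = defaultdict(lambda: [0, 0])  # key -> [score sum, statement count]
    for provider, prop, label in records:
        cell = totals[(provider, prop)]
        cell[0] += SCORE[label]
        cell[1] += 1
    return {key: score_sum / count for key, (score_sum, count) in totals.items()}
```

Grouping by provider alone, as on slide 25, would use the same loop with `provider` as the key.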
  • 28. Conclusion
      • Automatically extracting and analyzing customer reviews from tourism review sites
        – SVM best machine learning method
        – POS tagging and N-grams can significantly improve results
        – Dictionary-based approaches achieve competitive (property) or even superior results (subjectivity)
      • Extracted knowledge constitutes valuable input to decision support