2014
Lexicon-Based
Sentiment Analysis
Using the Most-Mentioned
Word Tree
Bo-Hyun Kim, Sr. Software Engineer
HP Big Data Business Unit
Oct 10th, 2014
#GHC14
2014
What to Expect
 Sentiment Analysis
− What is it?
− Why is it interesting?
− How HP Vertica Pulse works
− Achieving greater accuracy
− Different point of view using the most-mentioned word tree
2014
What I Expect
 A 5-star rating on GHC app
 I just expect you to enjoy and learn!
2014
Sentiment Analysis
 In plain English
− Sentiment analysis is the process of automatically detecting
whether a text segment contains emotional or opinionated
content and determining its polarity (e.g., “thumbs
up” or “thumbs down”). It is a field of research that has
received significant attention in recent years, both in
academia and in industry. [Wright, 2009]
2014
Gimme Examples!
 Also known as:
− Opinion Mining
− Text Mining
 Determine people’s general opinion
− “I just got a new car, and I’m loving it :)”
− “My new car isn’t as fast as I thought.”
2014
Why are we interested?
 Increasing (every minute!) web usage
− Articles
− Blogs
− Comments
 Power of Social Media
− Online Shopping
− Customer Reviews
− Recommended products on Amazon
− How other people feel about the product
2014
Product Review
2014
Data… Data… Data…
2014
HP Vertica Pulse
2014
How to Analyze?
 Lexicon-based approach – HP Labs [Zhang et al., 2011]
 Choose a product, person, event, organization, or topic
[Hu and Liu, 2004] to analyze the opinion
 Determine the Semantic Orientation score of opinion
lexicons
Word        Semantic Orientation Value
Fabulous    +3
Good        +1
Bad         -1
Nasty       -3
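A minimal sketch of how such a lexicon lookup could work, assuming a plain dictionary and a simple sum over matched words. The four values come from the table above; the tokenizer, the function names, and the example sentence are illustrative assumptions, not the HP Labs implementation.

```python
import re

# Semantic orientation values copied from the slide's table.
SEMANTIC_ORIENTATION = {"fabulous": 3, "good": 1, "bad": -1, "nasty": -3}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def raw_score(text):
    """Sum the orientation values of every lexicon word found in the text."""
    return sum(SEMANTIC_ORIENTATION.get(tok, 0) for tok in tokenize(text))

# Hypothetical sentence: 'fabulous' (+3) and 'bad' (-1) are the only lexicon hits.
print(raw_score("The screen is fabulous but the battery is bad"))  # 2
```

Words missing from the lexicon contribute nothing, so the quality of the score depends directly on lexicon coverage.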
2014
Sentiment Scoring
 Input: text or sentence
 Output: For each attribute or entity, generates a sentiment score
ranging from -1 to 1
− -1: Negative sentiment
− 0: Neutral sentiment
− 1: Positive sentiment
 Entity-level lexicon-based sentiment scoring
2014
Limitation
 Semantic Orientation value(‘missed’) = -1
 Gives more weight to opinion words located
closer to the entity
 Accuracy can suffer
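The limitation can be illustrated with a small sketch. It assumes that each opinion word's orientation value is divided by its token distance to the entity and that the sum is clamped to the range -1 to 1; both the weighting formula and the clamping are assumptions made for illustration, and the example sentence is hypothetical.

```python
import re

# 'missed' = -1 comes from this slide; the other values come from the lexicon table.
LEXICON = {"fabulous": 3, "good": 1, "bad": -1, "nasty": -3, "missed": -1}

def entity_score(text, entity):
    """Sum orientation / distance over opinion words, then clamp to [-1, 1]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if entity not in tokens:
        return 0.0
    pos = tokens.index(entity)  # position of the first mention of the entity
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok in LEXICON and i != pos:
            total += LEXICON[tok] / abs(i - pos)  # closer words weigh more
    return max(-1.0, min(1.0, total))

# The writer clearly values the phone, yet 'missed' sits two tokens away
# and pulls the score negative.
print(entity_score("I missed my phone while it was being repaired", "phone"))  # -0.5
```

A purely lexical score has no way to tell that missing something implies liking it, which is one way accuracy can suffer.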
2014
Improve accuracy
 Accuracy is what we strive for!
 More robust pre-processing
− Prune data to fit different types of user
opinion (e.g., Twitter vs. YouTube comments)
 Naïve Bayes Classifier Training
 Tune accordingly
2014
Data Set
 Test dataset
− Collected by Stanford students
− In 2009
− Over 3 million tweets with tested scores
− 3,500 tweets analyzed
 Collected dataset
− HP Vertica Pulse Twitter Connector
− In 2014
− Total of 1.2 million tweets over 30 days
2014
Data Pruning
 Remove
− Job postings
• #job, #jobs, #tweetmyjob
− Links
• http://this.is/nogood
− Duplicates
− Twitter-specific characters
• RT, @, #
− Emoticons
• “I hate my life :-)” (sarcasm is a widespread disease)
 After pruning
− ~287,000 tweets, 24% of the 1.2 million tweets
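The pruning rules above can be sketched with a few regular expressions. This is only an illustration: the patterns, the order of the steps, and the sample tweets are assumptions, and the emoticon pattern covers only a handful of common forms.

```python
import re

JOB_TAGS = {"#job", "#jobs", "#tweetmyjob"}  # hashtags listed on the slide
URL_RE = re.compile(r"https?://\S+")
# Matches emoticons such as :-) ;) =D only when they stand apart from a word.
EMOTICON_RE = re.compile(r"(?<!\S)[:;=]-?[()DPp](?!\w)")
RT_RE = re.compile(r"\bRT\b")

def prune(tweets):
    """Drop job postings and duplicates; strip links, emoticons, RT, @ and #."""
    seen, kept = set(), []
    for tweet in tweets:
        if any(w.strip(".,!?") in JOB_TAGS for w in tweet.lower().split()):
            continue  # job posting: discard the whole tweet
        text = URL_RE.sub("", tweet)
        text = EMOTICON_RE.sub("", text)
        text = RT_RE.sub("", text)
        text = text.replace("@", "").replace("#", "")
        text = " ".join(text.split())  # collapse leftover whitespace
        if text and text.lower() not in seen:
            seen.add(text.lower())
            kept.append(text)
    return kept

sample = [
    "RT @bob: Loving the new #healthcare plan :-) http://this.is/nogood",
    "Hiring nurses now #jobs",
    "RT @bob: Loving the new #healthcare plan",
    "I hate my life :-)",
]
print(prune(sample))
# ['bob: Loving the new healthcare plan', 'I hate my life']

# Share of tweets that survived pruning, from the slide's own figures.
print(round(287_000 / 1_200_000 * 100))  # 24
```

Duplicates are detected after cleaning, so a retweet and its original collapse into one entry.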
2014
Naïve Bayes Classifier
 Supervised learning
− Probabilistic classifier based on Bayes’ theorem
− Requires only a small amount of training data
− Assumes the presence/absence of a particular
feature of a class is unrelated to the
presence/absence of any other feature
− Classifies an object based on the features it contains
$$P(C_j \mid D) = \frac{P(D \mid C_j)\,P(C_j)}{P(D)}$$
− Open-source implementation available at [nltk.org]
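To make the formula concrete, here is a small standard-library sketch of a multinomial Naïve Bayes classifier with add-one smoothing. It is not the NLTK implementation referenced above, and the training sentences are hypothetical. Because P(D) is the same for every class, the sketch compares only log P(C_j) + Σ log P(word | C_j).

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """labeled_docs: list of (tokens, label). Returns the counts the classifier needs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify(model, tokens):
    """Return the class C_j that maximizes log P(C_j) + sum of log P(word | C_j)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in class_counts.items():
        logp = math.log(n_docs / total_docs)  # log prior, log P(C_j)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                logp += math.log((word_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Hypothetical training data, for illustration only.
model = train([
    ("love my new car".split(), "pos"),
    ("great fast car".split(), "pos"),
    ("car is slow".split(), "neg"),
    ("hate this slow car".split(), "neg"),
])
print(classify(model, "slow car".split()))        # neg
print(classify(model, "love great car".split()))  # pos
```

Multiplying per-word probabilities independently is exactly the "naïve" independence assumption stated in the bullets above.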
2014
Naïve Bayes Classifier
 Results:
− Final accuracy: 0.788
2014
Tuning Pulse
 Positive words
 Negative words
 Neutral words
 White lists
 Stop words
 Synonym mappings
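As a rough illustration of what these tuning lists do, the sketch below applies user-defined positive, negative, and neutral words, stop words, and synonym mappings to a generic lexicon scorer. It does not use or imitate the Pulse API: the `Tuning` class, the `apply_tuning` function, and all word lists are hypothetical, and white lists are left out.

```python
from dataclasses import dataclass, field

@dataclass
class Tuning:
    """Hypothetical container for user-supplied tuning lists."""
    positive: set = field(default_factory=set)
    negative: set = field(default_factory=set)
    neutral: set = field(default_factory=set)
    stop_words: set = field(default_factory=set)
    synonyms: dict = field(default_factory=dict)

def apply_tuning(tokens, lexicon, tuning):
    """Return (normalized tokens, adjusted lexicon) after applying the tuning lists."""
    adjusted = dict(lexicon)
    adjusted.update({w: 1 for w in tuning.positive})
    adjusted.update({w: -1 for w in tuning.negative})
    for w in tuning.neutral:
        adjusted.pop(w, None)  # neutral words no longer carry sentiment
    normalized = [tuning.synonyms.get(t, t) for t in tokens
                  if t not in tuning.stop_words]
    return normalized, adjusted

tuning = Tuning(positive={"affordable"}, neutral={"sick"},
                stop_words={"the", "is"}, synonyms={"aca": "obamacare"})
tokens, lexicon = apply_tuning("the aca is affordable".split(),
                               {"good": 1, "bad": -1, "sick": -1}, tuning)
print(tokens)   # ['obamacare', 'affordable']
print(lexicon)  # {'good': 1, 'bad': -1, 'affordable': 1}
```

In a healthcare dataset, for example, a word like "sick" describes the topic rather than an opinion, which is the kind of case a neutral-word list addresses.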
2014
Accuracy Comparison
 Sentiment scores generated for each phase
Keyword     Ideal    Original  Pruning  Training  Tuning
Healthcare  -0.1515  -0.0333   -0.0833  -0.1      -0.125
Obama        0.308    0.0944    0.1535   0.1535    0.1842
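One way to read this table is to compute how far each phase's score is from the ideal score. The sketch below does that with the numbers from the slide; using the absolute difference as the distance measure is an assumption.

```python
PHASES = ["Original", "Pruning", "Training", "Tuning"]

# Scores copied from the comparison table.
SCORES = {
    "Healthcare": {"Ideal": -0.1515, "Original": -0.0333, "Pruning": -0.0833,
                   "Training": -0.1, "Tuning": -0.125},
    "Obama": {"Ideal": 0.308, "Original": 0.0944, "Pruning": 0.1535,
              "Training": 0.1535, "Tuning": 0.1842},
}

def gap_to_ideal(keyword):
    """Absolute distance between each phase's score and the ideal score."""
    row = SCORES[keyword]
    return {phase: round(abs(row[phase] - row["Ideal"]), 4) for phase in PHASES}

for keyword in SCORES:
    print(keyword, gap_to_ideal(keyword))
# Healthcare {'Original': 0.1182, 'Pruning': 0.0682, 'Training': 0.0515, 'Tuning': 0.0265}
# Obama {'Original': 0.2136, 'Pruning': 0.1545, 'Training': 0.1545, 'Tuning': 0.1238}
```

For Healthcare the gap shrinks at every phase. For Obama it shrinks after pruning and tuning and is unchanged by training.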
2014
Trend/Targeted Analysis
 Targeted dataset analysis can help improve accuracy
 Identify the most-mentioned words
− Use the most-recurrent words to narrow the scope of analysis
 Find new trends
− Government healthcare (2009) vs. Obamacare (2014)
 Are we looking at the targeted data?
− “Solve healthcare challenges with technology!”
− “Healthcare After ObamaCare”
− “Get affordable healthcare at HealthCare.gov”
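A minimal sketch of identifying the most-mentioned words, assuming they are the words that most often co-occur with the keyword in the same tweet. The first three tweets are the examples above; the fourth tweet, the function name, and the stop-word list are hypothetical additions for illustration.

```python
import re
from collections import Counter

def most_mentioned(tweets, keyword, stop_words=frozenset(), top_n=3):
    """Count words that appear in the same tweet as the keyword (once per tweet)."""
    counts = Counter()
    for tweet in tweets:
        tokens = set(re.findall(r"[a-z]+", tweet.lower()))
        if keyword in tokens:
            counts.update(tokens - {keyword} - stop_words)
    return counts.most_common(top_n)

tweets = [
    "Solve healthcare challenges with technology!",
    "Healthcare After ObamaCare",
    "Get affordable healthcare at HealthCare.gov",
    "Obamacare enrollment opens for healthcare",  # hypothetical extra tweet
]
print(most_mentioned(tweets, "healthcare",
                     stop_words={"with", "at", "for", "after"}, top_n=1))
# [('obamacare', 2)]
```

The top co-occurring word can then serve as a narrower entity for a second round of sentiment analysis.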
2014
Generating Tree
 Increase the relevance of the sentiment score by
running the sentiment analysis on the entity, as
well as on the most-recurrent words to identify:
− Homonyms that machines do not understand
− More accurate scores based on user interest
 Generate the tree using text search
− Merge words that share a stem
(e.g., query, queries, querying…)
− Lucene (Apache open source)
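The tree generation can be sketched as follows, assuming each node holds a word, the number of documents mentioning it, and its most-mentioned co-occurring words as children. The suffix-stripping stemmer is a toy stand-in, not Lucene's stemmer, and the tweets and function names are hypothetical.

```python
import re
from collections import Counter

def toy_stem(word):
    """Toy suffix stripper (not Lucene): query, queries, querying -> query."""
    for suffix, repl in (("ies", "y"), ("ying", "y"), ("ing", ""), ("s", "")):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + repl
    return word

def stem_tokens(text):
    """Tokenize a text and return the set of its stemmed words."""
    return {toy_stem(t) for t in re.findall(r"[a-z]+", text.lower())}

def build_tree(docs, word, depth, exclude=frozenset()):
    """docs: list of stemmed-token sets. Children are the two most-mentioned
    words among the documents that contain the parent word."""
    matching = [d for d in docs if word in d]
    node = {"word": word, "count": len(matching), "children": []}
    if depth > 0:
        exclude = exclude | {word}
        counts = Counter(t for d in matching for t in d if t not in exclude)
        for child, _ in counts.most_common(2):
            node["children"].append(build_tree(matching, child, depth - 1, exclude))
    return node

docs = [stem_tokens(t) for t in [
    "Obamacare queries about healthcare",
    "Querying healthcare costs under Obamacare",
    "healthcare query",
]]
tree = build_tree(docs, "healthcare", depth=1)
print(tree["count"], [(c["word"], c["count"]) for c in tree["children"]])
# 3 [('query', 3), ('obamacare', 2)]
```

Merging stems before counting keeps "queries" and "querying" from splitting into separate, weaker branches.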
2014
Tree View
healthcare
obamacare !(Obamacare)
obama !(Obama) !(health) health
2014
Thank you :)
Questions?
bohyun@hp.com
bohyun.j.kim@gmail.com
Many thanks to*:
Tim Donar, Solution Engineer
Beth Favini, Tech Pubs Sr. Manager
Judith Plummer, Tech Pubs Editor in Chief
* In alphabetical order
2014
Got Feedback?
Rate and Review the session using the
GHC Mobile App
To download visit www.gracehopper.org

Lexicon-Based Sentiment Analysis at GHC 2014


Editor's Notes

  1. Specifically clarify NLP -> part of it.
  2. Mention Twitter’s limitation -> spend less time and effort to understand the specific dataset. Twitter – full outer join
  3. This is the last slide and must be included in the slide deck