Semi Automatic Sentiment Analysis
Results from a case study in Brazilian Portuguese web 2.0 sites

            Gleicon Moraes, Marco Aurélio Gerosa
           gleicon@gmail.com, gerosa@ime.usp.br
Introduction

•  Popular Web 2.0 applications are based on social
   networking: Facebook, Twitter, Orkut, Flickr, LinkedIn

•  Status messages, user information, wall posts, like/unlike
   votes, scraps and recommendations are created and exchanged
   between users.

•  Symmetric and asymmetric relationships broadcast these
   messages between friends (direct connections) and friends
   of friends.

•  Sentiment and opinions might be objective (up/down votes,
   recommendations) or subjective (free text)



Sentiment Classification


•  Find out what users in a social network think about a
   product, trend or brand.

•  Compute or help compute the return on investment of a
   marketing campaign

•  Create or compose product and service recommendations
   for other users

•  Measure user satisfaction with and experience of a
   service.



Goals

•  Opinion mining / subjectivity and sentiment analysis review
   [1]

•  Automate opinion classification (tweet, scrap, message, wall
   post) using Machine Learning and Information Retrieval
   techniques.

•  To apply a Bayesian filter (and also try an SVM classifier) to
   identify positive and negative sentiment in Brazilian
   Portuguese texts.

•  To build a corpus to train and test the classifiers

•  To find out how to measure the filter efficiency.
[1] Pang and Lee - Opinion Mining and Sentiment Analysis
Related work



•  Thumbs Up? Sentiment Classification using Machine Learning
    Techniques: compares Bayesian, Maximum Entropy and SVM classifiers. Training
    used the Movielens dataset, split into 70% of the corpus for training and
    30% for testing. This corpus is already labeled as positive or negative. The
    conclusion was that sarcasm in opinions makes sentiment hard to classify. The
    study did not cover short texts (e.g. a 140-character tweet) or feeding
    outside text back to the classifiers. [1]




[1] Pang B., Lee L. (Cornell University), Vaithyanathan S. (IBM), “Thumbs Up? Sentiment Classification using Machine Learning
Techniques”

Related work


•  Content-Based Book Recommendation Using Learning for Text
    Categorization: uses information extracted from the internet to train a classifier,
    with one database per user. Collaborative filtering and content filtering
    complement each other and help improve the results. [1][2]




[1] Mooney R. J., Roy L., “Content-Based Book Recommendation Using Learning for Text Categorization” (Proceedings of
ACM Conference on Digital Libraries, 2000)
[2] Džeroski S., Ženko B., “Is Combining Classifiers Better than Selecting the Best One?”

Semi-Automatic Sentiment Classification


•  Trained Bayesian Filter on two categories: “positive” and “negative”

•  Feedback feature so false positives and false negatives could be
   trained back to improve the filter

•  Problem: there is no Brazilian Portuguese dataset matching text to
   sentiment for the initial classifier training.

•  Problem: text composition varies between social networks and
   groups within these networks. Feeding data back to keep the
   classifier database updated is fundamental.
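The train-then-feed-back loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `FeedbackBayesFilter`, its methods, the tokenizer and the smoothing scheme are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


class FeedbackBayesFilter:
    """Tiny multinomial Naive Bayes with add-one smoothing (hypothetical sketch)."""

    def __init__(self, categories=("positive", "negative")):
        self.categories = categories
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()              # category -> trained documents

    @staticmethod
    def tokenize(text):
        # Lowercased word tokens; \w also matches accented letters.
        return re.findall(r"\w+", text.lower())

    def train(self, text, category):
        self.word_counts[category].update(self.tokenize(text))
        self.doc_counts[category] += 1

    def classify(self, text):
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        total_docs = sum(self.doc_counts.values())
        tokens = self.tokenize(text)
        best, best_score = None, -math.inf
        for category in self.categories:
            if self.doc_counts[category] == 0:
                continue  # nothing trained for this category yet
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[category] / total_docs)
            total_tokens = sum(self.word_counts[category].values())
            # The extra +1 reserves probability mass for unseen tokens.
            denominator = total_tokens + len(vocabulary) + 1
            for token in tokens:
                count = self.word_counts[category][token]
                score += math.log((count + 1) / denominator)
            if score > best_score:
                best, best_score = category, score
        return best

    def feedback(self, text, correct_category):
        # A misclassified message is trained back with its true label.
        self.train(text, correct_category)
```

A message the filter gets wrong can be passed to `feedback` with the label chosen by a human reviewer, which is what makes the process semi-automatic: the next call to `classify` already reflects the correction.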




Semi-Automatic Sentiment Classification


•  English-language training corpora in most papers use movie
   reviews, with the associated ratings indicating what each text
   block expresses [1]

•  An initial training corpus was built from consumer review
   data from Brazilian websites such as iVox, ReclameAqui and
   MercadoLivre opinions

•  After scraping each opinion and its rating (stars, rating, or
   positive/negative indication), we stored it in folders ranging
   from 0.0 to 5.0, each opinion as a file inside the proper folder

[1] MovieLens dataset: http://www.grouplens.org/node/73
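A corpus stored this way, one folder per rating and one file per opinion, could be read back with a short loader. The sketch below assumes folder names parse as numbers (for example `0.0` or `5.0`) and that files are UTF-8 text; the function name `load_corpus` and the root path are hypothetical.

```python
from pathlib import Path


def load_corpus(root):
    """Yield (rating, text) pairs from a layout like root/5.0/<opinion file>."""
    for rating_dir in sorted(Path(root).iterdir()):
        if not rating_dir.is_dir():
            continue
        try:
            rating = float(rating_dir.name)
        except ValueError:
            continue  # skip folders whose name is not a rating
        for opinion_file in sorted(rating_dir.iterdir()):
            if opinion_file.is_file():
                # Assumption: each opinion was saved as UTF-8 text.
                yield rating, opinion_file.read_text(encoding="utf-8")
```

Keeping the rating as a number rather than a label leaves the choice of which ratings count as positive or negative to a later step.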



Training composition findings


•  Negative opinions contain more words than positive
   opinions: 67,575 words in 712 positive opinions
   versus 81,747 words in 507 negative opinions.

•  Distribution of reviews between minimum and maximum
   ratings: more opinions on the extremes (0.0 to 0.5 and 4.5
   to 5.0).
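The word counts above become easier to compare as averages per opinion. The short calculation below uses only the four numbers reported on this slide; the variable names are illustrative.

```python
positive_words, positive_opinions = 67_575, 712
negative_words, negative_opinions = 81_747, 507

avg_positive = positive_words / positive_opinions
avg_negative = negative_words / negative_opinions

print(f"positive: {avg_positive:.1f} words per opinion")
print(f"negative: {avg_negative:.1f} words per opinion")
print(f"ratio: {avg_negative / avg_positive:.2f}")
```

This works out to roughly 95 words per positive opinion against roughly 161 per negative one, so a negative opinion is on average about 1.7 times as long.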




Training set composition - iVox




Domain


•  Language domain varies between communities/sites




Opinion Sample (Mercadolivre)


Positive (rating 5):

"Este alto-falante faz o baile tremer... comprei para montar uma mini-
saveiro”

Negative (rating 1):

"Apesar de custar muito barato recomendo economizar e comprar
falantes de marcas conhecidas. Bravox, Selenium.
O produto parece recondicionado, e não tem 90Wats nem na china,
meu triaxial Pionner de 60Wats aquenta muito mais grave que esse
Unlike.
Não faça besteira economize mais R$60,00 e compre um Kit 2 vias
Selenium ou até Sony ou Bomber que custa quase o mesmo aqui no
Mercado Livre"

Opinion Sample (iVox)


Positive (rating 5.0):

"Economica não tem Adquiri uma web.evo Sundown,à moto é bonita,gostei tanto
da Sundown que adquiri mais uma moto Sundow a hunter 90cc. estou com 2
motos e estou muito satisfeito. Quanto ao pessoal da grappa, todos sem exceção
sempre bem atenciósos comigo; só tenho a agradecer. "

Negative (rating 1):

"Contra Todas Não sei o motivo de sua defesa a esta empresa, pois fui
enganado a pouco tempo e o engraçado é que liguei para reclamar,
bem na hora que o vendedor estava enganando outra pessoa, por um
deslize do mesmo o cliente verificou o numero e me ligou dizendo que
também havia sido enganado. Entramos com denúncia conjunta na
DECON do DF. Razoável Muito Ruim Razoável Muito Ruim"


Opinion Sample (Reclame Aqui)


Positive:

"Olá, estou passando apenas para parabenizar ao ótimo e sério trabalho da
equipe do site reclameaqui.com.br, pois já fui atendido em duas ocasiões
reclamadas no site e foi algo bem melhor e mais rápido do que partir para outras
atitudes. Parabéns e que cada vez mais possamos ter meios iguais para
podermos agilizar o processo de negociação.
Obrigado,"

Negative:

"Fiz 2 reclamações contra a MOTOROLA DO BRASIL por propaganda
enganosa em seu site www.motorola.com.br sobre o aparelho V3m que
no site diz ACOPMPANHA cartão enquanto no meu aparelho nao veio
NADA !!! Eles me ligaram e tiram o deles da reta dizendo que a culpa é
da VIVO ! MAis perai quem faz o aparelho nao é eles ??? A VIVO so
revende !!!! Ah MOTOROLA POR FAVOR NE !!!!! QUERO MEU
CARTAO !!!"
Domain


•  Language domain [1]: "go read a book" has a different meaning in
   each social network. In a book-related network it might be
   positive. In others it might express a negative sentiment.

•  Feeding data back also helps keep the database updated with new
   slang and word combinations, which might also cover sarcastic expressions.

•  Events like the World Cup and television shows might introduce new words
   and expressions.




[1] Pang and Lee - Opinion Mining and Sentiment Analysis
Training



•  Split the database between negative (rating 0.0) and positive (rating
   5.0). Later steps added ratings 4.5 and 4.0 to the positive set, while the
   negative set stayed the same.

•  Training and classification were applied to raw data and to data processed
   by a pipeline that removes stop words and extracts the stem of the
   remaining words.

•  Raw data was biased towards negative sentiment; processed data was biased
   towards positive sentiment.
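The split by rating and the stop-word and stemming pipeline can be sketched as below. Everything here is a stand-in chosen for illustration: `STOP_WORDS` is a tiny subset of Portuguese stop words, `crude_stem` is a naive suffix stripper rather than a real Portuguese stemmer, and the function names are hypothetical. The slides do not say which stop-word list or stemmer was used.

```python
import re

# Illustrative subset only; a real run would use a fuller Portuguese list.
STOP_WORDS = {"a", "o", "e", "de", "do", "da", "que", "em", "um", "uma", "para", "com"}

# Crude suffix list standing in for a real Portuguese stemmer.
SUFFIXES = ("mente", "ções", "ção", "ados", "adas", "ado", "ada", "os", "as", "es", "s")


def label_for(rating, positive_ratings=(5.0,)):
    """Map a rating to a label; ratings in neither set are left out (None)."""
    if rating == 0.0:
        return "negative"
    if rating in positive_ratings:
        return "positive"
    return None


def crude_stem(word):
    # Strip the first matching suffix, keeping at least three characters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    # Lowercase, tokenize, drop stop words, then stem what remains.
    tokens = re.findall(r"\w+", text.lower())
    return [crude_stem(token) for token in tokens if token not in STOP_WORDS]
```

Widening the positive set, as in the later training steps, amounts to calling `label_for(rating, positive_ratings=(4.5, 5.0))` or `(4.0, 4.5, 5.0)`. Running the classifier on both the raw text and the output of `preprocess` reproduces the two conditions compared in the results.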




Results – raw data



 iVox                                           ReclameAqui            False results
 Ratings                   Negative/Positive    Negative   Positive    Negative   Positive
 No training               No messages          1635       268         0          0
 0.0 and 5.0               506/720              1634       6           262        1
 0.0 and 4.5 + 5.0         506/873              1587       99          169        48
 0.0 and 4.0 + 4.5 + 5.0   506/973              1365       165         105        270




Results – filtered data



 iVox                                           ReclameAqui            False results
 Ratings                   Negative/Positive    Negative   Positive    Negative   Positive
 No training               No messages          1635       268         0          0
 0.0 and 5.0               506/720              1635       0           268        0
 0.0 and 4.5 + 5.0         506/873              0          261         0          1627
 0.0 and 4.0 + 4.5 + 5.0   506/973              0          268         0          1635




Measuring efficiency


•  Metrics: accuracy, precision, recall

•  Token extraction: words (bag of words) and bigrams.

•  Test across languages and domains: trained and tested the same
   classifiers and extractors with the Movielens dataset [1]




[1] The MovieLens dataset: http://www.grouplens.org/node/73
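The two token extraction strategies named above differ only in what counts as a feature. A minimal sketch, assuming a simple regex tokenizer and boolean presence features (the function names and the tokenizer are assumptions, not taken from the slides):

```python
import re


def bag_of_words(text):
    """Each distinct lowercase word becomes a boolean feature."""
    return {word: True for word in re.findall(r"\w+", text.lower())}


def bigrams(text):
    """Each distinct pair of adjacent words becomes a boolean feature."""
    words = re.findall(r"\w+", text.lower())
    return {(first, second): True for first, second in zip(words, words[1:])}
```

Bigrams keep some word order, so a pair such as `("nao", "gostei")` stays distinct from the single word `gostei`, which a bag of words cannot express.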




Efficiency

Movie Review (en)
 Feature Extractor   Accuracy   Positive Precision   Negative Precision   Positive Recall   Negative Recall
 Bag of Words        0.7280     0.6516               0.9597               0.9800            0.4760
 Bigrams             0.8240     0.7613               0.9263               0.9440            0.7040



Consumer Opinion (pt_br)
 Feature Extractor   Accuracy   Positive Precision   Negative Precision   Positive Recall   Negative Recall
 Bag of Words        0.5984     1.0000               0.5100               0.3099            1.0000
 Bigrams             0.7049     1.0000               0.5862               0.4930            1.0000
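For reference, the five columns in these tables all follow from one confusion matrix. The sketch below shows the formulas. The counts passed to `evaluate` are an assumption: the slides do not give the test set sizes, and 71 positive and 51 negative opinions is simply one combination that reproduces the pt_br Bag of Words row.

```python
def evaluate(tp, fn, fp, tn):
    """Metrics for a two-class problem, with 'positive' as the reference class.

    Assumes every denominator is non-zero (each class predicted at least once).
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "positive_precision": tp / (tp + fp),
        "negative_precision": tn / (tn + fn),
        "positive_recall": tp / (tp + fn),
        "negative_recall": tn / (tn + fp),
    }


# Assumed counts (not from the slides): 71 positive and 51 negative test opinions.
metrics = evaluate(tp=22, fn=49, fp=0, tn=51)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, the perfect positive precision and low positive recall mean the classifier rarely says "positive", but is right when it does. Most of the errors are positive opinions labeled negative.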




Conclusion


•  Consumer review database helped on initial training.

•  Keeping the messages as is makes the database richer, with
   different forms of the same expression

•  Token extraction influences the end result

•  Feeding back helps to keep the database up to date

•  Combining classifiers helps the end results and the precision

•  Contribution: Brazilian Portuguese database and scripts used to extract
   data and to reproduce the experiment at:
   https://github.com/gleicon/sentiment_analysis




                           Gleicon Moraes, Marco Aurélio Gerosa     21/20

More Related Content

Similar to Semi Automatic Sentiment Analysis

Metrics that Matter
Metrics that MatterMetrics that Matter
Metrics that Matter
Jeremy Horn
 
Longo
LongoLongo
Netbase AMA Sentiment Analysis Presentation
Netbase AMA Sentiment Analysis PresentationNetbase AMA Sentiment Analysis Presentation
Netbase AMA Sentiment Analysis Presentation
NetBase
 
Social Learning Strategy V2
Social Learning Strategy V2Social Learning Strategy V2
Social Learning Strategy V2
David Wilkins
 
Kellogg Video Essay Question List. Online assignment writing service.
Kellogg Video Essay Question List. Online assignment writing service.Kellogg Video Essay Question List. Online assignment writing service.
Kellogg Video Essay Question List. Online assignment writing service.
Ashley Opokuaa
 
Having Trouble Writing College Essay. How To Write An Exemplification ...
Having Trouble Writing College Essay. How To Write An Exemplification ...Having Trouble Writing College Essay. How To Write An Exemplification ...
Having Trouble Writing College Essay. How To Write An Exemplification ...
Rebecca Bordes
 
Bundledarrows160 bit.ly/teamcaptainsguild
Bundledarrows160 bit.ly/teamcaptainsguildBundledarrows160 bit.ly/teamcaptainsguild
Bundledarrows160 bit.ly/teamcaptainsguild
shadowboxingtv
 
ADDRESS THE SKILLS GAP WITH MICRO-CREDENTIALING
ADDRESS THE SKILLS GAP WITH MICRO-CREDENTIALINGADDRESS THE SKILLS GAP WITH MICRO-CREDENTIALING
ADDRESS THE SKILLS GAP WITH MICRO-CREDENTIALING
Human Capital Media
 
Growth hacking - the Referrals (2)
Growth hacking - the Referrals (2)Growth hacking - the Referrals (2)
Growth hacking - the Referrals (2)
Tomislav Rozman
 
MIS 2101 Presentation
MIS 2101 PresentationMIS 2101 Presentation
MIS 2101 Presentation
Logan Peterson
 
Interview with pam morris
Interview with pam morrisInterview with pam morris
Interview with pam morris
Computer Aid, Inc
 
G-51-Collecting-Effective-Data-in-Counseling.pptx
G-51-Collecting-Effective-Data-in-Counseling.pptxG-51-Collecting-Effective-Data-in-Counseling.pptx
G-51-Collecting-Effective-Data-in-Counseling.pptx
sudhashinithiruchelv
 
RE-EVALUATING YOUR ORGANIZATION’S SKILL GAPS
RE-EVALUATING YOUR ORGANIZATION’S SKILL GAPSRE-EVALUATING YOUR ORGANIZATION’S SKILL GAPS
RE-EVALUATING YOUR ORGANIZATION’S SKILL GAPS
Human Capital Media
 
Gallup Q12 index survey
Gallup Q12 index surveyGallup Q12 index survey
Gallup Q12 index survey
Taha Momin
 
5 things I learned about email relevancy
5 things I learned about email relevancy5 things I learned about email relevancy
5 things I learned about email relevancy
Becs Kemm
 
May 20, 2018: Colorado Coach Connection
May 20, 2018: Colorado Coach Connection May 20, 2018: Colorado Coach Connection
May 20, 2018: Colorado Coach Connection
ICF Colorado
 
360 performance reviews
360 performance reviews360 performance reviews
360 performance reviews
waynerooney369
 
Gartner webinar social media analytics 23.10.2014
Gartner webinar social media analytics 23.10.2014Gartner webinar social media analytics 23.10.2014
Gartner webinar social media analytics 23.10.2014
Irene Ventayol
 
DRIVING STRATEGY: HOW TO AVOID THE TOP THREE MISTAKES
DRIVING STRATEGY: HOW TO AVOID THE TOP THREE MISTAKESDRIVING STRATEGY: HOW TO AVOID THE TOP THREE MISTAKES
DRIVING STRATEGY: HOW TO AVOID THE TOP THREE MISTAKES
Human Capital Media
 
Etude digital media planning 2010
Etude digital media planning 2010Etude digital media planning 2010
Etude digital media planning 2010
tdesaintmartin
 

Similar to Semi Automatic Sentiment Analysis (20)

Metrics that Matter
Metrics that MatterMetrics that Matter
Metrics that Matter
 
Longo
LongoLongo
Longo
 
Netbase AMA Sentiment Analysis Presentation
Netbase AMA Sentiment Analysis PresentationNetbase AMA Sentiment Analysis Presentation
Netbase AMA Sentiment Analysis Presentation
 
Social Learning Strategy V2
Social Learning Strategy V2Social Learning Strategy V2
Social Learning Strategy V2
 
Kellogg Video Essay Question List. Online assignment writing service.
Kellogg Video Essay Question List. Online assignment writing service.Kellogg Video Essay Question List. Online assignment writing service.
Kellogg Video Essay Question List. Online assignment writing service.
 
Having Trouble Writing College Essay. How To Write An Exemplification ...
Having Trouble Writing College Essay. How To Write An Exemplification ...Having Trouble Writing College Essay. How To Write An Exemplification ...
Having Trouble Writing College Essay. How To Write An Exemplification ...
 
Bundledarrows160 bit.ly/teamcaptainsguild
Bundledarrows160 bit.ly/teamcaptainsguildBundledarrows160 bit.ly/teamcaptainsguild
Bundledarrows160 bit.ly/teamcaptainsguild
 
ADDRESS THE SKILLS GAP WITH MICRO-CREDENTIALING
ADDRESS THE SKILLS GAP WITH MICRO-CREDENTIALINGADDRESS THE SKILLS GAP WITH MICRO-CREDENTIALING
ADDRESS THE SKILLS GAP WITH MICRO-CREDENTIALING
 
Growth hacking - the Referrals (2)
Growth hacking - the Referrals (2)Growth hacking - the Referrals (2)
Growth hacking - the Referrals (2)
 
MIS 2101 Presentation
MIS 2101 PresentationMIS 2101 Presentation
MIS 2101 Presentation
 
Interview with pam morris
Interview with pam morrisInterview with pam morris
Interview with pam morris
 
G-51-Collecting-Effective-Data-in-Counseling.pptx
G-51-Collecting-Effective-Data-in-Counseling.pptxG-51-Collecting-Effective-Data-in-Counseling.pptx
G-51-Collecting-Effective-Data-in-Counseling.pptx
 
RE-EVALUATING YOUR ORGANIZATION’S SKILL GAPS
RE-EVALUATING YOUR ORGANIZATION’S SKILL GAPSRE-EVALUATING YOUR ORGANIZATION’S SKILL GAPS
RE-EVALUATING YOUR ORGANIZATION’S SKILL GAPS
 
Gallup Q12 index survey
Gallup Q12 index surveyGallup Q12 index survey
Gallup Q12 index survey
 
5 things I learned about email relevancy
5 things I learned about email relevancy5 things I learned about email relevancy
5 things I learned about email relevancy
 
May 20, 2018: Colorado Coach Connection
May 20, 2018: Colorado Coach Connection May 20, 2018: Colorado Coach Connection
May 20, 2018: Colorado Coach Connection
 
360 performance reviews
360 performance reviews360 performance reviews
360 performance reviews
 
Gartner webinar social media analytics 23.10.2014
Gartner webinar social media analytics 23.10.2014Gartner webinar social media analytics 23.10.2014
Gartner webinar social media analytics 23.10.2014
 
DRIVING STRATEGY: HOW TO AVOID THE TOP THREE MISTAKES
DRIVING STRATEGY: HOW TO AVOID THE TOP THREE MISTAKESDRIVING STRATEGY: HOW TO AVOID THE TOP THREE MISTAKES
DRIVING STRATEGY: HOW TO AVOID THE TOP THREE MISTAKES
 
Etude digital media planning 2010
Etude digital media planning 2010Etude digital media planning 2010
Etude digital media planning 2010
 

More from Gleicon Moraes

Como arquiteturas de dados quebram
Como arquiteturas de dados quebramComo arquiteturas de dados quebram
Como arquiteturas de dados quebram
Gleicon Moraes
 
Arquitetura emergente - sobre cultura devops
Arquitetura emergente - sobre cultura devopsArquitetura emergente - sobre cultura devops
Arquitetura emergente - sobre cultura devops
Gleicon Moraes
 
API Gateway report
API Gateway reportAPI Gateway report
API Gateway report
Gleicon Moraes
 
DNAD 2015 - Como a arquitetura emergente de sua aplicação pode jogar contra ...
DNAD 2015  - Como a arquitetura emergente de sua aplicação pode jogar contra ...DNAD 2015  - Como a arquitetura emergente de sua aplicação pode jogar contra ...
DNAD 2015 - Como a arquitetura emergente de sua aplicação pode jogar contra ...
Gleicon Moraes
 
Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014
Gleicon Moraes
 
OSCon - Performance vs Scalability
OSCon - Performance vs ScalabilityOSCon - Performance vs Scalability
OSCon - Performance vs Scalability
Gleicon Moraes
 
Architecture by Accident
Architecture by AccidentArchitecture by Accident
Architecture by Accident
Gleicon Moraes
 
Patterns of fail
Patterns of failPatterns of fail
Patterns of fail
Gleicon Moraes
 
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Gleicon Moraes
 
Architectural anti-patterns for data handling
Architectural anti-patterns for data handlingArchitectural anti-patterns for data handling
Architectural anti-patterns for data handling
Gleicon Moraes
 
Architectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handlingArchitectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handling
Gleicon Moraes
 
RestMQ - HTTP/Redis based Message Queue
RestMQ - HTTP/Redis based Message QueueRestMQ - HTTP/Redis based Message Queue
RestMQ - HTTP/Redis based Message Queue
Gleicon Moraes
 
NoSQL and SQL Anti Patterns
NoSQL and SQL Anti PatternsNoSQL and SQL Anti Patterns
NoSQL and SQL Anti Patterns
Gleicon Moraes
 
Redis
RedisRedis

More from Gleicon Moraes (14)

Como arquiteturas de dados quebram
Como arquiteturas de dados quebramComo arquiteturas de dados quebram
Como arquiteturas de dados quebram
 
Arquitetura emergente - sobre cultura devops
Arquitetura emergente - sobre cultura devopsArquitetura emergente - sobre cultura devops
Arquitetura emergente - sobre cultura devops
 
API Gateway report
API Gateway reportAPI Gateway report
API Gateway report
 
DNAD 2015 - Como a arquitetura emergente de sua aplicação pode jogar contra ...
DNAD 2015  - Como a arquitetura emergente de sua aplicação pode jogar contra ...DNAD 2015  - Como a arquitetura emergente de sua aplicação pode jogar contra ...
DNAD 2015 - Como a arquitetura emergente de sua aplicação pode jogar contra ...
 
Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014
 
OSCon - Performance vs Scalability
OSCon - Performance vs ScalabilityOSCon - Performance vs Scalability
OSCon - Performance vs Scalability
 
Architecture by Accident
Architecture by AccidentArchitecture by Accident
Architecture by Accident
 
Patterns of fail
Patterns of failPatterns of fail
Patterns of fail
 
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
 
Architectural anti-patterns for data handling
Architectural anti-patterns for data handlingArchitectural anti-patterns for data handling
Architectural anti-patterns for data handling
 
Architectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handlingArchitectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handling
 
RestMQ - HTTP/Redis based Message Queue
RestMQ - HTTP/Redis based Message QueueRestMQ - HTTP/Redis based Message Queue
RestMQ - HTTP/Redis based Message Queue
 
NoSQL and SQL Anti Patterns
NoSQL and SQL Anti PatternsNoSQL and SQL Anti Patterns
NoSQL and SQL Anti Patterns
 
Redis
RedisRedis
Redis
 

Recently uploaded

Password Rotation in 2024 is still Relevant
Password Rotation in 2024 is still RelevantPassword Rotation in 2024 is still Relevant
Password Rotation in 2024 is still Relevant
Bert Blevins
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
Zilliz
 
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
aslasdfmkhan4750
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
SynapseIndia
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
Emerging Tech
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Muhammad Ali
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
SynapseIndia
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
aakash malhotra
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
maigasapphire
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
Tatiana Al-Chueyr
 
The Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF GuideThe Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF Guide
Shiv Technolabs
 
Observability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetryObservability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetry
Eric D. Schabell
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Networks
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
Neo4j
 
Amul milk launches in US: Key details of its new products ...
Amul milk launches in US: Key details of its new products ...Amul milk launches in US: Key details of its new products ...
Amul milk launches in US: Key details of its new products ...
chetankumar9855
 
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSECHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
kumarjarun2010
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
RaminGhanbari2
 
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and OllamaTirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Zilliz
 
July Patch Tuesday
July Patch TuesdayJuly Patch Tuesday
July Patch Tuesday
Ivanti
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Bert Blevins
 

Recently uploaded (20)

Password Rotation in 2024 is still Relevant
Password Rotation in 2024 is still RelevantPassword Rotation in 2024 is still Relevant
Password Rotation in 2024 is still Relevant
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
 
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
 

Semi Automatic Sentiment Analysis

  • 1. Semi Automatic Sentiment Analysis: Results from a case study in Brazilian Portuguese Web 2.0 sites. Gleicon Moraes, Marco Aurélio Gerosa (gleicon@gmail.com, gerosa@ime.usp.br)
  • 2. Introduction
      •  Popular Web 2.0 applications are based on social networking: Facebook, Twitter, Orkut, Flickr, LinkedIn.
      •  Status messages, user information, wall posts, like/unlike votes, scraps, and recommendations are created and exchanged between users.
      •  Symmetric and asymmetric relationships broadcast these messages between friends (direct connections) and friends of friends.
      •  Sentiment and opinions may be objective (up/down votes, recommendations) or subjective (free text).
  • 3. Sentiment Classification
      •  Find out what users in a social network think about a product, trend, or brand.
      •  Compute, or help compute, the Return on Investment of a marketing campaign.
      •  Create or compose product and service recommendations for other users.
      •  Measure user satisfaction with, and experience of, a service.
  • 4. Goals
      •  Review opinion mining, subjectivity, and sentiment analysis [1].
      •  Automate opinion classification (tweet, scrap, message, wall post) using Machine Learning and Information Retrieval techniques.
      •  Apply a Bayesian filter (and also try an SVM classifier) to identify positive and negative sentiment in Brazilian Portuguese texts.
      •  Build a corpus to train and test the classifiers.
      •  Find out how to measure the filter's efficiency.
      [1] Pang and Lee - Opinion Mining and Sentiment Analysis
  • 5. Related work
      •  Thumbs Up? Sentiment Classification using Machine Learning Techniques: compares a Bayesian filter, a Maximum Entropy filter, and an SVM filter. Training used the MovieLens dataset, split into 70% of the corpus for training and 30% for testing. This corpus is already labeled as positive or negative. The conclusion was that sarcasm in opinions made the sentiment hard to classify. The work did not cover short-text classification (e.g. a 140-character tweet) or feeding outside text back to the classifiers. [1]
      [1] Pang B., Lee L. (Cornell University), Vaithyanathan S. (IBM): Thumbs Up? Sentiment Classification using Machine Learning Techniques
  • 6. Related work
      •  Content-Based Book Recommendation Using Learning for Text Categorization: uses information extracted from the internet to train a classifier, with one database per user. Collaborative filtering and content filtering complement each other and help improve the results. [1][2]
      [1] Mooney R. J., Roy L., "Content-Based Book Recommendation Using Learning for Text Categorization" (Proceedings of ACM Conference on Digital Libraries, 2000)
      [2] Džeroski S., Ženko B., "Is Combining Classifiers Better than Selecting the Best One?"
  • 7. Semi-Automatic Sentiment Classification
      •  A Bayesian filter trained on two categories: "positive" and "negative".
      •  A feedback feature, so that false positives and false negatives can be trained back into the filter to improve it.
      •  Problem: there is no Brazilian Portuguese dataset that maps text to sentiment for the initial classifier training.
      •  Problem: text composition varies between social networks and between groups within these networks. Feeding data back to keep the classifier database updated is fundamental.
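The filter-plus-feedback idea on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `BayesianFilter`, its methods, the add-one smoothing, and the sample sentences are all assumptions made for the example.

```python
import math
import re
from collections import Counter


class BayesianFilter:
    """Hypothetical two-category multinomial naive Bayes with add-one smoothing."""

    def __init__(self, categories=("positive", "negative")):
        self.word_counts = {c: Counter() for c in categories}
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text):
        # Lowercased word tokens; \w matches accented letters in Python 3.
        return re.findall(r"\w+", text.lower())

    def train(self, text, category):
        self.word_counts[category].update(self.tokenize(text))
        self.doc_counts[category] += 1

    def classify(self, text):
        vocab = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for cat, counts in self.word_counts.items():
            # Smoothed log prior, so an untrained category never yields log(0).
            score = math.log(
                (self.doc_counts[cat] + 1) / (total_docs + len(self.word_counts))
            )
            denom = sum(counts.values()) + len(vocab) + 1
            for token in self.tokenize(text):
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best, best_score = cat, score
        return best

    def feedback(self, text, correct_category):
        # A misclassified message is simply trained back under the right label.
        self.train(text, correct_category)


# Made-up sample sentences, not taken from the corpus.
bayes = BayesianFilter()
bayes.train("ótimo produto recomendo", "positive")
bayes.train("péssimo produto enganado", "negative")
```

Calling `bayes.feedback("barato", "negative")` after a wrong answer shifts later classifications of similar text, which is the semi-automatic part: a person corrects, the filter retrains.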
  • 8. Semi-Automatic Sentiment Classification
      •  In most papers, the English-language training corpus consists of movie reviews, each associated with a rating that tells what the text block expresses [1].
      •  An initial training corpus was built from consumer review data on Brazilian websites such as iVox, ReclameAqui, and MercadoLivre opinions.
      •  Each opinion was scraped together with its rating (stars, score, or positive/negative indication) and stored in folders ranging from 0.0 to 5.0, one file per opinion inside the matching folder.
      [1] MovieLens dataset: http://www.grouplens.org/node/73
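A corpus stored this way (one folder per rating, one file per opinion) could be read back with a loader along these lines. The function name `load_corpus`, the UTF-8 encoding, and the rule of skipping folders whose names are not numbers are assumptions; the slides do not describe the actual scripts.

```python
from pathlib import Path


def load_corpus(root):
    """Return {rating: [opinion_text, ...]} from folders named after ratings."""
    corpus = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        try:
            rating = float(folder.name)  # folder names such as "0.0" ... "5.0"
        except ValueError:
            continue  # assumption: ignore folders that are not ratings
        corpus[rating] = [
            path.read_text(encoding="utf-8")  # assumption: files are UTF-8
            for path in sorted(folder.iterdir())
            if path.is_file()
        ]
    return corpus
```

Keeping the rating as the dictionary key leaves the decision of which ratings count as positive or negative to a later step.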
  • 9. Training composition findings
      •  Negative opinions are longer than positive ones: 67,575 words in 712 positive opinions (about 95 words each) versus 81,747 words in 507 negative opinions (about 161 words each).
      •  Distribution of reviews between minimum and maximum ratings: most opinions sit at the extremes (0.0 to 0.5 and 4.5 to 5.0).
  • 10. Training database composition - iVox
  • 11. Domain
      •  The language domain varies between communities and sites.
  • 12. Opinion Sample (MercadoLivre)
      positive (rating 5): "Este alto-falante faz o baile tremer... comprei para montar uma mini-saveiro"
      negative (rating 1): "Apesar de custar muito barato recomendo economizar e comprar falantes de marcas conhecidas. Bravox, Selenium. O produto parece recondicionado, e não tem 90Wats nem na china, meu triaxial Pionner de 60Wats aquenta muito mais grave que esse Unlike. Não faça besteira economize mais R$60,00 e compre um Kit 2 vias Selenium ou até Sony ou Bomber que custa quase o mesmo aqui no Mercado Livre"
  • 13. Opinion Sample (iVox)
      positive (rating 5.0): "Economica não tem Adquiri uma web.evo Sundown,à moto é bonita,gostei tanto da Sundown que adquiri mais uma moto Sundow a hunter 90cc. estou com 2 motos e estou muito satisfeito. Quanto ao pessoal da grappa, todos sem exceção sempre bem atenciósos comigo; só tenho a agradecer. "
      negative (rating 1): "Contra Todas Não sei o motivo de sua defesa a esta empresa, pois fui enganado a pouco tempo e o engraçado é que liguei para reclamar, bem na hora que o vendedor estava enganando outra pessoa, por um deslize do mesmo o cliente verificou o numero e me ligou dizendo que também havia sido enganado. Entramos com denúncia conjunta na DECON do DF. Razoável Muito Ruim Razoável Muito Ruim"
  • 14. Opinion Sample (Reclame Aqui)
      positive: "Olá, estou passando apenas para parabenizar ao ótimo e sério trabalho da equipe do site reclameaqui.com.br, pois já fui atendido em duas ocasiões reclamadas no site e foi algo bem melhor e mais rápido do que partir para outras atitudes. Parabéns e que cada vez mais possamos ter meios iguais para podermos agilizar o processo de negociação. Obrigado,"
      negative: "Fiz 2 reclamações contra a MOTOROLA DO BRASIL por propaganda enganosa em seu site www.motorola.com.br sobre o aparelho V3m que no site diz ACOPMPANHA cartão enquanto no meu aparelho nao veio NADA !!! Eles me ligaram e tiram o deles da reta dizendo que a culpa é da VIVO ! MAis perai quem faz o aparelho nao é eles ??? A VIVO so revende !!!! Ah MOTOROLA POR FAVOR NE !!!!! QUERO MEU CARTAO !!!"
  • 15. Domain
      •  Language domain [1]: "go read a book" means different things on different social networks. On a book-related network it may carry a positive meaning; on others it may express a negative sentiment.
      •  Feeding data back also keeps the database updated with new slang and word combinations, which may also cover sarcastic expressions.
      •  Events such as the World Cup and television shows may introduce new words and expressions.
      [1] Pang and Lee - Opinion Mining and Sentiment Analysis
  • 16. Training
      •  The database was split between negative (rating 0.0) and positive (rating 5.0). Later steps added ratings 4.5 and 4.0 to the positive set, while the negative set stayed the same.
      •  Training and classification were applied both to raw data and to data processed through a pipeline that removes stop words and stems the remaining words.
      •  Raw data was biased towards negative sentiment; processed data was biased towards positive sentiment.
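The labeling rule and the stop-word/stemming pipeline described on this slide might look like the sketch below. The stop-word list and the suffix list are tiny, hypothetical stand-ins; a real run would more likely use a full Portuguese stop list and a proper stemmer, and the slides do not say which ones were used.

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "o", "as", "os", "e", "de", "do", "da", "que", "em", "um", "uma", "para"}
# Hypothetical suffixes for a crude stemmer, checked in this order.
SUFFIXES = ("mente", "ções", "ção", "ando", "endo", "indo", "os", "as", "es", "s")


def label_for_rating(rating, positive_ratings=(5.0,)):
    """Map a rating to a training label; ratings in between are left out."""
    if rating == 0.0:
        return "negative"
    if rating in positive_ratings:
        return "positive"
    return None


def stem(word):
    # Strip the first matching suffix, keeping at least three characters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    # Lowercase, tokenize, drop stop words, then stem what remains.
    tokens = re.findall(r"\w+", text.lower())
    return [stem(token) for token in tokens if token not in STOP_WORDS]
```

Widening the positive set, as the later training steps did, corresponds to calling `label_for_rating(rating, positive_ratings=(5.0, 4.5, 4.0))`.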
  • 17. Results – raw data
      iVox                                            ReclameAqui           False results
      Ratings                   Negative/Positive     Negative  Positive    Negative  Positive
      No training               No messages           1635      268         0         0
      0.0 and 5.0               506/720               1634      6           262       1
      0.0 and 4.5 + 5.0         506/873               1587      99          169       48
      0.0 and 4.0 + 4.5 + 5.0   506/973               1365      165         105       270
  • 18. Results – filtered data
      iVox                                            ReclameAqui           False results
      Ratings                   Negative/Positive     Negative  Positive    Negative  Positive
      No training               No messages           1635      268         0         0
      0.0 and 5.0               506/720               1635      0           268       0
      0.0 and 4.5 + 5.0         506/873               0         261         0         1627
      0.0 and 4.0 + 4.5 + 5.0   506/973               0         268         0         1635
  • 19. Measuring efficiency
      •  Metrics: accuracy, precision, and recall.
      •  Token extraction: single words (bag of words) and bigrams.
      •  Test across languages and domains: the same classifiers and extractors were trained and tested with the MovieLens dataset [1].
      [1] The MovieLens dataset: http://www.grouplens.org/node/73
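The two token extractors named on this slide can be illustrated with a few lines of Python. The function names and the feature-dictionary shape (`{feature: True}`) are assumptions for the sketch, not the format used in the authors' scripts.

```python
import re


def tokenize(text):
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def bag_of_words(text):
    """One feature per distinct word; word order is ignored."""
    return {token: True for token in tokenize(text)}


def bigrams(text):
    """One feature per pair of adjacent words, which keeps local word order."""
    tokens = tokenize(text)
    return {(first, second): True for first, second in zip(tokens, tokens[1:])}
```

With bigrams, a pair such as ("não", "gostei") stays together as one feature, whereas a bag of words sees "não" and "gostei" independently; this is one plausible reason why the choice of extractor changes the scores.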
  • 20. Efficiency
      Movie Review (en)
      Feature Extractor   Accuracy   Positive Precision   Negative Precision   Positive Recall   Negative Recall
      Bag of Words        0.7280     0.6516               0.9597               0.9800            0.4760
      Bigrams             0.8240     0.7613               0.9263               0.9440            0.7040

      Consumer Opinion (pt_br)
      Feature Extractor   Accuracy   Positive Precision   Negative Precision   Positive Recall   Negative Recall
      Bag of Words        0.5984     1.0000               0.5100               0.3099            1.000
      Bigrams             0.7049     1.0000               0.5862               0.4930            1.000
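For reference, the five figures in each row follow from a two-class confusion matrix as sketched below. The slides do not report the underlying counts; the counts used here are an assumption, chosen only because they happen to reproduce the pt_br bag-of-words row after rounding to four decimals.

```python
def metrics(tp, fp, tn, fn):
    """Accuracy plus per-class precision and recall from confusion-matrix counts."""

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "positive_precision": ratio(tp, tp + fp),
        "negative_precision": ratio(tn, tn + fn),
        "positive_recall": ratio(tp, tp + fn),
        "negative_recall": ratio(tn, tn + fp),
    }


# Hypothetical counts (not from the slides): 71 positive and 51 negative test opinions.
result = metrics(tp=22, fp=0, tn=51, fn=49)
rounded = {name: round(value, 4) for name, value in result.items()}
```

Under these assumed counts, a positive precision of 1.0 next to a positive recall near 0.31 means the classifier rarely says "positive", but is right whenever it does.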
  • 21. Conclusion
      •  The consumer review database helped with the initial training.
      •  Keeping the messages as they are makes the database richer, with different forms of the same expression.
      •  Token extraction influences the end result.
      •  Feeding data back helps keep the database up to date.
      •  Combining classifiers helps the end results and the precision.
      •  Contribution: a Brazilian Portuguese database and the scripts used to extract data and reproduce the experiment, at https://github.com/gleicon/sentiment_analysis