SlideShare a Scribd company logo
1 of 68
Download to read offline
TwiSent: A Multi-Stage System
for Analyzing Sentiment in Twitter
Subhabrata Mukherjee, Akshat Malu, Balamurali A.R.
and Pushpak Bhattacharyya
Dept. of Computer Science and Engineering, IIT Bombay
21st ACM Conference on Information and Knowledge Management
CIKM 2012,
Hawai, Oct 29 - Nov 2, 2012
Social Media Analysis
2








Had Hella fun today with the team. Y’all are hilarious! &Yes, i do need more black
homies......
Social Media Analysis
3
 Social media sites, like Twitter, generate around 250 million tweets daily







Had Hella fun today with the team. Y’all are hilarious! &Yes, i do need more black
homies......
Social Media Analysis
4
 Social media sites, like Twitter, generate around 250 million tweets daily
 This information content could be leveraged to create applications that
have a social as well as an economic value






Had Hella fun today with the team. Y’all are hilarious! &Yes, i do need more black
homies......
Social Media Analysis
5
 Social media sites, like Twitter, generate around 250 million tweets daily
 This information content could be leveraged to create applications that
have a social as well as an economic value
 Text limit of 140 characters per tweet makes Twitter a noisy medium
 Tweets have a poor syntactic and semantic structure
 Problems like slangs, ellipses, nonstandard vocabulary etc.



Had Hella fun today with the team. Y’all are hilarious! &Yes, i do need more black
homies......
Social Media Analysis
6
 Social media sites, like Twitter, generate around 250 million tweets daily
 This information content could be leveraged to create applications that
have a social as well as an economic value
 Text limit of 140 characters per tweet makes Twitter a noisy medium
 Tweets have a poor syntactic and semantic structure
 Problems like slangs, ellipses, nonstandard vocabulary etc.
 Problem is compounded by increasing number of spams in Twitter
 Promotional tweets, bot-generated tweets, random links to websites etc.
 In fact Twitter contains around 40% tweets as pointless babble
Had Hella fun today with the team. Y’all are hilarious! &Yes, i do need more black
homies......
TwiSent: Multi-Stage System
Architecture7
Tweet Fetcher Spam Filter Spell Checker
Polarity
Detector
Pragmatics
Handler
Dependency
Extractor
Tweets
Opinion
Spam Categorization and
Features8
1. Number of Words per Tweet
2. Average Word Length
3. Frequency of “?” and “!”
4. Frequency of Numeral
Characters
5. Frequency of hashtags
6. Frequency of @users
7. Extent of Capitalization
8. Frequency of the First POS Tag
9. Frequency of Foreign Words
10. Validity of First Word
11. Presence / Absence of links
12. Frequency of POS Tags
13. Strength of Character Elongation
14. Frequency of Slang Words
15. Average Positive and Negative
Sentiment of Tweets
Algorithm for Spam Filter
9
Input: Build an initial naive bayes classifier NB- C, using the tweet sets M (mixed unlabeled
set containing spams and non-spams) and P (labeled non-spam set)
1: Loop while classifier parameters change
2: for each tweet ti ∈ M do
3: Compute Pr[c1 |ti], Pr[c2 |ti] using the current NB //c1 - non-spam class , c2 - spam class
4: Pr[c2 |ti]= 1 - Pr[c1 |ti]
5: Update Pr[fi,k|c1] and Pr[c1] given the
probabilistically assigned class for all ti (Pr[c1|ti]).
(a new NB-C is being built in the process)
6: end for
7: end loop
Pr | =
Pr ∏ Pr ,
∑ Pr[ ] ∏ ( , | )
Categorization of Noisy Text
10
Spell-Checker Algorithm
11







Spell-Checker Algorithm
12
 Heuristically driven to resolve the identified errors with a minimum edit
distance based spell checker






Spell-Checker Algorithm
13
 Heuristically driven to resolve the identified errors with a minimum edit
distance based spell checker
 A normalize function takes care of Pragmatics and Number Homophones
 Replaces happpyyyy with hapy, ‘2’ with ‘to’, ‘8’ with ‘eat’, ‘9’ with ‘ine’




Spell-Checker Algorithm
14
 Heuristically driven to resolve the identified errors with a minimum edit
distance based spell checker
 A normalize function takes care of Pragmatics and Number Homophones
 Replaces happpyyyy with hapy, ‘2’ with ‘to’, ‘8’ with ‘eat’, ‘9’ with ‘ine’
 A vowel_dropped function takes care of the vowel dropping phenomenon



Spell-Checker Algorithm
15
 Heuristically driven to resolve the identified errors with a minimum edit
distance based spell checker
 A normalize function takes care of Pragmatics and Number Homophones
 Replaces happpyyyy with hapy, ‘2’ with ‘to’, ‘8’ with ‘eat’, ‘9’ with ‘ine’
 A vowel_dropped function takes care of the vowel dropping phenomenon
 The parameters offset and adv are determined empirically


Spell-Checker Algorithm
16
 Heuristically driven to resolve the identified errors with a minimum edit
distance based spell checker
 A normalize function takes care of Pragmatics and Number Homophones
 Replaces happpyyyy with hapy, ‘2’ with ‘to’, ‘8’ with ‘eat’, ‘9’ with ‘ine’
 A vowel_dropped function takes care of the vowel dropping phenomenon
 The parameters offset and adv are determined empirically
 Words are marked during normalization, to preserve their pragmatics
 happppyyyyy, normalized to hapy and thereafter spell-corrected to
happy, is marked so as to not lose its pragmatic content
Spell-Checker Algorithm
17
 Input: For string s, let S be the set of words in the lexicon starting with the
initial letter of s.
 /* Module Spell Checker */
 for each word w ∈ S do
 w’=vowel_dropped(w)
 s’=normalize(s)
 /*diff(s,w) gives difference of length between s and w*/
 if diff(s’ , w’) < offset then
 score[w]=min(edit_distance(s,w),edit_distance(s,
 w’), edit_distance(s’ , w))
 else
 score[w]=max_centinel
 end if
 end for
Spell-Checker Algorithm
Contd..18
 Sort score of each w in the Lexicon and retain the top m entries in suggestions(s) for the original string s
 for each t in suggestions(s) do
 edit1=edit_distance(s’ , s)
 /*t.replace(char1,char2) replaces all occurrences of char1 in the string t with char2*/
 edit2=edit_distance(t.replace( a , e), s’)
 edit3=edit_distance(t.replace(e , a), s’)
 edit4=edit_distance(t.replace(o , u), s’)
 edit5=edit_distance(t.replace(u , o), s’)
 edit6=edit_distance(t.replace(i , e), s’)
 edit7=edit_distance(t.replace(e, i), s’)
 count=overlapping_characters(t, s’)
 min_edit=
 min(edit1,edit2,edit3,edit4,edit5,edit6,edit7)
 if (min_edit ==0 or score[s’] == 0) then
 adv=-2 /* for exact match assign advantage score */
 else
 adv=0
 end if
 final_score[t]=min_edit+adv+score[w]-count;
 end for
 return t with minimum final_score;
Feature Specific Tweet Analysis
I have an ipod and it is a great buy
but I'm probably the only person that
dislikes the iTunes software.
Here the sentiment w.r.t ipod is
positive whereas that respect to
software is negative
Opinion Extraction Hypothesis
 “More closely related words come
together to express an opinion
about a feature”
Hypothesis Example
 “I want to use Samsung which is a great
product but am not so sure about using
Nokia”.
 Here “great” and “product” are related by an adjective modifier
relation, “product” and “Samsung” are related by a relative
clause modifier relation. Thus “great” and “Samsung” are
transitively related.
 Here “great” and “product” are more related to Samsung
than they are to Nokia
 Hence “great” and “product” come together to express an
opinion about the entity “Samsung” than about the entity
“Nokia”
Hypothesis Example
 “I want to use Samsung which is a great
product but am not so sure about using
Nokia”.
 Here “great” and “product” are related by an adjective modifier
relation, “product” and “Samsung” are related by a relative
clause modifier relation. Thus “great” and “Samsung” are
transitively related.
 Here “great” and “product” are more related to Samsung
than they are to Nokia
 Hence “great” and “product” come together to express an
opinion about the entity “Samsung” than about the entity
“Nokia”
Hypothesis Example
 “I want to use Samsung which is a great
product but am not so sure about using
Nokia”.
 Here “great” and “product” are related by an adjective modifier
relation, “product” and “Samsung” are related by a relative
clause modifier relation. Thus “great” and “Samsung” are
transitively related.
 Here “great” and “product” are more related to Samsung
than they are to Nokia
 Hence “great” and “product” come together to express an
opinion about the entity “Samsung” than about the entity
“Nokia”
Hypothesis Example
 “I want to use Samsung which is a great
product but am not so sure about using
Nokia”.
 Here “great” and “product” are related by an adjective modifier
relation, “product” and “Samsung” are related by a relative
clause modifier relation. Thus “great” and “Samsung” are
transitively related.
 Here “great” and “product” are more related to Samsung
than they are to Nokia
 Hence “great” and “product” come together to express an
opinion about the entity “Samsung” than about the entity
“Nokia”
Hypothesis Example
 “I want to use Samsung which is a great
product but am not so sure about using
Nokia”.
 Here “great” and “product” are related by an adjective modifier
relation, “product” and “Samsung” are related by a relative
clause modifier relation. Thus “great” and “Samsung” are
transitively related.
 Here “great” and “product” are more related to Samsung
than they are to Nokia
 Hence “great” and “product” come together to express an
opinion about the entity “Samsung” than about the entity
“Nokia”
Adjective Modifier
Hypothesis Example
 “I want to use Samsung which is a great
product but am not so sure about using
Nokia”.
 Here “great” and “product” are related by an adjective modifier
relation, “product” and “Samsung” are related by a relative
clause modifier relation. Thus “great” and “Samsung” are
transitively related.
 Here “great” and “product” are more related to Samsung
than they are to Nokia
 Hence “great” and “product” come together to express an
opinion about the entity “Samsung” than about the entity
“Nokia”
Adjective Modifier
Hypothesis Example
 “I want to use Samsung which is a great
product but am not so sure about using
Nokia”.
 Here “great” and “product” are related by an adjective modifier
relation, “product” and “Samsung” are related by a relative
clause modifier relation. Thus “great” and “Samsung” are
transitively related.
 Here “great” and “product” are more related to Samsung
than they are to Nokia
 Hence “great” and “product” come together to express an
opinion about the entity “Samsung” than about the entity
“Nokia”
Adjective Modifier
Hypothesis Example
 “I want to use Samsung which is a great
product but am not so sure about using
Nokia”.
 Here “great” and “product” are related by an adjective modifier
relation, “product” and “Samsung” are related by a relative
clause modifier relation. Thus “great” and “Samsung” are
transitively related.
 Here “great” and “product” are more related to Samsung
than they are to Nokia
 Hence “great” and “product” come together to express an
opinion about the entity “Samsung” than about the entity
“Nokia”
Adjective Modifier
Relative Clause
Modifier
Example of a Review
 I have an ipod and it is a great buy but I'm
probably the only person that dislikes the
iTunes software.
Example of a Review
 I have an ipod and it is a great buy but I'm
probably the only person that dislikes the
iTunes software.
Example of a Review
 I have an ipod and it is a great buy but I'm
probably the only person that dislikes the
iTunes software.
Example of a Review
 I have an ipod and it is a great buy but I'm
probably the only person that dislikes the
iTunes software.
Example of a Review
 I have an ipod and it is a great buy but I'm
probably the only person that dislikes the
iTunes software.
Feature Extraction : Domain Info
Not Available








Feature Extraction : Domain Info
Not Available
 Initially, all the Nouns are treated as features and added to
the feature list F.







Feature Extraction : Domain Info
Not Available
 Initially, all the Nouns are treated as features and added to
the feature list F.
 F = { ipod, buy, person, software }






Feature Extraction : Domain Info
Not Available
 Initially, all the Nouns are treated as features and added to
the feature list F.
 F = { ipod, buy, person, software }
 Pruning the feature set
 Merge 2 features if they are strongly related




Feature Extraction : Domain Info
Not Available
 Initially, all the Nouns are treated as features and added to
the feature list F.
 F = { ipod, buy, person, software }
 Pruning the feature set
 Merge 2 features if they are strongly related
 “buy” merged with “ipod”, when target feature = “ipod”,
 “person, software” will be ignored.


Feature Extraction : Domain Info
Not Available
 Initially, all the Nouns are treated as features and added to
the feature list F.
 F = { ipod, buy, person, software }
 Pruning the feature set
 Merge 2 features if they are strongly related
 “buy” merged with “ipod”, when target feature = “ipod”,
 “person, software” will be ignored.
 “person” merged with “software”, when target feature =
“software”
 “ipod, buy” will be ignored.
Relations
 Direct Neighbor Relation
 Capture short range dependencies
 Any 2 consecutive words (such that none of them is a
StopWord) are directly related
 Consider a sentence S and 2 consecutive words .
 If , then they are directly related.
 Dependency Relation
 Capture long range dependencies
 Let Dependency_Relation be the list of significant
relations.
 Any 2 words wi and wj in S are directly related, if
s.t.
Graph representation
Graph
Algorithm
Algorithm Contd…
Algorithm Contd…
Clustering
7/23/2013
46
Clustering
7/23/2013
47
Clustering
7/23/2013
48
Clustering
7/23/2013
49
Clustering
7/23/2013
50
Clustering
7/23/2013
51
Clustering
7/23/2013
52
Clustering
7/23/2013
53
Clustering
7/23/2013
54
Pragmatics
55






Pragmatics
56
 Elongation of a word, repeating alphabets multiple times - Example:
happppyyyyyy, goooooood. More weightage is given by repeating
them twice





Pragmatics
57
 Elongation of a word, repeating alphabets multiple times - Example:
happppyyyyyy, goooooood. More weightage is given by repeating
them twice
 Use of Hashtags - #overrated, #worthawatch. More weightage is
given by repeating them thrice




Pragmatics
58
 Elongation of a word, repeating alphabets multiple times - Example:
happppyyyyyy, goooooood. More weightage is given by repeating
them twice
 Use of Hashtags - #overrated, #worthawatch. More weightage is
given by repeating them thrice
 Use of Emoticons -  (happy),  (sad)



Pragmatics
59
 Elongation of a word, repeating alphabets multiple times - Example:
happppyyyyyy, goooooood. More weightage is given by repeating
them twice
 Use of Hashtags - #overrated, #worthawatch. More weightage is
given by repeating them thrice
 Use of Emoticons -  (happy),  (sad)
 Use of Capitalization - where words are written in capital letters to
express intensity of user sentiments
 Full Caps - Example: I HATED that movie. More weightage is given by
repeating them thrice
 Partial Caps- Example: She is a Loving mom. More weightage is given
by repeating them twice
Spam Filter Evaluation
60
Tweets Total
Tweets
Correctly
Classified
Misclassified Precision
(%)
Recall
(%)
All 7007 3815 3192 54.45 55.24
Only spam 1993 1838 155 92.22 92.22
Only non-spam 5014 2259 2755 45.05 -
Tweets Total
Tweets
Correctly
Classified
Misclassified Precision
(%)
Recall
(%)
All 7007 5010 1997 71.50 54.29
Only spam 1993 1604 389 80.48 80.48
Only non-spam 5014 4227 787 84.30 -
2-Class
Classification
4-Class
Classification
TwiSent Evaluation
61
TwiSent Evaluation
62
Lexicon-based
Classification
TwiSent Evaluation
63
Lexicon-based
Classification
TwiSent Evaluation
64
Lexicon-based
Classification
Supervised
Classification
TwiSent Evaluation
65
Lexicon-based
Classification
System 2-class Accuracy Precision/Recall
C-Feel-It 50.8 53.16/72.96
TwiSent 68.19 64.92/69.37
Supervised
Classification
TwiSent Evaluation
66
Lexicon-based
Classification
System 2-class Accuracy Precision/Recall
C-Feel-It 50.8 53.16/72.96
TwiSent 68.19 64.92/69.37
Supervised
Classification
TwiSent Evaluation
67
Lexicon-based
Classification
System 2-class Accuracy Precision/Recall
C-Feel-It 50.8 53.16/72.96
TwiSent 68.19 64.92/69.37
Supervised
Classification
Ablation
Test
TwiSent Evaluation
68
Lexicon-based
Classification
System 2-class Accuracy Precision/Recall
C-Feel-It 50.8 53.16/72.96
TwiSent 68.19 64.92/69.37
Supervised
Classification
Ablation
Test
Module Removed Accuracy Statistical Significance
Confidence (%)
Entity-Specificity 65.14 95
Spell-Checker 64.2 99
Pragmatics Handler 63.51 99
Complete System 66.69 -

More Related Content

Similar to TwiSent: Multi-Stage System for Analyzing Sentiment in Twitter

Feature specific analysis of reviews
Feature specific analysis of reviewsFeature specific analysis of reviews
Feature specific analysis of reviewsSubhabrata Mukherjee
 
Let's Make the PAIN Visible!
Let's Make the PAIN Visible!Let's Make the PAIN Visible!
Let's Make the PAIN Visible!Arty Starr
 
Recommendation Subsystem - Museum Radar
Recommendation Subsystem - Museum RadarRecommendation Subsystem - Museum Radar
Recommendation Subsystem - Museum RadarPanos Gemos
 
Experiments in Reasoning
Experiments in ReasoningExperiments in Reasoning
Experiments in ReasoningAslam Khan
 
The tyranny of averages
The tyranny of averagesThe tyranny of averages
The tyranny of averagesPVS-Studio
 
Data Science Your Vacation
Data Science Your VacationData Science Your Vacation
Data Science Your VacationTJ Stalcup
 
Act War Company
Act War CompanyAct War Company
Act War Companyammar meza
 
Metaphic or the art of looking another way.
Metaphic or the art of looking another way.Metaphic or the art of looking another way.
Metaphic or the art of looking another way.Suresh Manian
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
Fuzzy Matching to the Rescue
Fuzzy Matching to the RescueFuzzy Matching to the Rescue
Fuzzy Matching to the RescueDomino Data Lab
 
Webinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrWebinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrLucidworks
 
How To Write An Anecdote In An Essay
How To Write An Anecdote In An EssayHow To Write An Anecdote In An Essay
How To Write An Anecdote In An EssayLaura Benitez
 
013 Research Paper Methods Section Social
013 Research Paper Methods Section Social013 Research Paper Methods Section Social
013 Research Paper Methods Section SocialNicole Williams
 
Data Science Your Vacation
Data Science Your VacationData Science Your Vacation
Data Science Your VacationTJ Stalcup
 
Building yourself with Python - Learn the Basics!!
Building yourself with Python - Learn the Basics!!Building yourself with Python - Learn the Basics!!
Building yourself with Python - Learn the Basics!!FRANKLINODURO
 

Similar to TwiSent: Multi-Stage System for Analyzing Sentiment in Twitter (20)

Feature specific analysis of reviews
Feature specific analysis of reviewsFeature specific analysis of reviews
Feature specific analysis of reviews
 
Let's Make the PAIN Visible!
Let's Make the PAIN Visible!Let's Make the PAIN Visible!
Let's Make the PAIN Visible!
 
TotalSynch-PitchDeck
TotalSynch-PitchDeckTotalSynch-PitchDeck
TotalSynch-PitchDeck
 
Recommendation Subsystem - Museum Radar
Recommendation Subsystem - Museum RadarRecommendation Subsystem - Museum Radar
Recommendation Subsystem - Museum Radar
 
Query Understanding
Query UnderstandingQuery Understanding
Query Understanding
 
Experiments in Reasoning
Experiments in ReasoningExperiments in Reasoning
Experiments in Reasoning
 
The tyranny of averages
The tyranny of averagesThe tyranny of averages
The tyranny of averages
 
Data Science Your Vacation
Data Science Your VacationData Science Your Vacation
Data Science Your Vacation
 
Act War Company
Act War CompanyAct War Company
Act War Company
 
Metaphic or the art of looking another way.
Metaphic or the art of looking another way.Metaphic or the art of looking another way.
Metaphic or the art of looking another way.
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
Fuzzy Matching to the Rescue
Fuzzy Matching to the RescueFuzzy Matching to the Rescue
Fuzzy Matching to the Rescue
 
respect-estimates.pdf
respect-estimates.pdfrespect-estimates.pdf
respect-estimates.pdf
 
Webinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrWebinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with Solr
 
How To Write An Anecdote In An Essay
How To Write An Anecdote In An EssayHow To Write An Anecdote In An Essay
How To Write An Anecdote In An Essay
 
013 Research Paper Methods Section Social
013 Research Paper Methods Section Social013 Research Paper Methods Section Social
013 Research Paper Methods Section Social
 
Data Science Your Vacation
Data Science Your VacationData Science Your Vacation
Data Science Your Vacation
 
Nlp
NlpNlp
Nlp
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Building yourself with Python - Learn the Basics!!
Building yourself with Python - Learn the Basics!!Building yourself with Python - Learn the Basics!!
Building yourself with Python - Learn the Basics!!
 

More from Subhabrata Mukherjee

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsXtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsSubhabrata Mukherjee
 
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Subhabrata Mukherjee
 
OpenTag: Open Attribute Value Extraction From Product Profiles
OpenTag: Open Attribute Value Extraction From Product ProfilesOpenTag: Open Attribute Value Extraction From Product Profiles
OpenTag: Open Attribute Value Extraction From Product ProfilesSubhabrata Mukherjee
 
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Subhabrata Mukherjee
 
Continuous Experience-aware Language Model
Continuous Experience-aware Language ModelContinuous Experience-aware Language Model
Continuous Experience-aware Language ModelSubhabrata Mukherjee
 
Experience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review CommunitiesExperience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review CommunitiesSubhabrata Mukherjee
 
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...Subhabrata Mukherjee
 
Leveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News CommunitiesLeveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News CommunitiesSubhabrata Mukherjee
 
People on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health ForumsPeople on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health ForumsSubhabrata Mukherjee
 
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...Subhabrata Mukherjee
 
Joint Author Sentiment Topic Model
Joint Author Sentiment Topic ModelJoint Author Sentiment Topic Model
Joint Author Sentiment Topic ModelSubhabrata Mukherjee
 
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...Subhabrata Mukherjee
 
Leveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word SimilarityLeveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word SimilaritySubhabrata Mukherjee
 
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...Subhabrata Mukherjee
 
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...Subhabrata Mukherjee
 
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSubhabrata Mukherjee
 

More from Subhabrata Mukherjee (17)

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsXtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
 
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
 
Fact Checking from Text
Fact Checking from TextFact Checking from Text
Fact Checking from Text
 
OpenTag: Open Attribute Value Extraction From Product Profiles
OpenTag: Open Attribute Value Extraction From Product ProfilesOpenTag: Open Attribute Value Extraction From Product Profiles
OpenTag: Open Attribute Value Extraction From Product Profiles
 
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
 
Continuous Experience-aware Language Model
Continuous Experience-aware Language ModelContinuous Experience-aware Language Model
Continuous Experience-aware Language Model
 
Experience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review CommunitiesExperience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review Communities
 
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
 
Leveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News CommunitiesLeveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News Communities
 
People on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health ForumsPeople on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health Forums
 
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
 
Joint Author Sentiment Topic Model
Joint Author Sentiment Topic ModelJoint Author Sentiment Topic Model
Joint Author Sentiment Topic Model
 
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
 
Leveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word SimilarityLeveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word Similarity
 
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
 
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
 
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
 

Recently uploaded

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 

TwiSent: Multi-Stage System for Analyzing Sentiment in Twitter

  • 1. TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter Subhabrata Mukherjee, Akshat Malu, Balamurali A.R. and Pushpak Bhattacharyya Dept. of Computer Science and Engineering, IIT Bombay 21st ACM Conference on Information and Knowledge Management CIKM 2012, Hawai, Oct 29 - Nov 2, 2012
  • 2. Social Media Analysis 2         Had Hella fun today with the team. Y’all are hilarious! &Yes, i do need more black homies......
  • 3. Social Media Analysis 3  Social media sites, like Twitter, generate around 250 million tweets daily        Had Hella fun today with the team. Y’all are hilarious! &Yes, i do need more black homies......
  • 4. Social Media Analysis 4  Social media sites, like Twitter, generate around 250 million tweets daily  This information content could be leveraged to create applications that have a social as well as an economic value       Had Hella fun today with the team. Y’all are hilarious! &Yes, i do need more black homies......
  • 5. Social Media Analysis 5  Social media sites, like Twitter, generate around 250 million tweets daily  This information content could be leveraged to create applications that have a social as well as an economic value  Text limit of 140 characters per tweet makes Twitter a noisy medium  Tweets have a poor syntactic and semantic structure  Problems like slangs, ellipses, nonstandard vocabulary etc.    Had Hella fun today with the team. Y’all are hilarious! &Yes, i do need more black homies......
  • 6. Social Media Analysis 6  Social media sites, like Twitter, generate around 250 million tweets daily  This information content could be leveraged to create applications that have a social as well as an economic value  Text limit of 140 characters per tweet makes Twitter a noisy medium  Tweets have a poor syntactic and semantic structure  Problems like slangs, ellipses, nonstandard vocabulary etc.  Problem is compounded by increasing number of spams in Twitter  Promotional tweets, bot-generated tweets, random links to websites etc.  In fact Twitter contains around 40% tweets as pointless babble Had Hella fun today with the team. Y’all are hilarious! &Yes, i do need more black homies......
  • 7. TwiSent: Multi-Stage System Architecture7 Tweet Fetcher Spam Filter Spell Checker Polarity Detector Pragmatics Handler Dependency Extractor Tweets Opinion
  • 8. Spam Categorization and Features8 1. Number of Words per Tweet 2. Average Word Length 3. Frequency of “?” and “!” 4. Frequency of Numeral Characters 5. Frequency of hashtags 6. Frequency of @users 7. Extent of Capitalization 8. Frequency of the First POS Tag 9. Frequency of Foreign Words 10. Validity of First Word 11. Presence / Absence of links 12. Frequency of POS Tags 13. Strength of Character Elongation 14. Frequency of Slang Words 15. Average Positive and Negative Sentiment of Tweets
  • 9. Algorithm for Spam Filter 9 Input: Build an initial naive bayes classifier NB- C, using the tweet sets M (mixed unlabeled set containing spams and non-spams) and P (labeled non-spam set) 1: Loop while classifier parameters change 2: for each tweet ti ∈ M do 3: Compute Pr[c1 |ti], Pr[c2 |ti] using the current NB //c1 - non-spam class , c2 - spam class 4: Pr[c2 |ti]= 1 - Pr[c1 |ti] 5: Update Pr[fi,k|c1] and Pr[c1] given the probabilistically assigned class for all ti (Pr[c1|ti]). (a new NB-C is being built in the process) 6: end for 7: end loop Pr | = Pr ∏ Pr , ∑ Pr[ ] ∏ ( , | )
  • 12. Spell-Checker Algorithm 12  Heuristically driven to resolve the identified errors with a minimum edit distance based spell checker      
  • 13. Spell-Checker Algorithm 13  Heuristically driven to resolve the identified errors with a minimum edit distance based spell checker  A normalize function takes care of Pragmatics and Number Homophones  Replaces happpyyyy with hapy, ‘2’ with ‘to’, ‘8’ with ‘eat’, ‘9’ with ‘ine’    
  • 14. Spell-Checker Algorithm 14  Heuristically driven to resolve the identified errors with a minimum edit distance based spell checker  A normalize function takes care of Pragmatics and Number Homophones  Replaces happpyyyy with hapy, ‘2’ with ‘to’, ‘8’ with ‘eat’, ‘9’ with ‘ine’  A vowel_dropped function takes care of the vowel dropping phenomenon   
  • 15. Spell-Checker Algorithm 15  Heuristically driven to resolve the identified errors with a minimum edit distance based spell checker  A normalize function takes care of Pragmatics and Number Homophones  Replaces happpyyyy with hapy, ‘2’ with ‘to’, ‘8’ with ‘eat’, ‘9’ with ‘ine’  A vowel_dropped function takes care of the vowel dropping phenomenon  The parameters offset and adv are determined empirically  
  • 16. Spell-Checker Algorithm 16  Heuristically driven to resolve the identified errors with a minimum edit distance based spell checker  A normalize function takes care of Pragmatics and Number Homophones  Replaces happpyyyy with hapy, ‘2’ with ‘to’, ‘8’ with ‘eat’, ‘9’ with ‘ine’  A vowel_dropped function takes care of the vowel dropping phenomenon  The parameters offset and adv are determined empirically  Words are marked during normalization, to preserve their pragmatics  happppyyyyy, normalized to hapy and thereafter spell-corrected to happy, is marked so as to not lose its pragmatic content
  • 17. Spell-Checker Algorithm 17  Input: For string s, let S be the set of words in the lexicon starting with the initial letter of s.  /* Module Spell Checker */  for each word w ∈ S do  w’=vowel_dropped(w)  s’=normalize(s)  /*diff(s,w) gives difference of length between s and w*/  if diff(s’ , w’) < offset then  score[w]=min(edit_distance(s,w),edit_distance(s,  w’), edit_distance(s’ , w))  else  score[w]=max_centinel  end if  end for
  • 18. Spell-Checker Algorithm Contd..18  Sort score of each w in the Lexicon and retain the top m entries in suggestions(s) for the original string s  for each t in suggestions(s) do  edit1=edit_distance(s’ , s)  /*t.replace(char1,char2) replaces all occurrences of char1 in the string t with char2*/  edit2=edit_distance(t.replace( a , e), s’)  edit3=edit_distance(t.replace(e , a), s’)  edit4=edit_distance(t.replace(o , u), s’)  edit5=edit_distance(t.replace(u , o), s’)  edit6=edit_distance(t.replace(i , e), s’)  edit7=edit_distance(t.replace(e, i), s’)  count=overlapping_characters(t, s’)  min_edit=  min(edit1,edit2,edit3,edit4,edit5,edit6,edit7)  if (min_edit ==0 or score[s’] == 0) then  adv=-2 /* for exact match assign advantage score */  else  adv=0  end if  final_score[t]=min_edit+adv+score[w]-count;  end for  return t with minimum final_score;
  • 19. Feature Specific Tweet Analysis I have an ipod and it is a great buy but I'm probably the only person that dislikes the iTunes software. Here the sentiment w.r.t ipod is positive whereas that respect to software is negative
  • 20. Opinion Extraction Hypothesis  “More closely related words come together to express an opinion about a feature”
  • 21. Hypothesis Example  “I want to use Samsung which is a great product but am not so sure about using Nokia”.  Here “great” and “product” are related by an adjective modifier relation, “product” and “Samsung” are related by a relative clause modifier relation. Thus “great” and “Samsung” are transitively related.  Here “great” and “product” are more related to Samsung than they are to Nokia  Hence “great” and “product” come together to express an opinion about the entity “Samsung” than about the entity “Nokia”
  • 22. Hypothesis Example  “I want to use Samsung which is a great product but am not so sure about using Nokia”.  Here “great” and “product” are related by an adjective modifier relation, “product” and “Samsung” are related by a relative clause modifier relation. Thus “great” and “Samsung” are transitively related.  Here “great” and “product” are more related to Samsung than they are to Nokia  Hence “great” and “product” come together to express an opinion about the entity “Samsung” than about the entity “Nokia”
  • 23. Hypothesis Example  “I want to use Samsung which is a great product but am not so sure about using Nokia”.  Here “great” and “product” are related by an adjective modifier relation, “product” and “Samsung” are related by a relative clause modifier relation. Thus “great” and “Samsung” are transitively related.  Here “great” and “product” are more related to Samsung than they are to Nokia  Hence “great” and “product” come together to express an opinion about the entity “Samsung” than about the entity “Nokia”
  • 24. Hypothesis Example  “I want to use Samsung which is a great product but am not so sure about using Nokia”.  Here “great” and “product” are related by an adjective modifier relation, “product” and “Samsung” are related by a relative clause modifier relation. Thus “great” and “Samsung” are transitively related.  Here “great” and “product” are more related to Samsung than they are to Nokia  Hence “great” and “product” come together to express an opinion about the entity “Samsung” than about the entity “Nokia”
  • 25. Hypothesis Example  “I want to use Samsung which is a great product but am not so sure about using Nokia”.  Here “great” and “product” are related by an adjective modifier relation, “product” and “Samsung” are related by a relative clause modifier relation. Thus “great” and “Samsung” are transitively related.  Here “great” and “product” are more related to Samsung than they are to Nokia  Hence “great” and “product” come together to express an opinion about the entity “Samsung” than about the entity “Nokia” Adjective Modifier
  • 26. Hypothesis Example  “I want to use Samsung which is a great product but am not so sure about using Nokia”.  Here “great” and “product” are related by an adjective modifier relation, “product” and “Samsung” are related by a relative clause modifier relation. Thus “great” and “Samsung” are transitively related.  Here “great” and “product” are more related to Samsung than they are to Nokia  Hence “great” and “product” come together to express an opinion about the entity “Samsung” than about the entity “Nokia” Adjective Modifier
  • 27. Hypothesis Example  “I want to use Samsung which is a great product but am not so sure about using Nokia”.  Here “great” and “product” are related by an adjective modifier relation, “product” and “Samsung” are related by a relative clause modifier relation. Thus “great” and “Samsung” are transitively related.  Here “great” and “product” are more related to Samsung than they are to Nokia  Hence “great” and “product” come together to express an opinion about the entity “Samsung” than about the entity “Nokia” Adjective Modifier
  • 28. Hypothesis Example  “I want to use Samsung which is a great product but am not so sure about using Nokia”.  Here “great” and “product” are related by an adjective modifier relation, “product” and “Samsung” are related by a relative clause modifier relation. Thus “great” and “Samsung” are transitively related.  Here “great” and “product” are more related to Samsung than they are to Nokia  Hence “great” and “product” come together to express an opinion about the entity “Samsung” than about the entity “Nokia” Adjective Modifier Relative Clause Modifier
  • 29. Example of a Review  I have an ipod and it is a great buy but I'm probably the only person that dislikes the iTunes software.
  • 30. Example of a Review  I have an ipod and it is a great buy but I'm probably the only person that dislikes the iTunes software.
  • 31. Example of a Review  I have an ipod and it is a great buy but I'm probably the only person that dislikes the iTunes software.
  • 32. Example of a Review  I have an ipod and it is a great buy but I'm probably the only person that dislikes the iTunes software.
  • 33. Example of a Review  I have an ipod and it is a great buy but I'm probably the only person that dislikes the iTunes software.
  • 34. Feature Extraction : Domain Info Not Available        
  • 35. Feature Extraction : Domain Info Not Available  Initially, all the Nouns are treated as features and added to the feature list F.       
  • 36. Feature Extraction : Domain Info Not Available  Initially, all the Nouns are treated as features and added to the feature list F.  F = { ipod, buy, person, software }      
  • 37. Feature Extraction : Domain Info Not Available  Initially, all the Nouns are treated as features and added to the feature list F.  F = { ipod, buy, person, software }  Pruning the feature set  Merge 2 features if they are strongly related    
  • 38. Feature Extraction : Domain Info Not Available  Initially, all the Nouns are treated as features and added to the feature list F.  F = { ipod, buy, person, software }  Pruning the feature set  Merge 2 features if they are strongly related  “buy” merged with “ipod”, when target feature = “ipod”,  “person, software” will be ignored.  
  • 39. Feature Extraction : Domain Info Not Available  Initially, all the Nouns are treated as features and added to the feature list F.  F = { ipod, buy, person, software }  Pruning the feature set  Merge 2 features if they are strongly related  “buy” merged with “ipod”, when target feature = “ipod”,  “person, software” will be ignored.  “person” merged with “software”, when target feature = “software”  “ipod, buy” will be ignored.
  • 40. Relations  Direct Neighbor Relation  Capture short range dependencies  Any 2 consecutive words (such that none of them is a StopWord) are directly related  Consider a sentence S and 2 consecutive words .  If , then they are directly related.  Dependency Relation  Capture long range dependencies  Let Dependency_Relation be the list of significant relations.  Any 2 words wi and wj in S are directly related, if s.t.
  • 42. Graph
  • 56. Pragmatics 56  Elongation of a word, repeating alphabets multiple times - Example: happppyyyyyy, goooooood. More weightage is given by repeating them twice     
  • 57. Pragmatics 57  Elongation of a word, repeating alphabets multiple times - Example: happppyyyyyy, goooooood. More weightage is given by repeating them twice  Use of Hashtags - #overrated, #worthawatch. More weightage is given by repeating them thrice    
  • 58. Pragmatics 58  Elongation of a word, repeating alphabets multiple times - Example: happppyyyyyy, goooooood. More weightage is given by repeating them twice  Use of Hashtags - #overrated, #worthawatch. More weightage is given by repeating them thrice  Use of Emoticons -  (happy),  (sad)   
  • 59. Pragmatics 59  Elongation of a word, repeating alphabets multiple times - Example: happppyyyyyy, goooooood. More weightage is given by repeating them twice  Use of Hashtags - #overrated, #worthawatch. More weightage is given by repeating them thrice  Use of Emoticons -  (happy),  (sad)  Use of Capitalization - where words are written in capital letters to express intensity of user sentiments  Full Caps - Example: I HATED that movie. More weightage is given by repeating them thrice  Partial Caps- Example: She is a Loving mom. More weightage is given by repeating them twice
  • 60. Spam Filter Evaluation 60 Tweets Total Tweets Correctly Classified Misclassified Precision (%) Recall (%) All 7007 3815 3192 54.45 55.24 Only spam 1993 1838 155 92.22 92.22 Only non-spam 5014 2259 2755 45.05 - Tweets Total Tweets Correctly Classified Misclassified Precision (%) Recall (%) All 7007 5010 1997 71.50 54.29 Only spam 1993 1604 389 80.48 80.48 Only non-spam 5014 4227 787 84.30 - 2-Class Classification 4-Class Classification
  • 65. TwiSent Evaluation 65 Lexicon-based Classification System 2-class Accuracy Precision/Recall C-Feel-It 50.8 53.16/72.96 TwiSent 68.19 64.92/69.37 Supervised Classification
  • 66. TwiSent Evaluation 66 Lexicon-based Classification System 2-class Accuracy Precision/Recall C-Feel-It 50.8 53.16/72.96 TwiSent 68.19 64.92/69.37 Supervised Classification
  • 67. TwiSent Evaluation 67 Lexicon-based Classification System 2-class Accuracy Precision/Recall C-Feel-It 50.8 53.16/72.96 TwiSent 68.19 64.92/69.37 Supervised Classification Ablation Test
  • 68. TwiSent Evaluation 68 Lexicon-based Classification System 2-class Accuracy Precision/Recall C-Feel-It 50.8 53.16/72.96 TwiSent 68.19 64.92/69.37 Supervised Classification Ablation Test Module Removed Accuracy Statistical Significance Confidence (%) Entity-Specificity 65.14 95 Spell-Checker 64.2 99 Pragmatics Handler 63.51 99 Complete System 66.69 -