Intent Classification of Short-text Social Media
Dec 19 2015
The 8th IEEE SocialCom-2015
Hemant Purohit
Information Sciences and Technology, George Mason U
Guozhu Dong, Valerie Shalin, Krishnaprasad Thirunarayan, Amit Sheth
Kno.e.sis, Wright State U
@hemant_pt IEEE SocialCom-2015
Outline
● Intention
● Social Media Short-text
● Intent Classification Problem
● Feature Representation
  ● Bottom-Up
    ● Bag of Tokens model
  ● Top-Down
    ● Set of Patterns:
      ● Declarative Knowledge & Social Behavior Knowledge
      ● Contrast Mining based Patterns
● Experiments & Results
● Limitations & Future Work
Intention
● Intent: Purpose or aim for an action
● ‘we are tempted to speak of “different senses” of a
word which is clearly not equivocal, we may infer that
we are pretty much in the dark about the character of
the concept which it represents’ (Anscombe 1963, p. 1) [Stanford
Encyclopedia of Philosophy]
● Latent in the utterance
Social Media Short-text & Intent
Social media text: unstructured, informal language, short
DOCUMENT | INTENT
Text REDCROSS to 90999 to donate 10$ to help the victims of hurricane sandy | SEEKING HELP
Anyone know where the nearest #RedCross is? I wanna give blood today to help the victims of hurricane Sandy | OFFERING HELP
Would like to urge all citizens to make the proper preparations for Hurricane #Sandy - prep is key - http://t.co/LyCSprbk has valuable info! | ADVISING
Short-text Document Intent
● Intent: Aim of action
How to identify relevant intent from ambiguous, unconstrained
natural language text?
Relevant intent ➔ Articulation of organizational tasks
(e.g., Seeking vs. Offering resources)
Intent Classification: Problem Formulation
● Given a set of user-generated text documents, identify existing intents
● Variety of interpretations
● Problem statement: a multi-class classification task to approximate $f: S \rightarrow C$, where
$C = \{C_1, C_2, \dots, C_K\}$ is a set of $K$ predefined intent classes, and
$S = \{m_1, m_2, \dots, m_N\}$ is a set of $N$ short-text documents
Focus: cooperation-assistive intent classes, $C = \{\text{Seeking}, \text{Offering}, \text{None}\}$
Intent Classification: Related Work
TEXT CLASSIFICATION TYPE | FOCUS | EXAMPLE
Topic | Predominant subject matter | sports or entertainment
Sentiment/Emotion/Opinion | Present state of emotional affairs | negative or positive; happy emotion
Intent | Action, hence, future state of affairs | offer to help after floods
e.g., I am going to watch the awesome Fast and Furious movie!! #Excited
Intent Classification: Related Work
DATA TYPE | APPROACH FOCUS | LIMITED APPLICABILITY
Formal text on webpages/blogs (Kröll and Strohmaier 2009, 2015; Raslan et al. 2013, 2014) | Knowledge acquisition via rules, clustering | Lack of large corpora with proper grammatical structure; poor-quality text is hard to parse for dependencies
Commercial reviews, marketplace (Hollerit et al. 2013; Chen et al. 2013; Wang et al. 2015; Wu et al. 2011; Ramanand et al. 2010; Carlos & Yalamanchi 2012) | Classification via rules, lexical templates, patterns | More generalized intents (e.g., ‘help’ is broader than ‘sell’); patterns are more implicit, hence harder to capture, than for buying/selling
Search queries (Broder 2002; Downey et al. 2008; Case 2012; Wu et al. 2010; Strohmaier & Kröll 2012) | User profiling: query classification | Lack of large query logs, click graphs; existence of social conversation
Intent Classification: Challenges
● Unconstrained natural language in a small space
● Ambiguity in interpretation
● Sparsity and low ‘signal-to-noise’: imbalanced classes
  ● Seeking/Offering signals are only 1% of 4.9 million #Sandy tweets
● Hard-to-predict problem
  ● e.g., commercial intent: F-1 score of 65% on Twitter [Hollerit et al. 2013]
@Zuora wants to help @Network4Good with Hurricane Relief. Text SANDY to 80888 & donate $10 to @redcross @AmeriCares & @SalvationArmyUS #help
(Offering intent: “@Zuora wants to help @Network4Good with Hurricane Relief”; seeking intent: “Text SANDY to 80888 & donate $10 to @redcross …”)
Intent Classification: Domain & Features
Intent
● Binary
  ● Crisis Domain:
    - [Varga et al. 2013] Problem & Aid (Japanese)
    - [Purohit et al. 2013, 2014] Seeking & Offering
    - Features: N-grams, Rules, Noun-Verb templates, etc.
  ● Commercial Domain:
    - [Hollerit et al. 2013] Buy vs. Sell intent
    - Features: N-grams, Part-of-Speech
● Multiclass
  ● Commercial Domain:
    - [Wang et al. 2015] Semi-supervised
    - Features: N-grams, Part-of-Speech
Our Hybrid Approach
● TOP-DOWN: Pattern rules from Declarative (DK) & Social Behavior (SK) Knowledge, and Contrast Mining (CTK, CPK) (patterns defined for intent association)
● BOTTOM-UP: Bag of N-gram tokens, i.e., independent tokens (patterns derived from the data)
● Learning improves; expressivity increases
Intent Classification Hybrid: Multiclass Classifier – Feature Creation
1. (T) Bag of Tokens
Abstraction (numbers, interactions, and links matter in information sharing [Nagarajan et al. 2010]):
- Numeric (e.g., $10) → _NUM_
- Interactions (e.g., RT & @user) → _RT_, _MENTION_
- Links (e.g., http://bit.ly) → _URL_
N-grams: after stemming and abstraction [Hollerit et al. 2013]
TOKENIZER($m_i$, min, max) → {bi-, tri-grams}
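A minimal Python sketch of the (T) feature pipeline, assuming simple regular expressions for the abstraction step (the slide names the placeholders `_NUM_`, `_RT_`, `_MENTION_`, `_URL_` but not the exact rules). The helper names `abstract` and `tokenizer` are hypothetical, and stemming is left out to keep the example dependency-free.

```python
import re

# Hypothetical abstraction rules (assumption: the exact regexes are not given).
# URLs are replaced first so that digits inside them do not become _NUM_.
ABSTRACTIONS = [
    (re.compile(r"https?://\S+"), " _URL_ "),
    (re.compile(r"\bRT\b"), " _RT_ "),
    (re.compile(r"@\w+"), " _MENTION_ "),
    (re.compile(r"\$?\d+(?:[.,]\d+)*\$?"), " _NUM_ "),
]


def abstract(text: str) -> str:
    """Replace URLs, retweet markers, mentions, and numbers with placeholders."""
    for pattern, placeholder in ABSTRACTIONS:
        text = pattern.sub(placeholder, text)
    return text


def tokenizer(message: str, n_min: int = 2, n_max: int = 3) -> list[str]:
    """Return all n-grams with n_min <= n <= n_max after abstraction.

    Stemming is omitted; a fuller pipeline would stem each word token
    (e.g., with a Porter stemmer) before building n-grams.
    """
    tokens = re.findall(r"_[A-Z]+_|#?\w+", abstract(message))
    # Lowercase ordinary words but keep placeholders such as _URL_ unchanged.
    tokens = [t if re.fullmatch(r"_[A-Z]+_", t) else t.lower() for t in tokens]
    ngrams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i:i + n]))
    return ngrams
```

With the defaults `n_min=2, n_max=3` this yields bi- and tri-grams; the resulting strings would then be counted into a sparse bag-of-tokens vector.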
Leveraging Declarative Knowledge
● Conceptual Dependency Theory [Schank, 1972]
  ● Make meaning independent of the actual words in the input
  ● e.g., a class in an ontology abstracts similar instances
● Verb Lexicon [Hollerit et al. 2013]
  ● Verbs reflect action
  ● Relevant Levin verb categories [Levin, 1993], e.g., give, send, etc.
● Syntactic Pattern
  ● Auxiliaries & modals: e.g., ‘be’, ‘do’, ‘could’, etc. [Ramanand et al. 2010]
  ● Word order: verb-subject positions, etc.
Leveraging Social Behavior Knowledge
● Conversation indicators are often thrown away in text mining
CATEGORY $H_j$ | $H_j$-SET
H1 - Determiners | the
H3 - Subject pronouns | she, he, we, they
H9 - Dialogue management indicators | thanks, yes, ok, sorry, hi, hello, bye, anyway, how about, so, what do you mean, please, {could, would, should, can, will} followed by a pronoun
H11 - Hedge words | kinda, sorta
• $\text{Feature}_{H_j}(m_i) = \text{term-frequency}(H_j\text{-set}, m_i)$
• Normalized
• Total of 14 feature categories
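The SK feature can be sketched as a normalized term count per category. This is an illustration under stated assumptions: only the four categories listed in the table are included, normalization divides by the message's token count, and the pronoun list for the "modal followed by pronoun" rule is a guess; `social_features` is a hypothetical name.

```python
import re

# Only the four categories shown on the slide (out of 14) are included.
H_SETS = {
    "H1_determiners": ["the"],
    "H3_subject_pronouns": ["she", "he", "we", "they"],
    "H9_dialogue_management": ["thanks", "yes", "ok", "sorry", "hi", "hello", "bye",
                               "anyway", "how about", "so", "what do you mean", "please"],
    "H11_hedge_words": ["kinda", "sorta"],
}
MODALS = {"could", "would", "should", "can", "will"}
# Assumption: pronouns that count for the "modal followed by pronoun" rule.
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}


def count_phrase(tokens, phrase):
    """Count occurrences of a (possibly multi-word) phrase in a token list."""
    words = phrase.split()
    n = len(words)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words)


def social_features(message):
    """Term frequency of each H_j-set in the message, normalized by token count."""
    tokens = re.findall(r"[a-z']+", message.lower())
    total = max(len(tokens), 1)  # avoid division by zero on empty messages
    features = {}
    for name, phrases in H_SETS.items():
        count = sum(count_phrase(tokens, p) for p in phrases)
        if name == "H9_dialogue_management":
            # "{could, would, should, can, will} followed by pronoun"
            count += sum(1 for a, b in zip(tokens, tokens[1:])
                         if a in MODALS and b in PRONOUNS)
        features[name] = count / total
    return features
```

Matching whole tokens (rather than substrings) keeps "the" from firing inside "they".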
Intent Classification Hybrid: Multiclass Classifier – Feature Creation
2. (DK) Declarative Knowledge Patterns
● Domain expert guidance
● Psycholinguistic syntactic & semantic rules
● Expanded via WordNet and Levin verbs
● $\text{Feature}_{P_j}(m_i) = 1$ if $P_j$ exists in $m_i$, else $0$
3. (SK) Social Knowledge Indicators
● Offline conversation indicators
  e.g., $H_j$ = Dialogue Management, $H_j$-set = {thanks, anyway, …}
● $\text{Feature}_{H_j}(m_i) = \text{term-frequency}(H_j\text{-set}, m_i)$
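A sketch of the binary DK feature (1 if pattern $P_j$ occurs in $m_i$, else 0). The patterns below are hypothetical examples that combine modals, pronouns, and a small give/send-style verb list standing in for the Levin/WordNet expansion; the paper's actual rule set is not reproduced on the slide, and `dk_features` is a hypothetical name.

```python
import re

# Hypothetical verb lexicon standing in for Levin "give"/"send" classes
# expanded with WordNet (assumption, not the paper's lexicon).
VERBS = ["give", "send", "donate", "bring", "provide"]
VERB_ALT = "|".join(VERBS)

# Hypothetical declarative patterns P_j (illustrative only).
DK_PATTERNS = {
    "P1_modal_pronoun_verb": re.compile(
        rf"\b(could|would|can|will)\s+(you|someone|anyone)\s+({VERB_ALT})\b"),
    "P2_first_person_want_verb": re.compile(
        rf"\bi\s+(want|wanna|would like)\s+(to\s+)?({VERB_ALT})\b"),
    "P3_where_can_i_verb": re.compile(
        rf"\bwhere\s+(can|do)\s+i\s+({VERB_ALT})\b"),
}


def dk_features(message):
    """Binary feature per pattern: 1 if P_j matches the message, else 0."""
    text = message.lower()
    return {name: int(bool(p.search(text))) for name, p in DK_PATTERNS.items()}
```

Because each feature is a presence flag, the rules can be written broadly and left to the learner to weight.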
Intent Classification Hybrid: Multiclass Classifier – Feature Creation
4. (CTK) Contrast Knowledge Patterns
INPUT: corpus {$m_i$}, cleaned and abstracted; min. support, X
For each class $C_j$:
● Find contrasting patterns using sequential pattern mining
OUTPUT: contrast pattern set {P} for each class $C_j$
5. (CPK) Contrast Patterns on the Part-of-Speech tags of {$m_i$}
e.g., unique sequential patterns:
SEEKING: help .* victim .* _url_ .*
OFFERING: anyon .* know .* cloth .*
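To use mined sequential patterns such as `help .* victim .* _url_ .*` as features, each pattern can be read as an ordered list of items with arbitrary gaps. A minimal sketch, assuming messages are already tokenized, stemmed, and abstracted the same way as the patterns (lowercase `_url_`); the function names are hypothetical, and SPADE itself is not implemented here.

```python
def parse_pattern(pattern: str) -> list[str]:
    """Turn 'help .* victim .* _url_ .*' into ['help', 'victim', '_url_']."""
    return [item for item in pattern.split() if item != ".*"]


def contains_sequence(tokens: list[str], items: list[str]) -> bool:
    """True if items occur in tokens in the same order, with arbitrary gaps."""
    remaining = iter(tokens)
    # 'item in remaining' consumes the iterator up to the match, enforcing order.
    return all(item in remaining for item in items)


def contrast_pattern_features(tokens, patterns_by_class):
    """Binary feature per mined pattern: 1 if the message contains it, else 0."""
    return {
        f"{cls}: {pattern}": int(contains_sequence(tokens, parse_pattern(pattern)))
        for cls, patterns in patterns_by_class.items()
        for pattern in patterns
    }
```

Usage: `contrast_pattern_features("anyon know where to donat cloth _url_".split(), {"OFFERING": ["anyon .* know .* cloth .*"]})` marks the OFFERING pattern as present.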
Contrast Mining based Patterns
Finding CTK (CPK): Contrast Knowledge Patterns
For each class $C_j$:
1. Tokenize the cleaned, abstracted text of {$m_i$}
2. Mine sequential patterns {P} using the SPADE algorithm
3. Reduce {P} to minimal sequences
4. Compute the growth rate & contrast strength of each P against every other class $C_k$
5. Keep the top-K ranked {P} by contrast strength
OUTPUT: contrast pattern set {P} for each class $C_j$

$gr(P, C_j, C_k) = \dfrac{\text{support}(P, C_j)}{\text{support}(P, C_k)}$ (1)
$\text{Contrast-Growth}(P, C_j) = \dfrac{1}{|C| - 1} \sum_{k \neq j} \dfrac{gr(P, C_j, C_k)}{1 + gr(P, C_j, C_k)}$ (2)
$\text{Sparse-Contrast-Strength}(P, C_j) = \text{support}(P, C_j) \cdot \text{Contrast-Growth}(P, C_j)$ (3)
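Equations (1), (2), and (3) can be computed directly from per-class supports. A minimal sketch, assuming `support(P, Cj)` is the fraction of $C_j$'s documents containing P, that the average in (2) runs over the other classes (divide by the number of classes minus one), and that a zero denominator in (1) counts as an infinite growth rate whose term in (2) is 1. These zero-support conventions are assumptions; the slide does not state them. Function names are hypothetical.

```python
import math


def growth_rate(supp_j: float, supp_k: float) -> float:
    """Eq. (1): gr(P, Cj, Ck) = support(P, Cj) / support(P, Ck)."""
    if supp_k == 0:
        # Assumption: absent from Ck -> infinite if present in Cj, else 0.
        return math.inf if supp_j > 0 else 0.0
    return supp_j / supp_k


def contrast_growth(supports: dict[str, float], j: str) -> float:
    """Eq. (2): average of gr / (1 + gr) over all classes other than Cj."""
    others = [k for k in supports if k != j]
    total = 0.0
    for k in others:
        gr = growth_rate(supports[j], supports[k])
        # gr / (1 + gr) tends to 1 as gr grows; use that limit for infinity.
        total += 1.0 if math.isinf(gr) else gr / (1.0 + gr)
    return total / len(others)


def sparse_contrast_strength(supports: dict[str, float], j: str) -> float:
    """Eq. (3): support(P, Cj) * Contrast-Growth(P, Cj)."""
    return supports[j] * contrast_growth(supports, j)


def top_k_patterns(pattern_supports: dict[str, dict[str, float]], j: str, k: int) -> list[str]:
    """Rank patterns for class Cj by sparse contrast strength; keep the top k."""
    ranked = sorted(pattern_supports,
                    key=lambda p: sparse_contrast_strength(pattern_supports[p], j),
                    reverse=True)
    return ranked[:k]
```

For a pattern with supports Seeking 0.2, Offering 0.05, None 0.0, the growth terms are 0.8 and 1, so Contrast-Growth is 0.9 and the strength for Seeking is 0.18. Multiplying by support in (3) keeps rare but perfectly contrasting patterns from dominating the ranking.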
Binarization Frameworks for Multiclass Classifier: 1 vs. All (OVA)
● CORPUS: set of short-text documents, $S$
● FEATURES: knowledge-driven features, $(X^T, y)$
● Models $M_1, M_2, \dots, M_K$, trained on $(X_1^T, y_1), (X_2^T, y_2), \dots, (X_K^T, y_K)$, output $P(c_1), P(c_2), \dots, P(c_K)$
● For model $M_j$: subset $X_j^T \subset S$ such that $X_j^T$ includes all the labeled instances of class $C_j$
(In the 1 vs. 1 (OVO) framework: $K(K-1)/2$ classifiers, one for each $C_j, C_k$ pair)
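The binarization step can be sketched without any ML library: build one binary target vector per class for OVA, pick the class whose model gives the highest $P(c_j)$, and enumerate class pairs for OVO. The per-class scores are assumed to come from already-trained base learners (Random Forest in the experiments); the helper names are hypothetical.

```python
from itertools import combinations


def one_vs_all_targets(labels, classes):
    """For each class Cj, binary targets y_j: 1 for Cj instances, 0 otherwise."""
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def ova_predict(class_scores):
    """Pick the class whose binary model M_j gives the highest P(c_j)."""
    return max(class_scores, key=class_scores.get)


def one_vs_one_pairs(classes):
    """OVO trains one binary model per unordered class pair: K(K-1)/2 in total."""
    return list(combinations(classes, 2))
```

For the three cooperation-assistive classes (Seeking, Offering, None), OVA and OVO both need three binary models; the two frameworks diverge only as K grows.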
Intent Classification Hybrid: Multiclass Classifier – Experiments
● Datasets
  ● Dataset-1: Hurricane Sandy, Oct 27 – Nov 7, 2012
  ● Dataset-2: Philippines Typhoon, Nov 7 – Nov 17, 2013
● Parameters
  ● Base learner $M_j$: Random Forest, 10 trees with 100 features
  ● Bi- and tri-grams for (T)
  ● Top-K = 100% of ranked patterns; min. support 10% for CTK, 50% for CPK
Intent Classification: Multiclass Classifier – Results
[Figure: Avg. F-1 score (10-fold CV) by framework and feature set ((Declarative), (Social), (Contrast), T,CTK,CPK, and T,DK,SK,CTK,CPK) on Dataset-1 (Hurricane Sandy, 2012); gain 7%, p < 0.05]
Intent Classification: Multiclass Classifier – Results
[Figure: Avg. F-1 score (10-fold CV) by framework and feature set ((Declarative), (Social), (Contrast), T,CTK,CPK, and T,DK,SK,CTK,CPK) on Dataset-2 (Philippines Typhoon, 2013); gain 6%, p < 0.05]
Lessons
1. The top-down & bottom-up hybrid approach improves data representation for learning intent classes; the two representations are complementary
  - Of the top 1% most discriminative features, 50% were knowledge-driven
2. Social conversation (SK) features grounded in offline conversation theory (the, thanks, etc.), which are often removed in text mining, are valuable for intent mining.
3. The effect of each knowledge type (SK vs. DK vs. CTK/CPK) varies across different types of real-world event datasets
➢ Future work: culturally sensitive psycholinguistic knowledge
Limitations & Future Work Directions
- Intent classes that are not cooperation-assistive were not considered
- Temporal drift of intent was not considered
- Instances may carry multiple intents (multilabel intent classes)
- Mining actor-level intent beyond the document level
Conclusion
A hybrid approach of interplaying features from
top-down representation via patterns using prior knowledge
of psycholinguistics, social behavior, & contrast mining
&
bottom-up representation via bag-of-tokens model
improves Intent Classification of short-text on social media.
Questions?
TWITTER: @hemant_pt
MAIL: hpurohit@gmu.edu
Acknowledgement: Respective image sources, and Grant IIS-1111182
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 

IEEE SocialCom 2015: Intent Classification of Social Media Text

  • 1. Intent Classification of Short-text Social Media
    Dec 19 2015, The 8th IEEE SocialCom-2015
    Hemant Purohit, Information Sciences and Technology, George Mason U
    Guozhu Dong, Valerie Shalin, Krishnaprasad Thirunarayan, Amit Sheth, Kno.e.sis, Wright State U
  • 2. @hemant_pt IEEE SocialCom-2015
    Outline
    ● Intention
    ● Social Media Short-text
    ● Intent Classification Problem
    ● Feature Representation
      ● Bottom-Up: Bag of Tokens model
      ● Top-Down: Set of Patterns
        ● Declarative Knowledge & Social Behavior Knowledge
        ● Contrast Mining based Patterns
    ● Experiments & Results
    ● Limitations & Future Work
  • 3. Intention
    ● Intent: the purpose or aim of an action
    ● ‘…we are tempted to speak of “different senses” of a word which is clearly not equivocal, we may infer that we are pretty much in the dark about the character of the concept which it represents’ (Anscombe 1963, p. 1) [Stanford Encyclopedia of Philosophy]
    ● Intent is latent in the utterance
  • 4. Social Media Short-text & Intent
    Social media text: unstructured, informal language, short.
    | Document | Intent |
    | --- | --- |
    | Text REDCROSS to 90999 to donate 10$ to help the victims of hurricane sandy | SEEKING HELP |
    | Anyone know where the nearest #RedCross is? I wanna give blood today to help the victims of hurricane Sandy | OFFERING HELP |
    | Would like to urge all citizens to make the proper preparations for Hurricane #Sandy - prep is key - http://t.co/LyCSprbk has valuable info! | ADVISING |
  • 5. Short-text Document Intent
    ● Intent: aim of action
    | Document | Intent |
    | --- | --- |
    | Text REDCROSS to 90999 to donate 10$ to help the victims of hurricane sandy | SEEKING HELP |
    | Anyone know where the nearest #RedCross is? I wanna give blood today to help the victims of hurricane Sandy | OFFERING HELP |
    | Would like to urge all citizens to make the proper preparations for Hurricane #Sandy - prep is key - http://t.co/LyCSprbk has valuable info! | ADVISING |
    How can we identify the relevant intent in ambiguous, unconstrained natural-language text?
    Relevant intent ➔ articulation of organizational tasks (e.g., Seeking vs. Offering resources)
  • 6. Intent Classification: Problem Formulation
    ● Given a set of user-generated text documents, identify the intents they express
    ● Variety of interpretations
    ● Problem statement: a multi-class classification task that approximates $f: S \to C$, where $C = \{C_1, C_2, \ldots, C_K\}$ is a set of $K$ predefined intent classes and $S = \{m_1, m_2, \ldots, m_N\}$ is a set of $N$ short-text documents
    ● Focus: cooperation-assistive intent classes, $C = \{\text{Seeking}, \text{Offering}, \text{None}\}$
  • 7. Intent Classification: Related Work
    | Text classification type | Focus | Example |
    | --- | --- | --- |
    | Topic | Predominant subject matter | Sports or entertainment |
    | Sentiment / Emotion / Opinion | Present state of emotional affairs | Negative or positive; happy emotion |
    | Intent | Action, hence a future state of affairs | Offer to help after floods |
    e.g., "I am going to watch the awesome Fast and Furious movie!! #Excited"
  • 8. Intent Classification: Related Work
    | Data type | Approach focus | Limited applicability |
    | --- | --- | --- |
    | Formal text on webpages/blogs (Kröll and Strohmaier 2009, 2015; Raslan et al. 2013, 2014) | Knowledge acquisition via rules, clustering | Lack of large corpora with proper grammatical structure; poor-quality text is hard to parse for dependencies |
    | Commercial reviews, marketplace (Hollerit et al. 2013; Chen et al. 2013; Wang et al. 2015; Wu et al. 2011; Ramanand et al. 2010; Carlos & Yalamanchi 2012) | Classification via rules, lexical templates, patterns | More generalized intents (e.g., 'help' is broader than 'sell'); patterns are more implicit, hence harder to capture, than for buying/selling |
    | Search queries (Broder 2002; Downey et al. 2008; Case 2012; Wu et al. 2010; Strohmaier & Kröll 2012) | User profiling: query classification | Lack of large query logs and click graphs; presence of social conversation |
  • 9. Intent Classification: Challenges
    ● Unconstrained natural language in a small space
    ● Ambiguity in interpretation
    ● Sparsity, i.e., a low signal-to-noise ratio: imbalanced classes
      ● Only 1% signal (Seeking/Offering) in 4.9 million #Sandy tweets
    ● Hard-to-predict problem
      ● e.g., commercial intent: F-1 score of 65% on Twitter [Hollerit et al. 2013]
    Example: "@Zuora wants to help @Network4Good with Hurricane Relief. Text SANDY to 80888 & donate $10 to @redcross @AmeriCares & @SalvationArmyUS #help" (on the slide, blue marks offering intent and red marks seeking intent)
  • 10. Intent Classification: Domain & Features
    ● Binary intent
      ● Crisis domain: [Varga et al. 2013] Problem & Aid (Japanese); Purohit et al. 2013, 2014: Seeking & Offering. Features: N-grams, rules, noun-verb templates, etc.
      ● Commercial domain: [Hollerit et al. 2013] Buy vs. Sell intent. Features: N-grams, part-of-speech
    ● Multiclass intent
      ● Commercial domain: [Wang et al. 2015] semi-supervised. Features: N-grams, part-of-speech
  • 11. Our Hybrid Approach
    ● TOP-DOWN: pattern rules from Declarative (DK) & Social Behavior (SK) Knowledge, and Contrast Mining (CTK, CPK); patterns are defined for intent association
    ● BOTTOM-UP: bag of N-gram tokens, i.e., independent tokens; patterns are derived from the data
    (Diagram annotations: "Learning Improves", "Expressivity Increases")
  • 12. Intent Classification Hybrid: Multiclass Classifier – Feature Creation
    1. (T) Bag of Tokens
    ● Abstraction, due to the importance of these elements in information sharing [Nagarajan et al. 2010]:
      ● Numeric (e.g., $10) → _NUM_
      ● Interactions (e.g., RT & @user) → _RT_, _MENTION_
      ● Links (e.g., http://bit.ly) → _URL_
    ● N-grams after stemming and abstraction [Hollerit et al. 2013]: TOKENIZER(m_i, min, max) → {bi-, tri-grams}
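The abstraction and n-gram steps of (T) can be sketched in Python as below. This is a minimal, hypothetical sketch, not the authors' code: the regular expressions, the names `abstract_and_tokenize` and `ngrams`, and the tokenization rule are assumptions, and stemming is deliberately omitted to keep the example standard-library only (a real pipeline would stem after abstraction, which is why mined patterns read "anyon", "cloth").

```python
import re
from typing import List

# Hypothetical abstraction rules; order matters (URLs first, since they may contain digits).
ABSTRACTIONS = [
    (re.compile(r"https?://\S+"), " _URL_ "),
    (re.compile(r"@\w+"), " _MENTION_ "),
    (re.compile(r"\bRT\b"), " _RT_ "),
    (re.compile(r"\$?\d+(?:[.,]\d+)*\$?"), " _NUM_ "),  # covers both "$10" and "10$"
]

def abstract_and_tokenize(text: str) -> List[str]:
    """Replace links, mentions, retweets and numbers with placeholders, then tokenize."""
    for pattern, placeholder in ABSTRACTIONS:
        text = pattern.sub(placeholder, text)
    # Lowercasing turns _URL_ into _url_, matching the pattern notation on the slides.
    # Stemming is omitted here (assumption: stdlib-only sketch).
    return re.findall(r"_[a-z]+_|[a-z#]+", text.lower())

def ngrams(tokens: List[str], min_n: int = 2, max_n: int = 3) -> List[str]:
    """All contiguous n-grams with min_n <= n <= max_n (bi- and tri-grams by default)."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(min_n, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]

tokens = abstract_and_tokenize("RT @user Text SANDY to 80888 & donate $10 http://bit.ly/x")
print(tokens)  # ['_rt_', '_mention_', 'text', 'sandy', 'to', '_num_', 'donate', '_num_', '_url_']
print(len(ngrams(tokens)))  # 15 (8 bigrams + 7 trigrams)
```

Abstracting before n-gram extraction lets, for example, every donation tweet share the trigram "donate _num_ _url_" regardless of the amount or link.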
  • 13. Leveraging Declarative Knowledge
    ● Conceptual Dependency Theory [Schank, 1972]
      ● Make meaning independent of the actual words in the input
      ● e.g., a class in an ontology abstracts similar instances
    ● Verb lexicon [Hollerit et al. 2013]
      ● The verb reflects the action
      ● Relevant Levin verb categories [Levin, 1993], e.g., give, send, etc.
    ● Syntactic patterns
      ● Auxiliaries & modals: e.g., 'be', 'do', 'could', etc. [Ramanand et al. 2010]
      ● Word order: verb-subject positions, etc.
  • 14. Leveraging Social Behavior Knowledge
    ● Conversation indicators are often thrown away in text mining
    | Category $H_j$ | $H_j$-set |
    | --- | --- |
    | H1: Determiners | the |
    | H3: Subject pronouns | she, he, we, they |
    | H9: Dialogue management indicators | thanks, yes, ok, sorry, hi, hello, bye, anyway, how about, so, what do you mean, please, {could, would, should, can, will} followed by a pronoun |
    | H11: Hedge words | kinda, sorta |
    ● $\text{Feature}_{H_j}(m_i) = \text{term-frequency}(H_j\text{-set}, m_i)$, normalized
    ● 14 feature categories in total
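The SK feature $\text{Feature}_{H_j}(m_i)$ can be sketched as below. Several details are assumptions not stated on the slides: "normalized" is taken to mean dividing the raw count by the number of tokens in $m_i$, "followed by a pronoun" is taken to cover i, you, he, she, we, they, and the simple tokenizer is hypothetical. Only the four categories listed above are included, not all 14.

```python
import re
from typing import Dict, List, Tuple

MODALS = ["could", "would", "should", "can", "will"]
PRONOUNS = ["i", "you", "he", "she", "we", "they"]  # assumption: pronouns after modals

# H_j-sets as token tuples so multi-word indicators ("how about") can be counted.
H_SETS: Dict[str, List[Tuple[str, ...]]] = {
    "H1_determiners": [("the",)],
    "H3_subject_pronouns": [("she",), ("he",), ("we",), ("they",)],
    "H9_dialogue_mgmt": [tuple(p.split()) for p in [
        "thanks", "yes", "ok", "sorry", "hi", "hello", "bye", "anyway",
        "how about", "so", "what do you mean", "please"]]
        + [(m, p) for m in MODALS for p in PRONOUNS],
    "H11_hedges": [("kinda",), ("sorta",)],
}

def tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer: lowercase alphabetic runs (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())

def count_phrase(tokens: List[str], phrase: Tuple[str, ...]) -> int:
    """Number of positions where the phrase occurs contiguously."""
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == phrase)

def social_features(text: str) -> Dict[str, float]:
    """Term frequency of each H_j-set in the document, divided by document length."""
    tokens = tokenize(text)
    denom = max(len(tokens), 1)  # avoid division by zero on empty input
    return {
        h: sum(count_phrase(tokens, p) for p in phrases) / denom
        for h, phrases in H_SETS.items()
    }

print(social_features("Thanks! Could you please send the blankets?"))
```

In the example, "thanks", "please", and the modal-pronoun pair "could you" each count toward H9, so it gets 3 of 7 tokens, while "the" gives H1 1 of 7. These are exactly the words a stop-word filter would usually discard.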
  • 15. Intent Classification Hybrid: Multiclass Classifier - Feature Creation
    2. (DK) Declarative Knowledge Patterns
    ● Domain expert guidance
    ● Psycholinguistic syntactic & semantic rules
    ● Expanded via WordNet and Levin verbs
    ● $\text{Feature}_{P_j}(m_i) = 1$ if $P_j$ exists in $m_i$, else $0$
    3. (SK) Social Knowledge Indicators
    ● Offline conversation indicators, e.g., $H_j$ = Dialogue Management, $H_j$-set = {Thanks, anyway, ...}
    ● $\text{Feature}_{H_j}(m_i) = \text{term-frequency}(H_j\text{-set}, m_i)$
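The binary DK feature $\text{Feature}_{P_j}(m_i)$ can be illustrated with regular expressions. The two patterns below are hypothetical examples in the spirit of the slide (a modal plus pronoun plus a give-class verb, and "need" followed by a resource noun); they are not the authors' rules, and `GIVE_VERBS` is a small hand-written stand-in for a Levin verb class expanded with WordNet.

```python
import re
from typing import Dict

# Hypothetical stand-in for a Levin "give"-class verb list expanded with WordNet synonyms.
GIVE_VERBS = ["give", "send", "donate", "provide", "bring"]
# Hypothetical resource nouns for a seeking-style pattern.
RESOURCES = ["water", "food", "blood", "clothes", "shelter"]

# Each P_j is a compiled regex; both are illustrative, not the authors' rule set.
DK_PATTERNS: Dict[str, re.Pattern] = {
    # modal + first-person pronoun + give-class verb, e.g. "can I donate"
    "P_modal_pronoun_give": re.compile(
        r"\b(?:can|could|would|will|should)\s+(?:i|we)\s+(?:" + "|".join(GIVE_VERBS) + r")\b"
    ),
    # a form of "need" followed anywhere later by a resource noun, e.g. "need water"
    "P_need_resource": re.compile(
        r"\bneed(?:s|ed)?\b.*\b(?:" + "|".join(RESOURCES) + r")\b"
    ),
}

def dk_features(text: str) -> Dict[str, int]:
    """Feature_Pj(m_i) = 1 if pattern P_j occurs in m_i, else 0."""
    lowered = text.lower()
    return {name: int(p.search(lowered) is not None) for name, p in DK_PATTERNS.items()}

print(dk_features("Where can I donate blood?"))
print(dk_features("Victims need water and food now"))
```

Binary presence features like these complement the token n-grams: one rule fires for any verb in the expanded class, so it generalizes across wordings that share no n-gram.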
  • 16. Intent Classification Hybrid: Multiclass Classifier - Feature Creation
    4. (CTK) Contrast Knowledge Patterns
    ● INPUT: corpus {m_i}, cleaned and abstracted; minimum support X
    ● For each class C_j, find contrasting patterns using sequential pattern mining
    ● OUTPUT: set of contrast patterns {P} for each class C_j
    5. (CPK) Contrast Patterns on the part-of-speech tags of {m_i}
    e.g., unique sequential patterns:
    ● SEEKING: help .* victim .* _url_ .*
    ● OFFERING: anyon .* know .* cloth .*
  • 17. Contrast Mining based Patterns
    Finding CTK (CPK) contrast knowledge patterns, for each class C_j:
    1. Tokenize the cleaned, abstracted text of {m_i}
    2. Mine sequential patterns {P} using the SPADE algorithm
    3. Reduce {P} to minimal sequences
    4. Compute the growth rate & contrast strength of each P against every other class C_k
    5. Keep the top-K {P}, ranked by contrast strength
    OUTPUT: set of contrast patterns {P} for each class C_j

    $$\mathrm{gr}(P, C_j, C_k) = \frac{\mathrm{support}(P, C_j)}{\mathrm{support}(P, C_k)} \qquad (1)$$
    $$\text{Contrast-Growth}(P, C_j) = \frac{1}{|C| - 1} \sum_{k \neq j} \frac{\mathrm{gr}(P, C_j, C_k)}{1 + \mathrm{gr}(P, C_j, C_k)} \qquad (2)$$
    $$\text{Sparse-Contrast-Strength}(P, C_j) = \mathrm{support}(P, C_j) \cdot \text{Contrast-Growth}(P, C_j) \qquad (3)$$
    where $|C| = K$ is the number of intent classes, so (2) averages over the $K - 1$ other classes.
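Equations (1) to (3) can be sketched in Python as below, assuming candidate patterns have already been mined (SPADE and the minimal-sequence reduction are not reproduced here) and that support is the fraction of a class's documents containing the pattern as a gapped subsequence, as the ".*" notation suggests. The function names and the toy corpus are hypothetical. Note that $\mathrm{gr}/(1+\mathrm{gr})$ simplifies to $s_j/(s_j+s_k)$, which also handles $\mathrm{support}(P, C_k) = 0$ (infinite growth rate) without special-casing.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]
Doc = Sequence[str]

def contains(doc: Doc, pattern: Pattern) -> bool:
    """True if pattern occurs in doc as a gapped subsequence ("a .* b .*")."""
    it = iter(doc)
    return all(any(tok == p for tok in it) for p in pattern)

def support(pattern: Pattern, docs: List[Doc]) -> float:
    """Fraction of a class's documents that contain the pattern."""
    return sum(contains(d, pattern) for d in docs) / len(docs) if docs else 0.0

def growth_term(s_j: float, s_k: float) -> float:
    """gr/(1+gr) with gr = s_j/s_k (Eq. 1), rewritten as s_j/(s_j+s_k); 0 if both are 0."""
    return s_j / (s_j + s_k) if s_j + s_k > 0 else 0.0

def sparse_contrast_strength(pattern: Pattern, corpus: Dict[str, List[Doc]], target: str) -> float:
    """Eq. (3): support(P, C_j) * Contrast-Growth(P, C_j); Eq. (2) averages over the other classes."""
    s_j = support(pattern, corpus[target])
    others = [c for c in corpus if c != target]
    growth = sum(growth_term(s_j, support(pattern, corpus[c])) for c in others) / len(others)
    return s_j * growth

def top_k(candidates: List[Pattern], corpus: Dict[str, List[Doc]], target: str, k: int) -> List[Pattern]:
    """Rank pre-mined candidate patterns by contrast strength and keep the best k."""
    return sorted(candidates, key=lambda p: sparse_contrast_strength(p, corpus, target), reverse=True)[:k]

# Hypothetical toy corpus of abstracted, stemmed token lists.
corpus = {
    "Seeking": [["text", "redcross", "to", "_num_", "to", "donate", "help", "victim"],
                ["help", "victim", "donate", "_url_"]],
    "Offering": [["anyon", "know", "where", "donate", "cloth"],
                 ["i", "want", "to", "help", "victim"]],
    "None": [["stay", "safe", "everyon"], ["help", "me", "watch", "movie"]],
}
print(sparse_contrast_strength(("help", "victim"), corpus, "Seeking"))  # 5/6: 1.0 * (2/3 + 1) / 2
```

The support factor in Eq. (3) keeps very rare patterns from dominating the ranking merely because they never occur in the other classes.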
  • 18. Binarization Frameworks for Multiclass Classifier: 1 vs. All (OVA)
    ● CORPUS: set of short-text documents, $S$
    ● FEATURES: knowledge-driven features $X^T$, labels $y$
    ● For model $M_j$, the subset $X_j^T \subset S$ includes all the labeled instances of class $C_j$; $M_j$ is trained on $(X_j^T, y_j)$ and outputs $P(c_j)$
    ● Models $M_1, M_2, \ldots, M_K$ output $P(c_1), P(c_2), \ldots, P(c_K)$
    ● In the 1 vs. 1 (OVO) framework: $K(K-1)/2$ classifiers, one for each $(C_j, C_k)$ pair
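The OVA binarization can be sketched as below: one binary model per class, with that class relabeled as positive and all others as negative. The slides use Random Forest as the base learner $M_j$; to stay standard-library only, this sketch swaps in a hypothetical nearest-centroid scorer, and it assumes the final label is the class whose model scores highest. The OVO helper only enumerates the $K(K-1)/2$ class pairs.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors: List[Vector]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]

class CentroidScorer:
    """Hypothetical stand-in base learner; the slides use Random Forest for M_j."""

    def fit(self, X: List[Vector], y: List[int]) -> "CentroidScorer":
        self.pos = mean([x for x, t in zip(X, y) if t == 1])
        self.neg = mean([x for x, t in zip(X, y) if t == 0])
        return self

    def score(self, x: Vector) -> float:
        # Larger when x is closer to the positive centroid than to the negative one.
        return sq_dist(x, self.neg) - sq_dist(x, self.pos)

def train_ova(X: List[Vector], labels: List[str], classes: List[str]) -> Dict[str, CentroidScorer]:
    """One binary model M_j per class C_j: C_j is positive, every other class negative."""
    return {c: CentroidScorer().fit(X, [1 if l == c else 0 for l in labels]) for c in classes}

def predict_ova(models: Dict[str, CentroidScorer], x: Vector) -> str:
    """Assumption: predict the class whose model gives the highest score."""
    return max(models, key=lambda c: models[c].score(x))

def ovo_pairs(classes: List[str]) -> List[Tuple[str, str]]:
    """OVO trains one classifier per unordered class pair: K*(K-1)/2 of them."""
    return list(combinations(classes, 2))

# Hypothetical 2-D feature vectors for the three cooperation-assistive classes.
X = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [0, 0], [0.1, 0.0]]
labels = ["Seeking", "Seeking", "Offering", "Offering", "None", "None"]
models = train_ova(X, labels, ["Seeking", "Offering", "None"])
print(predict_ova(models, [0.95, 0.05]))
```

With $K = 3$ classes, OVA trains 3 models and OVO also trains $3 \cdot 2 / 2 = 3$; the two frameworks diverge in cost only as $K$ grows.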
  • 19. Intent Classification Hybrid: Multiclass Classifier - Experiments
    ● Datasets
      ● Dataset-1: Hurricane Sandy, Oct 27 – Nov 7, 2012
      ● Dataset-2: Philippines Typhoon, Nov 7 – Nov 17, 2013
    ● Parameters
      ● Base learner M_j: Random Forest, 10 trees with 100 features
      ● Bi- and tri-grams for (T)
      ● Top-K patterns with K = 100%; min. support 10% for CTK and 50% for CPK
  • 20. Intent Classification: Multiclass Classifier – Results
    [Figure: Avg. F-1 score (10-fold CV) by framework and feature set (e.g., T,DK,SK,CTK,CPK vs. T,CTK,CPK; groups: Declarative, Social, Contrast) on Dataset-1 (Hurricane Sandy, 2012). Gain 7%, p < 0.05.]
  • 21. Intent Classification: Multiclass Classifier - Results
    [Figure: Avg. F-1 score (10-fold CV) by framework and feature set (e.g., T,DK,SK,CTK,CPK vs. T,CTK,CPK; groups: Declarative, Social, Contrast) on Dataset-2 (Philippines Typhoon, 2013). Gain 6%, p < 0.05.]
  • 22. Lessons
    1. The hybrid top-down & bottom-up approach improves the data representation for learning (complementary) intent classes
       ● 50% of the top 1% most discriminative features were knowledge-driven
    2. Social conversation (SK) features grounded in theories of offline conversation (the, thanks, etc.), which are often removed in text mining, are valuable for intent mining
    3. The effect of each knowledge type (SK vs. DK vs. CTK/CPK) varies across real-world event datasets
       ➢ Future: culturally sensitive psycholinguistic knowledge
  • 23. Limitations & Future Work Directions
    ● Non-cooperation-assistive intent classes not considered
    ● Temporal drift of intent not considered
    ● Possibility of multilabel intent classes per instance
    ● Mining actor-level intent beyond the document level
  • 24. Conclusion
    A hybrid approach that combines top-down features (patterns built from prior knowledge of psycholinguistics, social behavior, and contrast mining) with bottom-up features (a bag-of-tokens model) improves intent classification of short text on social media.
  • 25. Questions?
    TWITTER: @hemant_pt, MAIL: hpurohit@gmu.edu
    Acknowledgement: respective image sources, and Grant IIS-1111182