Mining Citizen Sensor Communities to Improve
Cooperation with Organizational Actors
June 23, 2015
PhD Defense
Hemant Purohit (Advisor: Prof. Amit Sheth)	
  
Kno.e.sis, Dept. of CSE, Wright State University, USA
@hemant_pt
Outline
—  Citizen Sensor Communities & Organizations
—  Cooperative System Design Challenges
—  Contributions
—  Problem 1. Conversation Classification using Offline Theories
—  Problem 2. Intent Classification
—  Problem 3. Engagement Modeling
—  Applications
—  Limitations & Future Work
Citizen Sensors: Access to Human
Observations & Interactions
Uni-directional communication
(TO people)
Unstructured, Unconstrained Language Data
•  Ambiguity
•  Sparsity
•  Diversity
•  Scalability
Bi-directional
(BY people, TO people)
Web 2.0
media
Goal: Data to Decision Making
Organizational Decision Making
Noisy Citizen Sensor data
SOCIAL SCIENCE
•  Experts on Organizations
•  Small-scale Data
COMPUTER SCIENCE
•  Experts on Mining
•  Large-scale data
Scope of My
Research
1.  No Structured Roles
2.  No Defined Tasks
ü  But “GENERATE”
Massive Data
1.  Structured Roles
2.  Defined Tasks
ü  COLLECT Data
ü  Process, & Make Decisions
ORGANIZATIONS: “Can you help us?”
CITIZEN SENSOR COMMUNITIES: “Sure! How to help?”
→ COOPERATIVE SYSTEM
Computer-Supported Cooperative
Work (CSCW) Matrix
[Johansen
1988,
Baecker
1995]
TIME
PLACE
Articulation
Challenges
(Malone & Crowston 1990;
Schmidt & Bannon 1992)
ENGAGEMENT MODELING INTENT MINING
COOPERATIVE
SYSTEM
DATA
PROBLEM
DESIGN
PROBLEM
ORGANIZATIONS ↔ CITIZEN SENSOR COMMUNITIES
Awareness (Q1, Org. Actor): Who to engage first?
Articulation (Q2, Org. Actor): What are resource needs & availabilities?
Research Questions
—  Can general theories of offline conversation be
applied in the online context?
—  Can we model intentions to inform organizational
tasks using knowledge-guided features?
—  Can we find reliable groups to engage by modeling
collective group divergence using content-based
measure?
Thesis: Statement
Prior knowledge, and
interplay of features of users, their content, and network
efficiently model
Intent & Engagement
for cooperation of citizen sensor communities.
Scope of Concepts
•  Intent: aim of action, e.g., offering help
•  Engagement: involvement in activity, e.g., participating in discussion
Contributions
1.  Operationalized computing in cooperative system design
—  by accommodating articulation in Intent Mining, and
—  enriching awareness by Engagement Modeling
2.  Improved computation of online social data
—  by incorporating features from offline social theoretical knowledge
3.  Improved performance of intent classification
—  by fusing top-down & bottom-up data representations
4.  Improved explanation of group engagement
—  by modeling content divergence to complement existing structural measures
Data: Scope
—  Social Platform: Twitter
—  Important bridge between citizens & organizations
—  Characteristics
—  Users: follow/subscribe
—  Content: status updates (140 chars max)
—  Network: directed
—  Platform conversation functions
—  Reply
—  Retweet
—  Mention
Outline
—  Citizen Sensor Communities & Organizations
—  Cooperative System Design Challenges
—  Awareness: tackle via Engagement Modeling
—  Articulation: tackle via Intent Mining
—  Contributions
—  Problem 1. Conversation Classification using Offline Theories
—  Problem 2. Intent Classification
—  Problem 3. Engagement Modeling
—  Applications
—  Limitations & Future Work
User1. Analyzing #Conversations on Twitter. Using platform provided
functions #REPLY, #RT, and #Mention.
..
…
……..
User2. I kinda feel one might need more than just the platform fn -- @User1 u
can think #Psycholinguistics, dude!
Problem 1. Conversation Classification
—  Functions of Reply, Retweet, and Mention reflect conversation
R1. Can general theories of conversation be applied in the online context?
Problem 1. Conversation Classification
—  Functions of Reply, Retweet, and Mention reflect conversation
—  Task: Given a set S of messages mi, Classify a sample {mi}
for {RP, None}, {RT, None}, {MN, None} , where
—  Ground-truth corpora
—  RP = { mi | has_Reply_function (mi) = True }
—  RT = { mi | has_Retweet_function (mi) = True }
—  MN = { mi | has_Mention_function (mi) = True }
—  None = S − (RP ∪ RT ∪ MN)
—  Sample {mi} size = 3, based on average Reply conversation size
Conversation Classification: Offline
Theories
—  Psycholinguistics Indicators [Clark & Gibbs, 1986, Chafe 1987, etc.]
—  Determiners (‘the’ vs. ‘a/an’)
—  Dialogue Management (e.g., ‘thanks’, ’anyway’), etc.
—  Drawback
—  Offline analysis focused on positive conversation instances
—  Hypotheses
—  Offline theoretic features are discriminative
—  Such features correlate with information density
Conversation Classification: Feature
Examples
CATEGORY Hj Hj SET
H1 - Determiners (the)
H3 - Subject pronouns (she, he, we, they)
H9 - Dialogue management indicators (thanks, yes, ok, sorry, hi, hello, bye,
anyway, how about, so, what do you
mean, please, {could, would, should,
can, will} followed by pronoun)
H11 - Hedge words (kinda, sorta)
•  Feature_Hj (mi) = term-frequency ( Hj-set, mi )
•  Normalized
•  Total 14 feature categories
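The Hj feature computation above is straightforward to implement. A minimal sketch, where the category sets are small illustrative subsets of the 14 categories, not the full lexicons:

```python
# Sketch of Feature_Hj(mi) = term-frequency(Hj-set, mi), normalized by
# message length. The Hj sets below are illustrative subsets only.
H_SETS = {
    "H1_determiners": {"the"},
    "H3_subject_pronouns": {"she", "he", "we", "they"},
    "H9_dialogue_mgmt": {"thanks", "yes", "ok", "sorry", "hi", "hello", "bye", "anyway"},
    "H11_hedges": {"kinda", "sorta"},
}

def feature_vector(message):
    """Return one normalized term-frequency feature per category Hj."""
    tokens = message.lower().split()
    n = max(len(tokens), 1)  # avoid division by zero on empty messages
    return {name: sum(t in h_set for t in tokens) / n
            for name, h_set in H_SETS.items()}
```

For instance, `feature_vector("thanks the the")` scores the determiner category at 2/3 and the dialogue-management category at 1/3.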
Conversation Classification: Results
—  Dataset
—  Tweets from 3 Disasters, and 3 Non-Disaster events
—  Varying set size (3.8K – 609K), time periods
—  Classifier:
—  Decision Tree
—  Evaluation: 10-fold Cross Validation
—  Accuracy: 62% - 78% [Lowest for {Mention,None} ]
—  AUC range: 0.63 - 0.84
Purohit, Hampton, Shalin, Sheth & Flach. Journal of Computers in Human Behavior, 2013
Conversation Classification:
Discriminative Features
—  Consistent top features across classifiers
—  Pronouns (e.g., you, he)
—  Dialogue management (e.g., thanks)
—  Determiners (e.g., the)
—  Word counts
—  Positively correlated with RP, RT, MN
—  Correlation Coefficient up to 0.69
Conversation Classification:
Psycholinguistic Analysis
—  LIWC: Tool for deeper content analysis [Pennebaker, 2001]
—  Gives a measure per psychological category
—  Categories of interest
—  Social Interaction
—  Sensed Experience
—  Communication
—  Analyzed output sets in confusion matrices
Ø  Higher values for positive classified conversation
Ø suggests higher information for cooperative intent
Purohit, Hampton, Shalin, Sheth & Flach. Journal of Computers in Human Behavior, 2013
(Confusion matrix output sets: True Positive, False Negative, False Positive, True Negative)
Conversation Classification:
Lessons
1.  Offline theoretic features of conversations exist in the
online environment
Ø  Can be applied for computing social data
2.  Such features correlate with information density in content
- Reflection of conversation for an intent
Outline
—  Citizen Sensor Communities & Organizations
—  Cooperative System Design Challenges
—  Awareness: tackle via Engagement Modeling
—  Articulation: tackle via Intent Mining
—  Contributions
—  Problem 1. Conversation Classification using Offline Theories
—  Problem 2. Intent Classification
—  Problem 3. Engagement Modeling
—  Applications
—  Limitations & Future Work
Thesis: Statement
Prior knowledge, and
interplay of features of users, their content, and network
efficiently model
Intent & Engagement
for cooperation of citizen sensor communities.
Short-text Document Intent
—  Intent: Aim of action
DOCUMENT → INTENT
“Text REDCROSS to 90999 to donate 10$ to help the victims of hurricane sandy” → SEEKING HELP
“Anyone know where the nearest #RedCross is? I wanna give blood today to help the victims of hurricane Sandy” → OFFERING HELP
“Would like to urge all citizens to make the proper preparations for Hurricane #Sandy - prep is key - http://t.co/LyCSprbk has valuable info!” → ADVISING
How to identify relevant intent from ambiguous, unconstrained
natural language text?
Relevant intent è Articulation of organizational tasks
(e.g., Seeking vs. Offering resources)
Intent Classification: Problem
Formulation
—  Given a set of user-generated text documents, identify
existing intents
—  Variety of interpretations
—  Problem statement: a multi-class classification task
approximate f: S ! C , where
C = {c1, c2 … cK}
is a set of predefined K intent classes, and
S = {m1, m2 … mN}
is a set of N short text documents
Focus - Cooperation-assistive intent classes, C= {Seeking, Offering, None}
Intent Classification: Related Work
TEXT CLASSIFICATION TYPE | FOCUS | EXAMPLE
Topic | predominant subject matter | sports or entertainment
Sentiment/Emotion/Opinion | present state of emotional affairs | negative or positive; happy emotion
Intent | action, hence future state of affairs | offer to help after floods
e.g., I am going to watch the awesome Fast and Furious movie!! #Excited
Intent Classification: Related Work
DATA TYPE | APPROACH FOCUS | LIMITED APPLICABILITY
Formal text on webpages/blogs (Kröll and Strohmaier 2009, -15; Raslan et al. 2013, -14) | Knowledge Acquisition: via rules, clustering | Lack of large corpora with proper grammatical structure; poor-quality text is hard to parse for dependencies
Commercial reviews, marketplace (Hollerit et al. 2013, Wu et al. 2011, Ramanand et al. 2010, Carlos & Yalamanchi 2012, Nagarajan et al. 2009) | Classification: via rules, lexical templates, patterns | More generalized intents (e.g., ‘help’ is broader than ‘sell’); patterns are more implicit to capture than for buying/selling
Search queries (Broder 2002, Downey et al. 2008, Case 2012, Wu et al. 2010, Strohmaier & Kröll 2012) | User profiling: query classification | Lack of large query logs, click graphs; existence of social conversation
Intent Classification: Challenges
—  Unconstrained Natural Language in small space
—  Ambiguity in interpretation
—  Sparsity: low ‘signal-to-noise’ ratio, imbalanced classes
—  1% signals (Seeking/Offering) in 4.9 million tweets #Sandy
—  Hard-to-predict problem:
—  commercial intent, F-1 score 65% on Twitter [Hollerit et al. 2013]
@Zuora wants to help @Network4Good with Hurricane Relief. Text SANDY to
80888 & donate $10 to @redcross @AmeriCares & @SalvationArmyUS #help
*Blue: offering intent, *Red: seeking intent
Intent Classification: Types & Features
Intent
Binary
Crisis Domain:
- [Varga et al. 2013] Problem vs. Aid (Japanese)
- Features: Syntactic, Noun-Verb templates, etc.
Commercial Domain:
- [Hollerit et al. 2013] Buy vs. Sell intent
- Features: N-grams, Part-of-Speech
Multiclass
Commercial Domain:
-  Not on Twitter
TOP-DOWN
Pattern Rules:
Declarative Knowledge
(patterns defined for intent association)
BOTTOM-UP
Bag of N-grams Tokens:
Independent Tokens
(patterns derived from the data)
Our Hybrid Approach: top-down patterns increase expressivity; bottom-up tokens improve learning
Intent Classification Top-Down:
Binary Classifier - Prior Knowledge
—  Conceptual Dependency Theory [Schank, 1972]
—  Make meaning independent from the actual words in input
—  e.g., Class in an Ontology abstracts similar instances
—  Verb Lexicon [Hollerit et al. 2013]
—  Relevant Levin’s Verb categories [Levin, 1993]
—  e.g., give, send, etc.
—  Syntactic Pattern
—  Auxiliary & modals: e.g., ‘be’, ‘do’, ‘could’, etc. [Ramanand et al. 2010]
—  Word order: Verb-Subject positions, etc.
Purohit, Hampton, Bhatt, Shalin, Sheth & Flach. Journal of CSCW, 2014
Intent Classification Top-Down:
Binary Classifier – Psycholinguistic Rules
—  Transform knowledge into rules
—  Examples:
(Pronouns except 'you' = yes) ^ (need/want = yes) ^ (Adjective = yes/no) ^ (Things=yes) → Seeking
(Pronoun except 'you' | Proper Noun = yes) ^ (can/could/would/should = yes) ^ (Levin Verb = yes)
^ (Determiner = yes/no) ^ (Adjective = yes/no) ^ (Things = yes) -> Offering
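Executed literally, such rules are conjunctions of set-membership tests over a tokenized tweet. A minimal sketch, in which every word list is a tiny hypothetical stand-in for the actual lexicons (Levin verbs, the domain-ontology "Things" set, etc.):

```python
# Illustrative encoding of the Seeking/Offering rules as boolean predicates.
# All word lists are assumed stand-ins, not the thesis lexicons.
PRONOUNS_NOT_YOU = {"i", "we", "he", "she", "they", "me", "us"}
NEED_WANT = {"need", "want"}
MODALS = {"can", "could", "would", "should"}
LEVIN_VERBS = {"give", "send", "donate", "offer", "bring"}
THINGS = {"food", "water", "clothes", "blood", "money", "shelter"}

def classify_rule_based(tweet):
    toks = set(tweet.lower().split())
    has = lambda word_set: bool(toks & word_set)
    # (Pronoun except 'you') ^ (need/want) ^ (Things) -> Seeking
    if has(PRONOUNS_NOT_YOU) and has(NEED_WANT) and has(THINGS):
        return "Seeking"
    # (Pronoun except 'you') ^ (modal) ^ (Levin verb) ^ (Things) -> Offering
    if has(PRONOUNS_NOT_YOU) and has(MODALS) and has(LEVIN_VERBS) and has(THINGS):
        return "Offering"
    return "None"
```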
(The ‘Things’ terms are drawn from a domain ontology)
Purohit, Hampton, Bhatt, Shalin, Sheth & Flach. Journal of CSCW, 2014
Intent Classification Top-Down:
Binary Classifier - Lessons
—  Preliminary Study
—  2000 conversation-filtered, rule-classified tweets, labeled by two native speakers
—  Labels: Seeking, Offering, None
—  Results
—  Avg. F-1 score: 78% (Baseline F-1 score: 57% [Varga et al. 2013] )
—  Lessons
—  Role of prior knowledge: Domain Independent & Dependent
—  Limitations: exhaustive rule-set required, low recall; ambiguity addressed, but sparsity remains
Purohit, Hampton, Bhatt, Shalin, Sheth & Flach. Journal of CSCW, 2014
TOP-DOWN
Pattern Rules:
Declarative Knowledge
BOTTOM-UP
Bag of N-grams Tokens:
Independent Tokens
Hybrid
Approach
Intent Classification Hybrid:
Binary Classifier - Design
—  AMBIGUITY: addressed via rich feature space
1. Top-Down: Declarative Knowledge Patterns [Ramanand et al. 2010]
DK(mi, P) → {0, 1}
e.g., P = \b(like|want)\b.*\b(to)\b.*\b(bring|give|help|raise|donate)\b
(acquired via Red Cross expert searches)
2. Abstraction: due to importance in info sharing [Nagarajan et al. 2010]
-  Numerics (e.g., $10) → _NUM_
-  Interactions (e.g., RT & @user) → _RT_, _MENTION_
-  Links (e.g., http://bit.ly) → _URL_
3. Bottom-Up: N-grams after stemming and abstraction [Hollerit et al. 2013]
TOKENIZER(mi) → { bi-, tri-grams }
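The three feature sources can be sketched with plain regular expressions; the DK regex mirrors the example pattern P above, and the abstraction rules target the listed token types (the exact expressions are assumptions):

```python
import re

# 1. Top-Down: one declarative pattern DK(mi, P) -> {0, 1}
DK_PATTERN = re.compile(r"\b(like|want)\b.*\bto\b.*\b(bring|give|help|raise|donate)\b")

def dk_feature(msg):
    return int(bool(DK_PATTERN.search(msg.lower())))

# 2. Abstraction: URLs, RT, @mentions, numerics -> placeholder tokens
def abstract(msg):
    msg = re.sub(r"https?://\S+", "_URL_", msg)
    msg = re.sub(r"\bRT\b", "_RT_", msg)
    msg = re.sub(r"@\w+", "_MENTION_", msg)
    msg = re.sub(r"\$?\d+[\d,.]*", "_NUM_", msg)
    return msg

# 3. Bottom-Up: bag of bi-/tri-grams over the abstracted text
def ngrams(msg, n_values=(2, 3)):
    toks = abstract(msg).lower().split()
    return [" ".join(toks[i:i + n]) for n in n_values
            for i in range(len(toks) - n + 1)]
```

For example, `abstract("RT @user donate $10 http://bit.ly/x")` yields `"_RT_ _MENTION_ donate _NUM_ _URL_"`.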
Intent Classification Hybrid:
Binary Classifier - Design
—  SPARSITY: addressed via algorithmic choices
1.  Feature Selection
2.  Ensemble Learning
3.  Classifier Chain
[Figure: classifier chain. Dataset → knowledge-driven features (X^T, y); model m_1 trained on (X_1^T, y_1) outputs P(c1); the remainder, weighted 1 − P(c1), feeds model m_2 trained on (X_2^T, y_2), which outputs P(c2)]
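The classifier-chain step can be read as simple routing: a first binary model peels off one class, and only the messages it rejects reach the second model. A sketch with trivial stand-in predicates in place of trained models (both heuristics are assumptions, not Random Forests):

```python
# Chain routing: stage 1 predicts c1; the remainder goes to stage 2 for c2.
def chain_predict(msg, model_c1, model_c2):
    if model_c1(msg):   # stage 1: P(c1)
        return "c1"
    if model_c2(msg):   # stage 2: P(c2), applied only to the remainder
        return "c2"
    return "None"

# Hypothetical stand-in models (keyword heuristics for illustration)
is_seeking = lambda m: "need" in m.lower()
is_offering = lambda m: "donate" in m.lower()
```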
Intent Classification Hybrid:
Binary Classifier - Experiments
—  Binary classifiers:
—  Seeking vs. not Seeking
—  Offering vs. not Offering
—  Dataset:
—  Candidate set: 4000 donation classified tweets
—  Labels: min. 3 judges
—  Annotations: Seeking , Offering , None
Purohit, Castillo, Diaz, Sheth & Meier. First Monday, 2014
Intent Classification Hybrid:
Binary Classifier - Results
Experiment | Learner | Training Samples | Precision (*Baseline) | F-1 (*Baseline) | Class labels
Seeking vs. (None’ + Offering) | RF (CR = 50:1) | 3836 | 98% (*79%) | 46% (*56%) | 56% requests
Offering vs. (None’) | RF (CR = 9:2) | 1763 | 90% (*65%) | 44% (*58%) | 13% offers

RF = Random Forest ensemble
CR = asymmetric false-alarm cost ratio (True:False)
Evaluation: 10-fold CV
Notes:
-  The domain requires higher precision than recall
-  Scope remains for improving the low recall
Purohit, Castillo, Diaz, Sheth & Meier. First Monday, 2014
Intent Classification Hybrid:
Multiclass Classifier - Generalization
—  Lessons from binary classification
—  Improvement by fusing top-down & bottom-up
—  Sparsity
—  Ambiguity (Seeking & Offering complementary)
—  addressed via improved data representation
Hypothesis: Knowledge-guided approach improves
multiclass classification accuracy
TOP-DOWN
Knowledge Patterns
(DK) Declarative
(SK) Social Behavior
(CTK, CSK) Contrast Patterns
BOTTOM-UP
Bag of N-grams Tokens:
(T) Independent Tokens
Hybrid
Approach
Intent Classification Hybrid:
Multiclass Classifier – Feature Creation
1. (T) Bag of Tokens: TOKENIZER(mi, min, max)
2. (DK) Declarative Knowledge Patterns
—  Domain expert guidance
—  Psycholinguistic syntactic & semantic rules
—  Expanded via WordNet and Levin Verbs
e.g., (how = yes) ^ (Modal-Set 'can' = yes) ^ (Pronouns except 'you' = yes) ^ (Levin Verb-Set 'give' = yes)
Pj: Feature_Pj(mi) = 1 if Pj exists in mi, else 0
3. (SK) Social Knowledge Indicators
—  Offline conversation indicators studied in Problem 1
e.g., Hj = Dialogue Management, Hj-set = {thanks, anyway, ...}
Feature_Hj(mi) = term-frequency(Hj-set, mi)
Intent Classification Hybrid:
Multiclass Classifier - Feature Creation
4. (CTK) Contrast Knowledge Patterns
INPUT: corpus {mi} cleaned and abstracted, min. support, X
For each class Cj
—  Find contrasting pattern using sequential pattern mining
OUTPUT: contrast patterns set {P} for each class Cj
5. (CPK) Contrast Patterns: on Part-of-Speech tags of {mi}
e.g., unique sequential patterns:
SEEKING: help .* victim .* _url_ .*
OFFERING: anyon .* know .* cloth .*
Intent Classification Hybrid:
Multiclass Classifier - Feature Creation
Finding CTK: Contrast Knowledge Patterns
For each class Cj
1.  Tokenize the cleaned, abstracted text of {mi }
2.  Mine Sequential Patterns: SPADE Algorithm
—  - Output: sequences of token sets, {P’}
3.  Reduce to minimal sequences {P}
4.  Compute growth rate & contrast strength for P with all other Ck
5.  Top-K ranked {P} by contrast strength
OUTPUT: contrast patterns set {P} for each class Cj
gr(P, Cj, Ck) = support(P, Cj) / support(P, Ck) … (1)
Contrast-Growth(P, Cj) = 1/(|C| − 1) · Σ_{k ≠ j} gr(P, Cj, Ck) / (1 + gr(P, Cj, Ck)) … (2)
Contrast-Strength(P, Cj) = support(P, Cj) · Contrast-Growth(P, Cj) … (3)
(|C| = number of classes)
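Equations (1) to (3) are easy to check numerically once the per-class supports of a pattern P are known; a sketch (the support values in the example are illustrative):

```python
# gr(P,Cj,Ck) = support(P,Cj) / support(P,Ck)
def growth_rate(supports, cj, ck):
    return supports[cj] / supports[ck] if supports[ck] else float("inf")

# Contrast-Growth: average of gr/(1+gr) over all classes Ck != Cj
def contrast_growth(supports, cj):
    others = [ck for ck in supports if ck != cj]
    total = 0.0
    for ck in others:
        g = growth_rate(supports, cj, ck)
        total += 1.0 if g == float("inf") else g / (1 + g)
    return total / len(others)

# Contrast-Strength = support(P,Cj) * Contrast-Growth(P,Cj)
def contrast_strength(supports, cj):
    return supports[cj] * contrast_growth(supports, cj)
```

For example, with supports {Seeking: 0.4, Offering: 0.1, None: 0.1}, the strength of P for Seeking is 0.4 × 0.8 = 0.32.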
Binarization Frameworks for Multiclass Classifier: 1 vs. All
[Figure: corpus S of short text documents → knowledge-driven features (X^T, y) → K binary models M_1 … M_K, outputting P(c1) … P(cK)]
Subset X_j^T ⊂ S such that X_j^T includes all the labeled instances of class Cj for model M_j
(In the 1-vs-1 framework: K·(K−1)/2 classifiers, one for each Cj, Ck pair)
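The 1-vs-All framework amounts to one binary scorer per class and an argmax at prediction time; a sketch where the per-class scorers are hypothetical keyword heuristics standing in for the trained Random Forest models M_j:

```python
def one_vs_all_predict(msg, scorers):
    """scorers: class -> callable returning a score in [0, 1]; pick the argmax."""
    return max(scorers, key=lambda c: scorers[c](msg))

# Hypothetical stand-in scorers, one per class (not the trained models)
scorers = {
    "Seeking": lambda m: 0.9 if "need" in m.lower() else 0.1,
    "Offering": lambda m: 0.9 if "donate" in m.lower() else 0.1,
    "None": lambda m: 0.5,  # constant fallback score
}
```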
Intent Classification Hybrid:
Multiclass Classifier - Experiments
—  Datasets
—  Dataset-1: Hurricane Sandy, Oct 27 – Nov 7, 2012
—  Dataset-2: Philippines Typhoon, Nov 7 – Nov 17, 2013
—  Parameters
—  Base Learner M_j: Random Forest, 10 trees with 100 features
—  bi-, tri-gram for (T)
—  K=100% & min. support 10% for CTK, 50% for CPK
Intent Classification:
Multiclass Classifier – Results
[Chart: Avg. F-1 score (10-fold CV) on Dataset-1 (Hurricane Sandy, 2012) for feature sets T (baseline), T+DK (declarative), T+SK (social), T+CTK+CSK (contrast), and T+DK+SK+CTK+CSK, under 1-vs-1 and 1-vs-All frameworks; the full combination gains 7%, p < 0.05]
Intent Classification:
Multiclass Classifier - Results
[Chart: Avg. F-1 score (10-fold CV) on Dataset-2 (Philippines Typhoon, 2013) for the same feature sets and frameworks; the full combination gains 6%, p < 0.05]
Lessons
1.  Top-down & Bottom-up hybrid approach improves data
representation for learning (complementary) intent classes
—  The top 1% of discriminative features were 50% knowledge-driven
2.  Offline theoretic social-conversation (SK) features ('the', 'thanks', etc.), often removed as stopwords in text classification, are valuable for intent.
3.  There is a varying effect of knowledge types (SK vs. DK vs.
CTK/CPK) in different types of real world event datasets
Ø Culturally-sensitive psycholinguistics knowledge in future
Outline
—  Citizen Sensor Communities & Organizations
—  Cooperative System Design Challenges
—  Awareness: tackle via Engagement Modeling
—  Articulation: tackle via Intent Mining
—  Contributions
—  Problem 1. Conversation Classification using Offline Theories
—  Problem 2. Intent Classification
—  Problem 3. Engagement Modeling
—  Applications
—  Limitations & Future Work
Thesis: Statement
Prior knowledge, and
interplay of features of users, their content, and network
efficiently model
Intent & Engagement
for cooperation of citizen sensor communities.
—  Engagement: degree of involvement in discussion
—  Reliable groups: stay focused and collectively behave to diverge on
topics
Problem 3. Group Engagement Model
Purohit, Ruan, Fuhry, Parthasarathy & Sheth. ICWSM, 2014
How can organizations find reliable groups to engage for action?
—  Engagement: degree of involvement in discussion
—  Reliable groups: stay focused and collectively behave to diverge on topics
—  Why & How do groups collectively evolve over time?
1.  Define a group from interaction network, g
2.  Define Divergence of g: content based in contrast to structure
3.  Predict change in the divergence between time slices
—  Features of g based on theories of social identity, & cohesion
Problem 3. Group Engagement Model
Purohit, Ruan, Fuhry, Parthasarathy & Sheth. ICWSM, 2014
Group Engagement Model:
Integrated Approach Unlike Prior Work
People (User): participant of the discussion
AND Content (Text): topic of interest
AND Network (Community): group around the topic
KEY POINT: capture user-node diversity
—  Candidate Group: Detect in interaction network
—  Group Discussion Divergence: Jensen-Shannon divergence of topic
distribution on group members’ tweets
Group Engagement Model: Discussion
Divergence
where H(·) = Shannon entropy,
Bt = latent topic distribution of each tweet t among all group members' tweets Tg, and
Bg = mean topic distribution of group g over Tg
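Under these definitions the measure is the generalized Jensen-Shannon divergence with uniform weights, Div(g) = H(Bg) − (1/|Tg|) Σt H(Bt). A dependency-free sketch (helper names are assumptions):

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def group_divergence(tweet_topic_dists):
    """Generalized Jensen-Shannon divergence of per-tweet topic
    distributions B_t around the group mean B_g: H(B_g) - mean of H(B_t)."""
    n = len(tweet_topic_dists)
    k = len(tweet_topic_dists[0])
    mean = [sum(d[i] for d in tweet_topic_dists) / n for i in range(k)]
    return entropy(mean) - sum(entropy(d) for d in tweet_topic_dists) / n
```

Identical tweet distributions give divergence 0; maximally disagreeing ones (e.g., [1, 0] vs. [0, 1]) give ln 2.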
Lessons
1.  A content-divergence-based measure helps explain why groups collectively diverge
—  Less-diverging groups write more social & future-action-related content
2.  Emerging events such as disasters have higher correlation
with social identity-driven features
Ø Role of social context
Outline
—  Citizen Sensor Communities & Organizations
—  Cooperative System Design Challenges
—  Awareness: tackle via Engagement Modeling
—  Articulation: tackle via Intent Mining
—  Contributions
—  Problem 1. Conversation Classification using Offline Theories
—  Problem 2. Intent Classification
—  Problem 3. Engagement Modeling
—  Applications
—  Limitations & Future Work
DISASTER Event
Application-1: Filter Content for
Disaster Response
CITIZEN
Sensors
RESPONSE
Organizations
“Me and @CeceVancePR are coordinating a clothing/food drive for families affected by Hurricane Sandy. If you would like to donate, DM us” [SEEKING]
“Does anyone know how to donate clothes to hurricane #Sandy victims?” [OFFERING]
Intent-Classifiers
as a Service
Broader Impact: Classifier Model
integrated by Crisis Mapping Pioneer
DISASTER Event
Application-2: “We TRUST people!”
User engagement tool
CITIZEN
Sensors
RESPONSE
Organizations
Tool to mine
Important
users
Broader Impact: Winner of Int’l Challenge: UN
ITU Young Innovators 2014
Articulation
ENGAGEMENT MODELING INTENT MINING
COOPERATIVE
SYSTEM
ORGANIZATIONS ↔ CITIZEN SENSOR COMMUNITIES
Awareness (Q1, Org. Actor): Who to engage first?
Articulation (Q2, Org. Actor): What are resource needs & availabilities?
Limitations & Future Work
—  Cooperative System
—  CSCW Application specific to domain of crisis
Ø  How to create a full What-Where-When-Who knowledge base
—  Intent Mining
—  Non-cooperation-assistive intent classes and the temporal drift of intent were not considered
Ø  How to mine actor-level intent beyond document level
—  Group Engagement
—  Reliable prioritized groups based on Correlation, not Causality
—  Interplay of Offline and Online interactions beyond the scope
Ø  How to incorporate intent in the group divergence
—  Bipartite Intent Graph Matching
—  Reducing time complexity of Seeking vs. Offering matching
Conclusion
Prior knowledge, and
interplay of features of users, their content, and network
efficiently model
Intent & Engagement
for cooperation between citizen sensors and organizations in
the online social communities.
Thanks to the Committee Members
[Left to Right] Prof. Amit Sheth, (advisor, WSU), Prof. Guozhu Dong (WSU), Prof. Srinivasan
Parthasarathy (OSU), Prof. TK Prasad (WSU), Dr. Patrick Meier (QCRI), Prof. Valerie Shalin (WSU)
Computer Science Social Science
Acknowledgements,
Thanks, and Questions
—  NSF SoCS grant IIS-1111182 to support this work
—  Interdisciplinary Mentors especially Prof. John Flach (WSU), Drs. Carlos
Castillo (QCRI), Fernando Diaz (Microsoft), Meena Nagarajan (IBM)
—  Kno.e.sis team especially Andrew Hampton from Psychology dept. and
Shreyansh and Tanvi from CSE at Wright State, as well as Yiye Ruan (now
Google) & David Fuhry at the Data Mining Lab, Ohio State University
—  Colleagues: Digital Volunteers from the CrisisMappers network, StandBy Task
Force, InCrisisRelief.org, info4Disasters, Humanity Road, Ushahidi, etc. and
the subject matter experts at UN FPA
Other works
—  Ambiguity (interpretation)
•  Short-Text Document Intent Mining [FM’14, JCSCW’14]
•  Actor-Intent Mining Complexity [In preparation]
—  Sparsity
•  Mutual Influence in Sparse Friendship Network [AAAI ICWSM’12]
•  User Summarization with Sparse Profile Metadata [ASE SocialInfo’12]
—  Diversity (users, behaviors)
•  Modeling Group Using Diverse Social Identity & Cohesion [AAAI ICWSM’14]
•  Modeling Diverse User-Engagement [SOME WWW’11, ACM WebSci’12]
—  Scalability
•  Matching intent as task of Information Retrieval [FM’14]
•  Knowledge-aware Bi-partite Matching [In preparation]

Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations

  • 1.
    Mining Citizen SensorCommunities to Improve Cooperation with Organizational Actors June 23 2015 PhD Defense Hemant Purohit (Advisor: Prof. Amit Sheth)   Kno.e.sis, Dept. of CSE, Wright State University, USA
  • 2.
    @hemant_pt Outline —  Citizen SensorCommunities & Organizations —  Cooperative System Design Challenges —  Contributions —  Problem 1. Conversation Classification using Offline Theories —  Problem 2. Intent Classification —  Problem 3. Engagement Modeling —  Applications —  Limitations & Future Work 2
  • 3.
    @hemant_pt Citizen Sensors: Accessto Human Observations & Interactions Uni-directional communication (TO people) Unstructured, Unconstrained Language Data •  Ambiguity •  Sparsity •  Diversity •  Scalability Bi-directional (BY people, TO people) Web 2.0 media 3
  • 4.
    @hemant_pt Goal: Data toDecision Making Organizational Decision Making Noisy Citizen Sensor data 4 SOCIAL SCIENCE •  Experts on Organizations •  Small-scale Data COMPUTER SCIENCE •  Experts on Mining •  Large-scale data Scope of My Research
  • 5.
    @hemant_pt 1.  No StructuredRoles 2.  No Defined Tasks ü  But “GENERATE” Massive Data 1.  Structured Roles 2.  Defined Tasks ü  COLLECT Data ü  Process, & Make Decisions ORGANIZATIONS   Sure! How to help? CITIZEN  SENSOR  COMMUNITIES   5 COOPERATIVE SYSTEM Can you help us?
  • 6.
    @hemant_pt Computer-Supported Cooperative Work (CSCW)Matrix 6 [Johansen 1988, Baecker 1995] TIME PLACE
  • 7.
    @hemant_pt Articulation Challenges (Malone & Crowston1990; Schmidt & Bannon 1992) ENGAGEMENT MODELING INTENT MINING COOPERATIVE SYSTEM DATA PROBLEM DESIGN PROBLEM 7 ORGANIZATIONS   CITIZEN  SENSOR  COMMUNITIES   Awareness Q1. Who to engage first? Org. Actor Q2. What are resource needs & availabilities? Org. Actor
  • 8.
    @hemant_pt Research Questions —  Cangeneral theories of offline conversation be applied in the online context? —  Can we model intentions to inform organizational tasks using knowledge-guided features? —  Can we find reliable groups to engage by modeling collective group divergence using content-based measure? 8
  • 9.
    @hemant_pt Thesis: Statement Prior knowledge,and interplay of features of users, their content, and network efficiently model Intent & Engagement for cooperation of citizen sensor communities. Scope of Concepts •  Intent: aim of action, e.g., offering help •  Engagement: involvement in activity, e.g., participating in discussion 9
  • 10.
    @hemant_pt Contributions 1.  Operationalized computingin cooperative system design —  by accommodating articulation in Intent Mining, and —  enriching awareness by Engagement Modeling 2.  Improved computation of online social data —  by incorporating features from offline social theoretical knowledge 3.  Improved performance of intent classification —  by fusing top-down & bottom-up data representations 4.  Improved explanation of group engagement —  by modeling content divergence to complement existing structural measures 10
  • 11.
    @hemant_pt Data: Scope —  SocialPlatform: Twitter —  Important bridge between citizens & organizations —  Characteristics —  Users: follow/subscribe —  Content: status updates (140 chars max) —  Network: directed —  Platform conversation functions —  Reply —  Retweet —  Mention 11
  • 12.
    @hemant_pt Outline —  Citizen SensorCommunities & Organizations —  Cooperative System Design Challenges —  Awareness: tackle via Engagement Modeling —  Articulation: tackle via Intent Mining —  Contributions —  Problem 1. Conversation Classification using Offline Theories —  Problem 2. Intent Classification —  Problem 3. Engagement Modeling —  Applications —  Limitations & Future Work 12
  • 13.
    @hemant_pt User1. Analyzing #Conversationson Twitter. Using platform provided functions #REPLY, #RT, and #Mention. .. … …….. User2. I kinda feel one might need more than just the platform fn -- @User1 u can think #Psycholinguistics, dude! Problem 1. Conversation Classification —  Function of Reply, Retweet, Mention reflect conversation 13 R1. Can general theories of conversation be applied in the online context?
  • 14.
    @hemant_pt Problem 1. ConversationClassification —  Function of Reply, Retweet, Mention reflect conversation —  Task: Given a set S of messages mi, Classify a sample {mi} for {RP, None}, {RT, None}, {MN, None} , where —  Ground-truth corpuses —  RP = { mi | has_Reply_function (mi) = True } —  RT = { mi | has_Retweet_function (mi) = True } —  MN = { mi | has_Mention_function (mi) = True } —  None = S – {RP, RT, MN} —  Sample {mi} size = 3, based on average Reply conversation size 14
  • 15.
    @hemant_pt Conversation Classification: Offline Theories — Psycholinguistics Indicators [Clark & Gibbs, 1986, Chafe 1987, etc.] —  Determiners (‘the’ vs. ‘a/an’) —  Dialogue Management (e.g., ‘thanks’, ’anyway’), etc. —  Drawback —  Offline analysis focused on positive conversation instances —  Hypotheses —  Offline theoretic features are discriminative —  Such features correlate with information density 15
  • 16.
    @hemant_pt Conversation Classification: Feature Examples 16 CATEGORYHj Hj SET H1 - Determiners (the) H3 - Subject pronouns (she, he, we, they) H9 - Dialogue management indicators (thanks, yes, ok, sorry, hi, hello, bye, anyway, how about, so, what do you mean, please, {could, would, should, can, will} followed by pronoun) H11 - Hedge words (kinda, sorta) •  Feature_Hj (mi) = term-frequency ( Hj-set, mi ) •  Normalized •  Total 14 feature categories
  • 17.
    @hemant_pt Conversation Classification: Results — Dataset —  Tweets from 3 Disasters, and 3 Non-Disaster events —  Varying set size (3.8K – 609K), time periods —  Classifier: —  Decision Tree —  Evaluation: 10-fold Cross Validation —  Accuracy: 62% - 78% [Lowest for {Mention,None} ] —  AUC range: 0.63 - 0.84 17  Purohit,  Hampton,  Shalin,  Sheth  &  Flach.  In  Journal  of  Computers  in  Human  Behavior,  2013
  • 18.
    @hemant_pt Conversation Classification: Discriminative Features — Consistent top features across classifiers —  Pronouns (e.g., you, he) —  Dialogue management (e.g., thanks) —  Determiners (e.g., the) —  Word counts —  Positively correlated with RP, RT, MN —  Correlation Coefficient up to 0.69 18
  • 19.
    @hemant_pt Conversation Classification: Psycholinguistic Analysis — LIWC: tool for deeper content analysis [Pennebaker, 2001] — Gives a measure per psychological category — Categories of interest — Social Interaction — Sensed Experience — Communication — Analyzed output sets in confusion matrices (True Positive / False Negative / False Positive / True Negative) → Higher values for positively classified conversations → suggests higher information for cooperative intent 19 Purohit, Hampton, Shalin, Sheth & Flach. In Journal of Computers in Human Behavior, 2013
  • 20.
    @hemant_pt Conversation Classification: Lessons 1.  Offline theoretic features of conversations exist in the online environment → Can be applied to computational analysis of social data 2.  Such features correlate with information density in content - a reflection of conversation for an intent 20
  • 21.
    @hemant_pt Outline —  Citizen SensorCommunities & Organizations —  Cooperative System Design Challenges —  Awareness: tackle via Engagement Modeling —  Articulation: tackle via Intent Mining —  Contributions —  Problem 1. Conversation Classification using Offline Theories —  Problem 2. Intent Classification —  Problem 3. Engagement Modeling —  Applications —  Limitations & Future Work 21
  • 22.
    @hemant_pt Thesis: Statement Prior knowledge, and the interplay of features of users, their content, and network, efficiently model Intent & Engagement for cooperation of citizen sensor communities. 22
  • 23.
    @hemant_pt Short-text Document Intent — Intent: Aim of action DOCUMENT / INTENT: “Text REDCROSS to 90999 to donate 10$ to help the victims of hurricane sandy” → SEEKING HELP; “Anyone know where the nearest #RedCross is? I wanna give blood today to help the victims of hurricane Sandy” → OFFERING HELP; “Would like to urge all citizens to make the proper preparations for Hurricane #Sandy - prep is key - http://t.co/LyCSprbk has valuable info!” → ADVISING 23
  • 24.
    @hemant_pt Short-text Document Intent — Intent: Aim of action DOCUMENT / INTENT: “Text REDCROSS to 90999 to donate 10$ to help the victims of hurricane sandy” → SEEKING HELP; “Anyone know where the nearest #RedCross is? I wanna give blood today to help the victims of hurricane Sandy” → OFFERING HELP; “Would like to urge all citizens to make the proper preparations for Hurricane #Sandy - prep is key - http://t.co/LyCSprbk has valuable info!” → ADVISING 24 How to identify relevant intent from ambiguous, unconstrained natural language text? Relevant intent → Articulation of organizational tasks (e.g., Seeking vs. Offering resources)
  • 25.
    @hemant_pt Intent Classification: Problem Formulation — Given a set of user-generated text documents, identify existing intents —  Variety of interpretations —  Problem statement: a multi-class classification task; approximate f: S → C, where C = {c1, c2 … cK} is a set of predefined K intent classes, and S = {m1, m2 … mN} is a set of N short text documents. Focus - Cooperation-assistive intent classes, C = {Seeking, Offering, None} 25
  • 26.
    @hemant_pt Intent Classification: Related Work TEXT CLASSIFICATION TYPE / FOCUS / EXAMPLE: Topic — predominant subject matter — sports or entertainment; Sentiment/Emotion/Opinion — present state of emotional affairs — negative or positive; happy emotion; Intent — focus on action, hence future state of affairs — offer to help after floods. e.g., I am going to watch the awesome Fast and Furious movie!! #Excited 26
  • 27.
    @hemant_pt Intent Classification: Related Work DATA TYPE / APPROACH FOCUS / LIMITED APPLICABILITY: Formal text on webpages/blogs (Kröll and Strohmaier 2009, -15; Raslan et al. 2013, -14) — Knowledge Acquisition via Rules, Clustering — Lack of large corpora with proper grammatical structure; poor-quality text hard to parse for dependencies. Commercial reviews, marketplace (Hollerit et al. 2013, Wu et al. 2011, Ramanand et al. 2010, Carlos & Yalamanchi 2012, Nagarajan et al. 2009) — Classification via Rules, lexical templates, patterns — More generalized intents (e.g., ‘help’ broader than ‘sell’); patterns more implicit to capture than for buying/selling. Search queries (Broder 2002, Downey et al. 2008, Case 2012, Wu et al. 2010, Strohmaier & Kröll 2012) — User Profiling: Query Classification — Lack of large query logs, click graphs; existence of social conversation 27
  • 28.
    @hemant_pt Intent Classification: Challenges — Unconstrained natural language in a small space —  Ambiguity in interpretation —  Sparsity: low ‘signal-to-noise’ ratio, imbalanced classes —  1% signals (Seeking/Offering) in 4.9 million #Sandy tweets —  Hard-to-predict problem: commercial intent, F-1 score 65% on Twitter [Hollerit et al. 2013] @Zuora wants to help @Network4Good with Hurricane Relief. Text SANDY to 80888 & donate $10 to @redcross @AmeriCares & @SalvationArmyUS #help *Blue: offering intent, *Red: seeking intent 28
  • 29.
    @hemant_pt Intent Classification: Types & Features — Binary intent — Crisis domain: [Varga et al. 2013] Problem vs. Aid (Japanese); features: syntactic, noun-verb templates, etc. — Commercial domain: [Hollerit et al. 2013] Buy vs. Sell intent; features: N-grams, part-of-speech — Multiclass intent — Commercial domain: not on Twitter 29
  • 30.
    @hemant_pt Our Hybrid Approach — TOP-DOWN Pattern Rules: Declarative Knowledge (patterns defined for intent association) — BOTTOM-UP Bag of N-gram Tokens: Independent Tokens (patterns derived from the data) — Learning improves, Expressivity increases 30
  • 31.
    @hemant_pt Intent Classification Top-Down: Binary Classifier - Prior Knowledge —  Conceptual Dependency Theory [Schank, 1972] —  Makes meaning independent from the actual words in input —  e.g., a class in an ontology abstracts similar instances —  Verb Lexicon [Hollerit et al. 2013] —  Relevant Levin verb categories [Levin, 1993] —  e.g., give, send, etc. —  Syntactic Patterns —  Auxiliaries & modals: e.g., ‘be’, ‘do’, ‘could’, etc. [Ramanand et al. 2010] —  Word order: verb-subject positions, etc. Purohit, Hampton, Bhatt, Shalin, Sheth & Flach. In Journal of CSCW, 2014 31
  • 32.
    @hemant_pt Intent Classification Top-Down: Binary Classifier – Psycholinguistic Rules —  Transform knowledge into rules —  Examples: (Pronouns except 'you' = yes) ^ (need/want = yes) ^ (Adjective = yes/no) ^ (Things = yes) → Seeking; (Pronoun except 'you' | Proper Noun = yes) ^ (can/could/would/should = yes) ^ (Levin Verb = yes) ^ (Determiner = yes/no) ^ (Adjective = yes/no) ^ (Things = yes) → Offering (‘Things’ drawn from a domain ontology) 32 Purohit, Hampton, Bhatt, Shalin, Sheth & Flach. In Journal of CSCW, 2014
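A minimal sketch of how such psycholinguistic rules could be executed. The word lists below stand in for the POS tagger, Levin verb classes, and domain ontology the actual system used, so they are assumptions for illustration only.

```python
import re

# Hypothetical word lists standing in for the slide's linguistic resources.
PRONOUNS_NOT_YOU = {"i", "we", "he", "she", "they", "me", "us"}
LEVIN_GIVE_VERBS = {"give", "send", "donate", "bring", "offer"}
MODALS = {"can", "could", "would", "should"}
THINGS = {"clothes", "food", "blood", "money", "supplies", "shelter"}  # stand-in for ontology

def classify(message: str) -> str:
    toks = set(re.findall(r"[a-z']+", message.lower()))
    has = lambda ws: bool(toks & ws)
    # (pronoun except 'you') AND (need/want) AND (resource noun) -> Seeking
    if has(PRONOUNS_NOT_YOU) and has({"need", "want"}) and has(THINGS):
        return "Seeking"
    # (pronoun except 'you') AND modal AND Levin verb AND (resource noun) -> Offering
    if has(PRONOUNS_NOT_YOU) and has(MODALS) and has(LEVIN_GIVE_VERBS) and has(THINGS):
        return "Offering"
    return "None"
```

With these lists, `classify("we need food and shelter")` fires the Seeking rule and `classify("i can donate clothes")` fires the Offering rule.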
  • 33.
    @hemant_pt Intent Classification Top-Down: Binary Classifier - Lessons —  Preliminary Study —  2000 conversation- and rule-based classified tweets, labeled by two native speakers —  Labels: Seeking, Offering, None —  Results —  Avg. F-1 score: 78% (baseline F-1 score: 57% [Varga et al. 2013]) —  Lessons —  Role of prior knowledge: domain-independent & domain-dependent —  Limitation: exhaustive rule-set, low recall; ambiguity addressed, but not sparsity Purohit, Hampton, Bhatt, Shalin, Sheth & Flach. In Journal of CSCW, 2014 33
  • 34.
    @hemant_pt Hybrid Approach — TOP-DOWN Pattern Rules: Declarative Knowledge — BOTTOM-UP Bag of N-gram Tokens: Independent Tokens 34
  • 35.
    @hemant_pt Intent Classification Hybrid: BinaryClassifier - Design —  AMBIGUITY: addressed via rich feature space 1. Top-Down: Declarative Knowledge Patterns [Ramanand et al. 2010] DK(mi, P) ! {0,1} e.g., P= b(like|want) b.*b(to)b.*b(bring|give|help|raise|donate)b (acquired via Red Cross expert searches) 2. Abstraction: due to importance in info sharing [Nagarajan et al. 2010] -  Numeric (e.g., $10) à _NUM_ -  Interactions (e.g., RT & @user) à _RT_ , _MENTION_ -  Links (e.g., http://bit.ly) ! _URL_ 3. Bottom-Up: N-grams after stemming and abstraction [Hollerit et al. 2013] TOKENIZER ( mi ) à { bi-, tri-gram } 35
  • 36.
    @hemant_pt Intent Classification Hybrid: BinaryClassifier - Design —  SPARSITY: addressed via algorithmic choices 1.  Feature Selection 2.  Ensemble Learning 3.  Classifier Chain 36 DATASET Knowledge-driven features XT , y m_1 m_2 P(c2) P(c1) X1 T, y1 X2 T, y2 1 - P(c1)
  • 37.
    @hemant_pt Intent Classification Hybrid: Binary Classifier - Experiments —  Binary classifiers: —  Seeking vs. not Seeking —  Offering vs. not Offering —  Dataset: —  Candidate set: 4000 donation-classified tweets —  Labels: min. 3 judges —  Annotations: Seeking, Offering, None 37 Purohit, Castillo, Diaz, Sheth, & Meier. First Monday journal, 2014
  • 38.
    @hemant_pt Intent Classification Hybrid: Binary Classifier - Results — Seeking vs. (None’ + Offering): RF (CR=50:1), 3836 training samples, Precision 98% (*79%), F-1 score 46% (*56%), 56% of labels are requests — Offering vs. (None’): RF (CR=9:2), 1763 training samples, Precision 90% (*65%), F-1 score 44% (*58%), 13% of labels are offers — RF = Random Forest ensemble; CR = asymmetric false-alarm cost ratio (True:False); *Baseline — Evaluation: 10-fold CV — Notes: domain requires higher precision than recall; scope for improving low recall 38 Purohit, Castillo, Diaz, Sheth, & Meier. First Monday journal, 2014
  • 39.
    @hemant_pt Intent Classification Hybrid: Multiclass Classifier - Generalization —  Lessons from binary classification —  Improvement by fusing top-down & bottom-up —  Sparsity —  Ambiguity (Seeking & Offering complementary) —  addressed via improved data representation — Hypothesis: Knowledge-guided approach improves multiclass classification accuracy 39
  • 40.
    @hemant_pt Hybrid Approach — TOP-DOWN Knowledge Patterns: (DK) Declarative, (SK) Social Behavior, (CTK, CSK) Contrast Patterns — BOTTOM-UP Bag of N-gram Tokens: (T) Independent Tokens 40
  • 41.
    @hemant_pt Intent Classification Hybrid: Multiclass Classifier – Feature Creation 1. (T) Bag of Tokens - TOKENIZER(mi, min, max) 2. (DK) Declarative Knowledge Patterns —  Domain expert guidance —  Psycholinguistic syntactic & semantic rules —  Expanded by WordNet and Levin Verbs — e.g., (how = yes) ^ (Modal-Set 'can' = yes) ^ (Pronouns except 'you' = yes) ^ (Levin Verb-Set 'give' = yes); Feature_Pj (mi) = 1 if Pj exists in mi, else 0 3. (SK) Social Knowledge Indicators —  Offline conversation indicators studied in Problem 1 — e.g., Hj = Dialogue Management, Hj-set = {Thanks, anyway, ..}; Feature_Hj (mi) = term-frequency ( Hj-set, mi ) 41
  • 42.
    @hemant_pt Intent Classification Hybrid: Multiclass Classifier - Feature Creation 4. (CTK) Contrast Knowledge Patterns — INPUT: corpus {mi} cleaned and abstracted; min. support, X — For each class Cj, find contrasting patterns using sequential pattern mining — OUTPUT: contrast pattern set {P} for each class Cj — e.g., unique sequential patterns — SEEKING: help .* victim .* _url_ .* — OFFERING: anyon .* know .* cloth .* 5. (CPK) Contrast Patterns on Part-of-Speech tags of {mi} 42
  • 43.
    @hemant_pt Intent Classification Hybrid: Multiclass Classifier - Feature Creation — Finding CTK: Contrast Knowledge Patterns — For each class Cj: 1.  Tokenize the cleaned, abstracted text of {mi} 2.  Mine sequential patterns with the SPADE algorithm — output: sequences of token sets, {P’} 3.  Reduce to minimal sequences {P} 4.  Compute growth rate & contrast strength for P against all other Ck 5.  Keep the top-K {P} ranked by contrast strength — OUTPUT: contrast pattern set {P} for each class Cj — gr(P, Cj, Ck) = support(P, Cj) / support(P, Ck) .. (1) — Contrast-Growth(P, Cj) = 1/(|C| - 1) Σ_{k ≠ j} gr(P, Cj, Ck) / (1 + gr(P, Cj, Ck)) .. (2) — Contrast-Strength(P, Cj) = support(P, Cj) * Contrast-Growth(P, Cj) .. (3) 43
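Equations (1)-(3) can be sketched directly. Here `support` is a naive in-order subsequence check over whitespace tokens rather than the SPADE miner's output, and the epsilon guard against zero support in the other class is an added assumption.

```python
def support(pattern, docs):
    """Fraction of documents whose token stream contains the pattern tokens in order."""
    def contains(seq, toks):
        it = iter(toks)
        return all(any(t == s for t in it) for s in seq)
    return sum(contains(pattern, d.split()) for d in docs) / max(len(docs), 1)

def contrast_strength(pattern, corpora, cj):
    """corpora: dict class -> list of docs; implements Eqs. (1)-(3) from the slide."""
    eps = 1e-9  # assumption: avoid division by zero when the pattern never occurs in Ck
    sj = support(pattern, corpora[cj])
    growth = 0.0
    others = [c for c in corpora if c != cj]
    for ck in others:
        gr = sj / (support(pattern, corpora[ck]) + eps)  # Eq. (1): growth rate
        growth += gr / (1.0 + gr)
    growth /= max(len(others), 1)                        # Eq. (2): averaged over |C| - 1 classes
    return sj * growth                                   # Eq. (3): contrast strength
```

A pattern that appears in every Seeking document and never in Offering documents scores close to 1, which is what makes it a useful class-discriminating feature.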
  • 44.
    @hemant_pt Binarization Frameworks for Multiclass Classifier: 1 vs. All — CORPUS: set of short text documents, S — FEATURES: knowledge-driven features (XT, y) — Models M_1 … M_K, one per class: subset Xj T ⊂ S includes all the labeled instances of class Cj for model M_j, which estimates P(cj) — (In the 1 vs. 1 framework: K*(K-1)/2 classifiers, one for each Cj, Ck pair) 44
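The 1-vs-All binarization above can be sketched as label relabeling plus an argmax over per-class probability estimates. The scoring functions below are placeholders for the trained Random Forest models M_j, so their signatures are assumptions.

```python
def binarize_labels(y, cj):
    """1-vs-All: relabel the corpus so class cj is positive, everything else negative."""
    return [1 if label == cj else 0 for label in y]

def one_vs_all_predict(x, models):
    """models: dict class -> scoring function returning P(cj | x); predict the argmax class."""
    return max(models, key=lambda c: models[c](x))
```

Each M_j is trained on its binarized labels; at prediction time the class whose model is most confident wins, which is why K binary models suffice instead of the K*(K-1)/2 needed by 1-vs-1.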
  • 45.
    @hemant_pt Intent Classification Hybrid: Multiclass Classifier - Experiments —  Datasets —  Dataset-1: Hurricane Sandy, Oct 27 – Nov 7, 2012 —  Dataset-2: Philippines Typhoon, Nov 7 – Nov 17, 2013 —  Parameters —  Base learner M_j: Random Forest, 10 trees with 100 features —  bi-, tri-grams for (T) —  K=100% & min. support 10% for CTK, 50% for CPK 45
  • 46.
    @hemant_pt Intent Classification: Multiclass Classifier – Results — Dataset-1 (Hurricane Sandy, 2012) [Chart: Avg. F-1 score (10-fold CV), roughly 56%–70%, for feature sets T (baseline), T+DK (declarative), T+SK (social), T+CTK,CSK (contrast), and T+DK,SK,CTK,CSK, under both 1-vs-1 and 1-vs-All frameworks] — Full feature set gains 7%, p < 0.05 46
  • 47.
    @hemant_pt Intent Classification: Multiclass Classifier - Results — Dataset-2 (Philippines Typhoon, 2013) [Chart: Avg. F-1 score (10-fold CV), roughly 74%–86%, for feature sets T (baseline), T+DK (declarative), T+SK (social), T+CTK,CSK (contrast), and T+DK,SK,CTK,CSK, under both 1-vs-1 and 1-vs-All frameworks] — Full feature set gains 6%, p < 0.05 47
  • 48.
    @hemant_pt Lessons 1.  Top-down & bottom-up hybrid approach improves data representation for learning (complementary) intent classes —  Top 1% discriminative features were 50% knowledge-driven 2.  Offline theoretic social conversation (SK) features (the, thanks, etc.), often removed for text classification, are valuable for intent. 3.  There is a varying effect of knowledge types (SK vs. DK vs. CTK/CPK) in different types of real-world event datasets → Culturally-sensitive psycholinguistic knowledge in future 48
  • 49.
    @hemant_pt Outline —  Citizen SensorCommunities & Organizations —  Cooperative System Design Challenges —  Awareness: tackle via Engagement Modeling —  Articulation: tackle via Intent Mining —  Contributions —  Problem 1. Conversation Classification using Offline Theories —  Problem 2. Intent Classification —  Problem 3. Engagement Modeling —  Applications —  Limitations & Future Work 49
  • 50.
    @hemant_pt Thesis: Statement Prior knowledge, and the interplay of features of users, their content, and network, efficiently model Intent & Engagement for cooperation of citizen sensor communities. 50
  • 51.
    @hemant_pt —  Engagement: degreeof involvement in discussion —  Reliable groups: stay focused and collectively behave to diverge on topics Problem 3. Group Engagement Model 51Purohit, Ruan, Fuhry, Parthasarathy, & Sheth. ICWSM 2014 How can organizations find reliable groups to engage for action?
  • 52.
    @hemant_pt —  Engagement: degreeof involvement in discussion —  Reliable groups: stay focused and collectively behave to diverge on topics —  Why & How do groups collectively evolve over time? 1.  Define a group from interaction network, g 2.  Define Divergence of g: content based in contrast to structure 3.  Predict change in the divergence between time slices —  Features of g based on theories of social identity, & cohesion Problem 3. Group Engagement Model 52Purohit, Ruan, Fuhry, Parthasarathy, & Sheth. ICWSM 2014
  • 53.
    @hemant_pt Group Engagement Model: Integrated Approach — Unlike prior work, combines: People (User): participant of the discussion AND Content (Text): topic of interest AND Network (Community): group around topic — KEY POINT: capture user node diversity Sources: tupper-lake.com/.../uploads/Community.jpg http://www.iconarchive.com/show/people-icons-by-aha-soft/user-icon.html 53
  • 54.
    @hemant_pt —  Candidate Group:Detect in interaction network —  Group Discussion Divergence: Jenson-Shannon Divergence of topic distribution on group members’ tweets Group Engagement Model: Discussion Divergence where, H(*) = Shannon Entropy Bt = Latent topic distribution of each tweet t in all members’ tweets |Tg| , Bg = mean topic distribution of group g, such that: 54
  • 55.
    @hemant_pt Lessons 1.  Content Divergence-based measure helps explain why groups collectively diverge —  Less-diverging groups write more social & future-action-related content 2.  Emerging events such as disasters have higher correlation with social identity-driven features → Role of social context 55
  • 56.
    @hemant_pt Outline —  Citizen SensorCommunities & Organizations —  Cooperative System Design Challenges —  Awareness: tackle via Engagement Modeling —  Articulation: tackle via Intent Mining —  Contributions —  Problem 1. Conversation Classification using Offline Theories —  Problem 2. Intent Classification —  Problem 3. Engagement Modeling —  Applications —  Limitations & Future Work 56
  • 57.
    @hemant_pt Application-1: Filter Content for Disaster Response — DISASTER Event → CITIZEN Sensors → RESPONSE Organizations — Intent-Classifiers as a Service — [SEEKING] Me and @CeceVancePR are coordinating a clothing/food drive for families affected by Hurricane Sandy. If you would like to donate, DM us — [OFFERING] Does anyone know how to donate clothes to hurricane #Sandy victims? 57
  • 58.
    @hemant_pt Broader Impact: Classifier Model integrated by Crisis Mapping Pioneer 58
  • 59.
    @hemant_pt Application-2: “We TRUST people!” User engagement tool — DISASTER Event → CITIZEN Sensors → RESPONSE Organizations — Tool to mine important users 59
  • 60.
    @hemant_pt Broader Impact: Winner of Int’l Challenge: UN ITU Young Innovators 2014 60
  • 61.
    @hemant_pt COOPERATIVE SYSTEM — ORGANIZATIONS ↔ CITIZEN SENSOR COMMUNITIES — Awareness (Q1. Who to engage first? - Org. Actor) → ENGAGEMENT MODELING — Articulation (Q2. What are resource needs & availabilities? - Org. Actor) → INTENT MINING 61
  • 62.
    @hemant_pt Limitations & Future Work —  Cooperative System —  CSCW application specific to the domain of crisis → How to create a full What-Where-When-Who knowledge base —  Intent Mining —  Non-cooperation-assistive intent classes not considered; temporal drift of intent not considered → How to mine actor-level intent beyond the document level —  Group Engagement —  Reliable, prioritized groups based on correlation, not causality —  Interplay of offline and online interactions beyond the scope → How to incorporate intent in the group divergence —  Bipartite Intent Graph Matching —  Reducing time complexity of Seeking vs. Offering matching 62
  • 63.
    @hemant_pt Conclusion Prior knowledge, and the interplay of features of users, their content, and network, efficiently model Intent & Engagement for cooperation between citizen sensors and organizations in online social communities. 63
  • 64.
    @hemant_pt Thanks to the Committee Members 64 [Left to Right] Prof. Amit Sheth (advisor, WSU), Prof. Guozhu Dong (WSU), Prof. Srinivasan Parthasarathy (OSU), Prof. TK Prasad (WSU), Dr. Patrick Meier (QCRI), Prof. Valerie Shalin (WSU) — Computer Science & Social Science
  • 65.
    @hemant_pt Acknowledgement, Thanks and Questions —  NSF SoCS grant IIS-1111182 to support this work —  Interdisciplinary mentors, especially Prof. John Flach (WSU), Drs. Carlos Castillo (QCRI), Fernando Diaz (Microsoft), Meena Nagarajan (IBM) —  Kno.e.sis team, especially Andrew Hampton from the Psychology dept. and Shreyansh and Tanvi from CSE at Wright State, as well as Yiye Ruan (now Google) & David Fuhry at the Data Mining Lab, Ohio State University —  Colleagues: digital volunteers from the CrisisMappers network, StandBy Task Force, InCrisisRelief.org, info4Disasters, Humanity Road, Ushahidi, etc., and the subject matter experts at UN FPA 65
  • 66.
    @hemant_pt Other works — Ambiguity (interpretation): •  Short-Text Document Intent Mining [FM’14, JCSCW’14] •  Actor-Intent Mining Complexity [In preparation] — Sparsity: •  Mutual Influence in Sparse Friendship Network [AAAI ICWSM’12] •  User Summarization with Sparse Profile Metadata [ASE SocialInfo’12] — Diversity (users, behaviors): •  Modeling Group Using Diverse Social Identity & Cohesion [AAAI ICWSM’14] •  Modeling Diverse User-Engagement [SOME WWW’11, ACM WebSci’12] — Scalability: •  Matching intent as task of Information Retrieval [FM’14] •  Knowledge-aware Bi-partite Matching [In preparation] 66