Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Social media provides a natural platform for the dynamic emergence of citizen (as) sensor communities, where citizens share information, express opinions, and engage in discussions. Often such an online Citizen Sensor Community (CSC) has stated or implied goals related to the workflows of organizational actors with defined roles and responsibilities; for example, a community of crisis response volunteers may inform the prioritization of resource needs (e.g., medical) for the managers of crisis response organizations. However, CSCs create information overload for organizational actors, who must find reliable information providers and actionable information among citizen contributions. This threatens the awareness and articulation of workflows needed for cooperation between citizens and organizational actors. CSCs supported by Web 2.0 social media platforms thus offer new opportunities and pose new challenges. This work addresses ambiguity in interpreting unconstrained natural language (e.g., ‘wanna help’ appearing in messages both asking for and offering help during crises), sparsity of user and group behaviors (e.g., expression of specific intent), and diversity of user demographics (e.g., medical or technical professionals) when interpreting the user-generated data of citizen sensors. Interdisciplinary research involving the social and computer sciences is essential to address these socio-technical issues in CSC and to give organizational actors access to user-generated data at a higher level of information abstraction. This study presents a novel web information processing framework focused on actors and actions in cooperation, called Identify-Match-Engage (IME), which fuses top-down and bottom-up computing approaches to design a cooperative web information system between citizens and organizational actors. It includes (a) identification of action-related seeking and offering intent from short, unstructured text documents using a classification model based on both declarative and statistical knowledge, (b) matching of seeking and offering intentions, and (c) engagement models of users and groups in CSC to prioritize whom to engage, by modeling context with social theories using features of users, their generated content, and their dynamic connections in user interaction networks. The results show that fusing top-down knowledge-driven and bottom-up data-driven approaches models intent and engagement more efficiently than conventional bottom-up approaches alone. Applications of this work include an engagement interface tool used during recent crises to spread critical information about prioritized needs, so that citizens donate only the supplies actually required. The engagement interface application also won the United Nations ICT agency ITU's Young Innovators 2014 award.


  1. 1. Mining Citizen Sensor Communities to Improve Cooperation with Organizational Actors June 23 2015 PhD Defense Hemant Purohit (Advisor: Prof. Amit Sheth)   Kno.e.sis, Dept. of CSE, Wright State University, USA
  2. 2. @hemant_pt Outline —  Citizen Sensor Communities & Organizations —  Cooperative System Design Challenges —  Contributions —  Problem 1. Conversation Classification using Offline Theories —  Problem 2. Intent Classification —  Problem 3. Engagement Modeling —  Applications —  Limitations & Future Work 2
  3. 3. @hemant_pt Citizen Sensors: Access to Human Observations & Interactions Uni-directional communication (TO people) Unstructured, Unconstrained Language Data •  Ambiguity •  Sparsity •  Diversity •  Scalability Bi-directional (BY people, TO people) Web 2.0 media 3
  4. 4. @hemant_pt Goal: Data to Decision Making Organizational Decision Making Noisy Citizen Sensor data 4 SOCIAL SCIENCE •  Experts on Organizations •  Small-scale Data COMPUTER SCIENCE •  Experts on Mining •  Large-scale data Scope of My Research
  5. 5. @hemant_pt CITIZEN SENSOR COMMUNITIES: 1. No Structured Roles 2. No Defined Tasks ✓ But “GENERATE” Massive Data. ORGANIZATIONS: 1. Structured Roles 2. Defined Tasks ✓ COLLECT Data ✓ Process & Make Decisions. COOPERATIVE SYSTEM: Organizations ask “Can you help us?”; citizens reply “Sure! How to help?” 5
  6. 6. @hemant_pt Computer-Supported Cooperative Work (CSCW) Matrix [Johansen 1988, Baecker 1995] (matrix axes: TIME × PLACE) 6
  7. 7. @hemant_pt Articulation Challenges (Malone & Crowston 1990; Schmidt & Bannon 1992) ENGAGEMENT MODELING INTENT MINING COOPERATIVE SYSTEM DATA PROBLEM DESIGN PROBLEM 7 ORGANIZATIONS   CITIZEN  SENSOR  COMMUNITIES   Awareness Q1. Who to engage first? Org. Actor Q2. What are resource needs & availabilities? Org. Actor
  8. 8. @hemant_pt Research Questions —  Can general theories of offline conversation be applied in the online context? —  Can we model intentions to inform organizational tasks using knowledge-guided features? —  Can we find reliable groups to engage by modeling collective group divergence using content-based measure? 8
  9. 9. @hemant_pt Thesis: Statement Prior knowledge, and interplay of features of users, their content, and network efficiently model Intent & Engagement for cooperation of citizen sensor communities. Scope of Concepts •  Intent: aim of action, e.g., offering help •  Engagement: involvement in activity, e.g., participating in discussion 9
  10. 10. @hemant_pt Contributions 1.  Operationalized computing in cooperative system design —  by accommodating articulation in Intent Mining, and —  enriching awareness by Engagement Modeling 2.  Improved computation of online social data —  by incorporating features from offline social theoretical knowledge 3.  Improved performance of intent classification —  by fusing top-down & bottom-up data representations 4.  Improved explanation of group engagement —  by modeling content divergence to complement existing structural measures 10
  11. 11. @hemant_pt Data: Scope —  Social Platform: Twitter —  Important bridge between citizens & organizations —  Characteristics —  Users: follow/subscribe —  Content: status updates (140 chars max) —  Network: directed —  Platform conversation functions —  Reply —  Retweet —  Mention 11
  12. 12. @hemant_pt Outline —  Citizen Sensor Communities & Organizations —  Cooperative System Design Challenges —  Awareness: tackle via Engagement Modeling —  Articulation: tackle via Intent Mining —  Contributions —  Problem 1. Conversation Classification using Offline Theories —  Problem 2. Intent Classification —  Problem 3. Engagement Modeling —  Applications —  Limitations & Future Work 12
  13. 13. @hemant_pt User1. Analyzing #Conversations on Twitter. Using platform provided functions #REPLY, #RT, and #Mention. .. … …….. User2. I kinda feel one might need more than just the platform fn -- @User1 u can think #Psycholinguistics, dude! Problem 1. Conversation Classification —  Function of Reply, Retweet, Mention reflect conversation 13 R1. Can general theories of conversation be applied in the online context?
  14. 14. @hemant_pt Problem 1. Conversation Classification —  Function of Reply, Retweet, Mention reflect conversation —  Task: Given a set S of messages mi, Classify a sample {mi} for {RP, None}, {RT, None}, {MN, None} , where —  Ground-truth corpuses —  RP = { mi | has_Reply_function (mi) = True } —  RT = { mi | has_Retweet_function (mi) = True } —  MN = { mi | has_Mention_function (mi) = True } —  None = S – {RP, RT, MN} —  Sample {mi} size = 3, based on average Reply conversation size 14
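To make the slide-14 task setup concrete, here is a minimal sketch (not the thesis code) of how the ground-truth corpora RP, RT, MN, and None could be derived from raw tweets; the Twitter API v1.1 field names used here are an assumption about the data format.

```python
def build_corpora(messages):
    """Partition tweets into RP (replies), RT (retweets), MN (mentions), None.

    `messages` is assumed to be a list of dicts in Twitter API v1.1 format;
    a tweet may belong to several of RP/RT/MN, mirroring slide 14.
    """
    rp = [m for m in messages if m.get("in_reply_to_status_id") is not None]
    rt = [m for m in messages if m.get("retweeted_status") is not None]
    mn = [m for m in messages if m.get("entities", {}).get("user_mentions")]
    tagged = {m["id"] for m in rp + rt + mn}
    none = [m for m in messages if m["id"] not in tagged]
    return rp, rt, mn, none
```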
  15. 15. @hemant_pt Conversation Classification: Offline Theories —  Psycholinguistics Indicators [Clark & Gibbs, 1986, Chafe 1987, etc.] —  Determiners (‘the’ vs. ‘a/an’) —  Dialogue Management (e.g., ‘thanks’, ’anyway’), etc. —  Drawback —  Offline analysis focused on positive conversation instances —  Hypotheses —  Offline theoretic features are discriminative —  Such features correlate with information density 15
  16. 16. @hemant_pt Conversation Classification: Feature Examples 16 CATEGORY Hj Hj SET H1 - Determiners (the) H3 - Subject pronouns (she, he, we, they) H9 - Dialogue management indicators (thanks, yes, ok, sorry, hi, hello, bye, anyway, how about, so, what do you mean, please, {could, would, should, can, will} followed by pronoun) H11 - Hedge words (kinda, sorta) •  Feature_Hj (mi) = term-frequency ( Hj-set, mi ) •  Normalized •  Total 14 feature categories
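A rough sketch of how the normalized category features on this slide could be computed; the word sets below are abbreviated illustrations, not the full 14 categories used in the thesis.

```python
import re

# Abbreviated illustrative category sets (the thesis uses 14 categories).
H_SETS = {
    "H1_determiners": {"the"},
    "H3_subject_pronouns": {"she", "he", "we", "they"},
    "H9_dialogue_management": {"thanks", "yes", "ok", "sorry", "hi", "hello",
                               "bye", "anyway", "please"},
    "H11_hedge_words": {"kinda", "sorta"},
}

def category_features(message):
    """Feature_Hj(m) = term frequency of Hj-set words in m, length-normalized."""
    tokens = re.findall(r"[a-z']+", message.lower())
    n = max(len(tokens), 1)
    return {name: sum(tok in hset for tok in tokens) / n
            for name, hset in H_SETS.items()}
```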
  17. 17. @hemant_pt Conversation Classification: Results —  Dataset —  Tweets from 3 Disasters, and 3 Non-Disaster events —  Varying set size (3.8K – 609K), time periods —  Classifier: —  Decision Tree —  Evaluation: 10-fold Cross Validation —  Accuracy: 62% - 78% [Lowest for {Mention,None} ] —  AUC range: 0.63 - 0.84 17  Purohit,  Hampton,  Shalin,  Sheth  &  Flach.  In  Journal  of  Computers  in  Human  Behavior,  2013
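The evaluation reported above (decision tree, 10-fold cross-validation, accuracy and AUC) could be reproduced along these lines with scikit-learn; this is a hedged sketch, assuming X holds feature vectors such as the category features above and y holds binary labels like {Reply, None}.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def evaluate_conversation_classifier(X, y):
    """Return mean 10-fold accuracy and AUC for a decision-tree classifier."""
    clf = DecisionTreeClassifier(random_state=0)
    accuracy = cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean()
    auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc").mean()
    return accuracy, auc
```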
  18. 18. @hemant_pt Conversation Classification: Discriminative Features —  Consistent top features across classifiers —  Pronouns (e.g., you, he) —  Dialogue management (e.g., thanks) —  Determiners (e.g., the) —  Word counts —  Positively correlated with RP, RT, MN —  Correlation Coefficient up to 0.69 18
  19. 19. @hemant_pt Conversation Classification: Psycholinguistic Analysis —  LIWC: Tool for deeper content analysis [Pennebaker, 2001] —  Gives a measure per psychological category —  Categories of interest —  Social Interaction —  Sensed Experience —  Communication —  Analyzed output sets in confusion matrices (True Positive / False Negative / False Positive / True Negative) Ø  Higher values for positively classified conversations Ø  suggest higher information for cooperative intent 19 Purohit, Hampton, Shalin, Sheth & Flach. In Journal of Computers in Human Behavior, 2013
  20. 20. @hemant_pt Conversation Classification: Lessons 1.  Offline theoretic features of conversations exist in the online environment Ø  Can be applied for computing social data 2.  Such features correlate with information density in content - Reflection of conversation for an intent 20
  21. 21. @hemant_pt Outline —  Citizen Sensor Communities & Organizations —  Cooperative System Design Challenges —  Awareness: tackle via Engagement Modeling —  Articulation: tackle via Intent Mining —  Contributions —  Problem 1. Conversation Classification using Offline Theories —  Problem 2. Intent Classification —  Problem 3. Engagement Modeling —  Applications —  Limitations & Future Work 21
  22. 22. @hemant_pt Thesis: Statement Prior knowledge, and interplay of features of users, their content, and network efficiently model Intent & Engagement for cooperation of citizen sensor communities. 22
  23. 23. @hemant_pt Short-text Document Intent —  Intent: Aim of action. DOCUMENT → INTENT examples: (1) “Text REDCROSS to 90999 to donate 10$ to help the victims of hurricane sandy” → SEEKING HELP; (2) “Anyone know where the nearest #RedCross is? I wanna give blood today to help the victims of hurricane Sandy” → OFFERING HELP; (3) “Would like to urge all citizens to make the proper preparations for Hurricane #Sandy - prep is key - http://t.co/LyCSprbk has valuable info!” → ADVISING 23
  24. 24. @hemant_pt Short-text Document Intent (same examples as the previous slide) —  How to identify relevant intent from ambiguous, unconstrained natural language text? Relevant intent → articulation of organizational tasks (e.g., Seeking vs. Offering resources) 24
  25. 25. @hemant_pt Intent Classification: Problem Formulation —  Given a set of user-generated text documents, identify existing intents —  Variety of interpretations —  Problem statement: a multi-class classification task, approximate f: S → C, where C = {c1, c2, …, cK} is a set of K predefined intent classes, and S = {m1, m2, …, mN} is a set of N short text documents. Focus: cooperation-assistive intent classes, C = {Seeking, Offering, None} 25
  26. 26. @hemant_pt Intent Classification: Related Work. Text classification types (Type / Focus / Example): Topic / predominant subject matter / sports or entertainment; Sentiment-Emotion-Opinion / present state of emotional affairs / negative or positive, happy emotion; Intent / focus on action, hence future state of affairs / offer to help after floods. e.g., “I am going to watch the awesome Fast and Furious movie!! #Excited” 26
  27. 27. @hemant_pt Intent Classification: Related Work (Data Type / Approach Focus / Limited Applicability): (1) Formal text on webpages/blogs (Kröll and Strohmaier 2009-15; Raslan et al. 2013-14) / Knowledge acquisition via rules, clustering / Lack of large corpora with proper grammatical structure; poor-quality text is hard to parse for dependencies. (2) Commercial reviews, marketplace (Hollerit et al. 2013, Wu et al. 2011, Ramanand et al. 2010, Carlos & Yalamanchi 2012, Nagarajan et al. 2009) / Classification via rules, lexical templates, patterns / More generalized intents (e.g., ‘help’ broader than ‘sell’); patterns more implicit to capture than for buying/selling. (3) Search queries (Broder 2002, Downey et al. 2008, Case 2012, Wu et al. 2010, Strohmaier & Kröll 2012) / User profiling, query classification / Lack of large query logs, click graphs; existence of social conversation. 27
  28. 28. @hemant_pt Intent Classification: Challenges —  Unconstrained natural language in a small space —  Ambiguity in interpretation —  Sparsity: low signal-to-noise ratio and imbalanced classes —  1% signals (Seeking/Offering) in 4.9 million #Sandy tweets —  Hard-to-predict problem: —  commercial intent, F-1 score 65% on Twitter [Hollerit et al. 2013] Example of mixed intents in one tweet: “@Zuora wants to help @Network4Good with Hurricane Relief. Text SANDY to 80888 & donate $10 to @redcross @AmeriCares & @SalvationArmyUS #help” (*Blue: offering intent, *Red: seeking intent on the slide) 28
  29. 29. @hemant_pt Intent Classification: Types & Features 29 Intent Binary Crisis Domain: - [Varga et al. 2013] Problem vs. Aid (Japanese) - Features: Syntactic, Noun-Verb templates, etc. Commercial Domain: - [Hollerit et al. 2013] Buy vs. Sell intent - Features: N-grams, Part-of-Speech Multiclass Commercial Domain: -  Not on Twitter
  30. 30. @hemant_pt TOP-DOWN: Pattern Rules, Declarative Knowledge (patterns defined for intent association). BOTTOM-UP: Bag of N-gram Tokens, Independent Tokens (patterns derived from the data). Our Hybrid Approach combines both, balancing improved learning with increased expressivity 30
  31. 31. @hemant_pt Intent Classification Top-Down: Binary Classifier - Prior Knowledge —  Conceptual Dependency Theory [Schank, 1972] —  Make meaning independent from the actual words in input —  e.g., Class in an Ontology abstracts similar instances —  Verb Lexicon [Hollerit et al. 2013] —  Relevant Levin’s Verb categories [Levin, 1993] —  e.g., give, send, etc. —  Syntactic Pattern —  Auxiliary & modals: e.g., ‘be’, ‘do’, ‘could’, etc. [Ramanand et al. 2010] —  Word order: Verb-Subject positions, etc. Purohit,  Hampton,  Bhatt,  Shalin,  Sheth  &  Flach.  In  Journal  of  CSCW,  2014   31
  32. 32. @hemant_pt Intent Classification Top-Down: Binary Classifier – Psycholinguistic Rules —  Transform knowledge into rules —  Examples: (Pronouns except 'you' = yes) ^ (need/want = yes) ^ (Adjective = yes/no) ^ (Things = yes) → Seeking; (Pronoun except 'you' | Proper Noun = yes) ^ (can/could/would/should = yes) ^ (Levin Verb = yes) ^ (Determiner = yes/no) ^ (Adjective = yes/no) ^ (Things = yes) → Offering —  Domain ontology 32 Purohit, Hampton, Bhatt, Shalin, Sheth & Flach. In Journal of CSCW, 2014
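An illustrative re-implementation of rules in this style (an assumption, not the thesis rule engine): each rule is a conjunction of word-class tests over a tweet, and the word lists shown are tiny samples of the lexicons the slide refers to; the POS and ontology tests are omitted.

```python
import re

PRONOUNS_EXCEPT_YOU = {"i", "we", "me", "us", "he", "she", "they"}
NEED_WANT = {"need", "want", "wanna"}
MODALS = {"can", "could", "would", "should"}
LEVIN_GIVE_VERBS = {"give", "send", "donate", "offer", "bring"}  # tiny sample

def _tokens(text):
    return set(re.findall(r"[a-z']+", text.lower()))

def rule_label(text):
    """Apply simplified Seeking/Offering rules to one tweet."""
    t = _tokens(text)
    if t & PRONOUNS_EXCEPT_YOU and t & NEED_WANT:
        return "Seeking"       # (pronoun except 'you') AND (need/want)
    if t & PRONOUNS_EXCEPT_YOU and t & MODALS and t & LEVIN_GIVE_VERBS:
        return "Offering"      # (pronoun except 'you') AND modal AND Levin verb
    return "None"
```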
  33. 33. @hemant_pt Intent Classification Top-Down: Binary Classifier - Lessons —  Preliminary Study —  2,000 conversation-classified and then rule-classified tweets, labeled by two native speakers —  Labels: Seeking, Offering, None —  Results —  Avg. F-1 score: 78% (Baseline F-1 score: 57% [Varga et al. 2013]) —  Lessons —  Role of prior knowledge: domain independent & domain dependent —  Limitation: exhaustive rule-set, low recall; ambiguity addressed, but not sparsity. Purohit, Hampton, Bhatt, Shalin, Sheth & Flach. In Journal of CSCW, 2014 33
  34. 34. @hemant_pt TOP-DOWN Pattern Rules: Declarative Knowledge BOTTOM-UP Bag of N-grams Tokens: Independent Tokens Hybrid Approach 34
  35. 35. @hemant_pt Intent Classification Hybrid: Binary Classifier - Design —  AMBIGUITY: addressed via rich feature space 1. Top-Down: Declarative Knowledge Patterns [Ramanand et al. 2010] DK(mi, P) → {0,1} e.g., P = \b(like|want)\b.*\b(to)\b.*\b(bring|give|help|raise|donate)\b (acquired via Red Cross expert searches) 2. Abstraction: due to importance in info sharing [Nagarajan et al. 2010] -  Numeric (e.g., $10) → _NUM_ -  Interactions (e.g., RT & @user) → _RT_ , _MENTION_ -  Links (e.g., http://bit.ly) → _URL_ 3. Bottom-Up: N-grams after stemming and abstraction [Hollerit et al. 2013] TOKENIZER(mi) → { bi-, tri-gram } 35
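A sketch of the three feature families on this slide: the DK pattern is the one quoted above, the abstraction follows the _NUM_/_RT_/_MENTION_/_URL_ scheme, and the tokenizer emits bi- and tri-grams. This is an illustrative pipeline under those assumptions, not the original implementation (stemming is omitted).

```python
import re

DK_PATTERNS = [  # top-down declarative knowledge pattern from the slide
    re.compile(r"\b(like|want)\b.*\b(to)\b.*\b(bring|give|help|raise|donate)\b"),
]

def abstract(text):
    """Replace links, interactions, and numbers with abstract tokens."""
    text = re.sub(r"http\S+", "_URL_", text)
    text = re.sub(r"^RT\b", "_RT_", text)
    text = re.sub(r"@\w+", "_MENTION_", text)
    text = re.sub(r"\$?\d[\d,.]*", "_NUM_", text)
    return text.lower()

def hybrid_features(text):
    clean = abstract(text)
    dk = [int(bool(p.search(clean))) for p in DK_PATTERNS]      # top-down
    toks = clean.split()
    ngrams = [" ".join(toks[i:i + n])                           # bottom-up
              for n in (2, 3) for i in range(len(toks) - n + 1)]
    return dk, ngrams
```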
  36. 36. @hemant_pt Intent Classification Hybrid: Binary Classifier - Design —  SPARSITY: addressed via algorithmic choices 1.  Feature Selection 2.  Ensemble Learning 3.  Classifier Chain 36 [Diagram: dataset with knowledge-driven features (XT, y) feeds model m_1, which outputs P(c1); the remainder (1 - P(c1)) flows to model m_2 for P(c2), each trained on its own subset (X1T, y1) and (X2T, y2)]
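A hedged sketch of the classifier-chain idea in the diagram: the first ensemble scores class c1 (e.g., Seeking), and only instances it rejects are passed to the second model for c2 (e.g., Offering). The feature matrices, threshold, and hyper-parameters here are placeholders, not the thesis settings.

```python
from sklearn.ensemble import RandomForestClassifier

def chain_predict(X1, y1, X2, y2, X_new, threshold=0.5):
    """Two-stage chain: model m_1 for c1, remainder routed to m_2 for c2.

    X1/X2 are training feature matrices (numpy arrays), y1/y2 binary labels,
    and X_new the rows to label.
    """
    m1 = RandomForestClassifier(n_estimators=50, random_state=0).fit(X1, y1)
    m2 = RandomForestClassifier(n_estimators=50, random_state=0).fit(X2, y2)
    p1 = m1.predict_proba(X_new)[:, 1]          # P(c1 | x)
    labels = []
    for row, p in zip(X_new, p1):
        if p >= threshold:
            labels.append("c1")
        elif m2.predict_proba([row])[0, 1] >= threshold:
            labels.append("c2")
        else:
            labels.append("None")
    return labels
```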
  37. 37. @hemant_pt Intent Classification Hybrid: Binary Classifier - Experiments —  Binary classifiers: —  Seeking vs. not Seeking —  Offering vs. not Offering —  Dataset: —  Candidate set: 4000 donation classified tweets —  Labels: min. 3 judges —  Annotations: Seeking , Offering , None 37Purohit,  Castillo,  Diaz,  Sheth,  &  Meier.  First  Monday  journal,  2014  
  38. 38. @hemant_pt Intent Classification Hybrid: Binary Classifier - Results (Experiment / Supervised Learning / Training Samples / Precision (*Baseline) / F-1 score (*Baseline) / Class labels): Seeking vs. (None’ + Offering) / RF (CR=50:1) / 3836 / 98% (*79%) / 46% (*56%) / 56% requests; Offering vs. (None’) / RF (CR=9:2) / 1763 / 90% (*65%) / 44% (*58%) / 13% offers. RF = Random Forest ensemble; CR = asymmetric false-alarm cost ratio (True:False). Evaluation: 10-fold CV. Notes: -  The domain requires higher precision than recall -  Scope remains for improving the low recall 38 Purohit, Castillo, Diaz, Sheth, & Meier. First Monday journal, 2014
  39. 39. @hemant_pt Intent Classification Hybrid: Multiclass Classifier - Generalization —  Lessons from binary classification —  Improvement by fusing top-down & bottom-up —  Sparsity —  Ambiguity (Seeking & Offering complementary) —  addressed via improved data representation Hypothesis: Knowledge-guided approach improves multiclass classification accuracy 39
  40. 40. @hemant_pt TOP-DOWN Knowledge Patterns (DK) Declarative (SK) Social Behavior (CTK, CSK) Contrast Patterns BOTTOM-UP Bag of N-grams Tokens: (T) Independent Tokens Hybrid Approach 40
  41. 41. @hemant_pt Intent Classification Hybrid: Multiclass Classifier – Feature Creation 1. (T) Bag of Tokens: TOKENIZER(mi, min, max) 2. (DK) Declarative Knowledge Patterns —  Domain expert guidance —  Psycholinguistic syntactic & semantic rules —  Expanded via WordNet and Levin Verbs e.g., Pj = (how = yes) ^ (Modal-Set 'can' = yes) ^ (Pronouns except 'you' = yes) ^ (Levin Verb-Set 'give' = yes); Feature_Pj(mi) = 1 if Pj exists in mi, else 0 3. (SK) Social Knowledge Indicators —  Offline conversation indicators studied in Problem 1 e.g., Hj = Dialogue Management, Hj-set = {Thanks, anyway, ..}; Feature_Hj(mi) = term-frequency(Hj-set, mi) 41
  42. 42. @hemant_pt Intent Classification Hybrid: Multiclass Classifier - Feature Creation 4. (CTK) Contrast Knowledge Patterns INPUT: corpus {mi} cleaned and abstracted, min. support, X For each class Cj —  Find contrasting pattern using sequential pattern mining OUTPUT: contrast patterns set {P} for each class Cj 5. (CPK) Contrast Patterns: on Part-of-Speech tags of {mi} 42 e.g., unique sequential patterns: SEEKING: help .* victim .* _url_ .* OFFERING: anyon .* know .* cloth .*
  43. 43. @hemant_pt Intent Classification Hybrid: Multiclass Classifier - Feature Creation Finding CTK: Contrast Knowledge Patterns For each class Cj 1.  Tokenize the cleaned, abstracted text of {mi} 2.  Mine Sequential Patterns: SPADE Algorithm —  Output: sequences of token sets, {P’} 3.  Reduce to minimal sequences {P} 4.  Compute growth rate & contrast strength of P against all other Ck 5.  Keep the top-K {P} ranked by contrast strength OUTPUT: contrast patterns set {P} for each class Cj 43. Equations: gr(P, Cj, Ck) = support(P, Cj) / support(P, Ck) ... (1); Contrast-Growth(P, Cj) = 1/(K-1) Σ_{k ≠ j} gr(P, Cj, Ck) / (1 + gr(P, Cj, Ck)) ... (2); Contrast-Strength(P, Cj) = support(P, Cj) × Contrast-Growth(P, Cj) ... (3)
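Equations (1)-(3) reduce to a few lines once the per-class supports of a pattern are known; the sketch below assumes `pattern_supports` maps each class label to support(P, C) and leaves the SPADE mining and top-K ranking out.

```python
def contrast_strength(pattern_supports, target, eps=1e-9):
    """Score pattern P for class `target` using Eqs. (1)-(3) on slide 43."""
    others = [c for c in pattern_supports if c != target]
    s_target = pattern_supports[target]
    growth = 0.0
    for c in others:
        gr = s_target / (pattern_supports[c] + eps)   # Eq. (1): growth rate
        growth += gr / (1.0 + gr)                     # Eq. (2): summand
    growth /= max(len(others), 1)                     # average over K-1 classes
    return s_target * growth                          # Eq. (3)

# e.g., contrast_strength({"Seeking": 0.30, "Offering": 0.05, "None": 0.02},
#                         "Seeking")
```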
  44. 44. @hemant_pt Binarization Frameworks for the Multiclass Classifier: 1 vs. All. [Diagram: CORPUS (set of short text documents, S) feeds FEATURES (knowledge-driven features XT, y), which feed binary models M_1, M_2, …, M_K producing P(c1), P(c2), …, P(cK); each model M_j is trained on a subset XjT ⊂ S that includes all labeled instances of class Cj, with labels yj] 44 (In the 1 vs. 1 framework: K*(K-1)/2 classifiers, one for each Cj, Ck pair)
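Both binarization frameworks are available off the shelf in scikit-learn; this hedged sketch wires the Random Forest settings mentioned on the next slide (10 trees, 100 features per split, which presumes at least 100 input features) into one-vs-all and one-vs-one wrappers. It shows the framework shape only, not the thesis pipeline.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Base learner M_j (settings from slide 45; requires >= 100 features).
base = RandomForestClassifier(n_estimators=10, max_features=100, random_state=0)

one_vs_all = OneVsRestClassifier(base)   # K binary models M_1 ... M_K
one_vs_one = OneVsOneClassifier(base)    # K*(K-1)/2 pairwise models

# Usage: one_vs_all.fit(X_train, y_train); one_vs_all.predict(X_test)
```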
  45. 45. @hemant_pt Intent Classification Hybrid: Multiclass Classifier - Experiments —  Datasets —  Dataset-1: Hurricane Sandy, Oct 27 – Nov 7, 2012 —  Dataset-2: Philippines Typhoon, Nov 7 – Nov 17, 2013 —  Parameters —  Base Learner M_j: Random Forest, 10 trees with 100 features —  bi-, tri-gram for (T) —  K=100% & min. support 10% for CTK, 50% for CPK 45
  46. 46. @hemant_pt Intent Classification: Multiclass Classifier – Results 46 [Bar chart: Avg. F-1 score (10-fold CV) for feature sets T (Baseline), T+DK (Declarative), T+SK (Social), T+CTK+CSK (Contrast), and T+DK+SK+CTK+CSK, under the 1-vs-1 and 1-vs-All frameworks; scores range roughly 56%–70%] Frameworks: Gain 7%, p < 0.05. Dataset-1 (Hurricane Sandy, 2012)
  47. 47. @hemant_pt Intent Classification: Multiclass Classifier - Results 47 [Bar chart: Avg. F-1 score (10-fold CV) for feature sets T (Baseline), T+DK (Declarative), T+SK (Social), T+CTK+CSK (Contrast), and T+DK+SK+CTK+CSK, under the 1-vs-1 and 1-vs-All frameworks; scores range roughly 74%–86%] Frameworks: Gain 6%, p < 0.05. Dataset-2 (Philippines Typhoon, 2013)
  48. 48. @hemant_pt Lessons 1.  The top-down & bottom-up hybrid approach improves data representation for learning (complementary) intent classes —  50% of the top 1% discriminative features were knowledge-driven 2.  Offline-theoretic social conversation (SK) features (the, thanks, etc.), often removed for text classification, are valuable for intent. 3.  Knowledge types (SK vs. DK vs. CTK/CPK) have varying effects across different types of real-world event datasets Ø  Culturally-sensitive psycholinguistic knowledge in future work 48
  49. 49. @hemant_pt Outline —  Citizen Sensor Communities & Organizations —  Cooperative System Design Challenges —  Awareness: tackle via Engagement Modeling —  Articulation: tackle via Intent Mining —  Contributions —  Problem 1. Conversation Classification using Offline Theories —  Problem 2. Intent Classification —  Problem 3. Engagement Modeling —  Applications —  Limitations & Future Work 49
  50. 50. @hemant_pt Thesis: Statement Prior knowledge, and interplay of features of users, their content, and network efficiently model Intent & Engagement for cooperation of citizen sensor communities. 50
  51. 51. @hemant_pt —  Engagement: degree of involvement in discussion —  Reliable groups: stay focused and collectively behave to diverge on topics Problem 3. Group Engagement Model 51Purohit, Ruan, Fuhry, Parthasarathy, & Sheth. ICWSM 2014 How can organizations find reliable groups to engage for action?
  52. 52. @hemant_pt —  Engagement: degree of involvement in discussion —  Reliable groups: stay focused and collectively behave to diverge on topics —  Why & How do groups collectively evolve over time? 1.  Define a group from interaction network, g 2.  Define Divergence of g: content based in contrast to structure 3.  Predict change in the divergence between time slices —  Features of g based on theories of social identity, & cohesion Problem 3. Group Engagement Model 52Purohit, Ruan, Fuhry, Parthasarathy, & Sheth. ICWSM 2014
  53. 53. @hemant_pt Group Engagement Model: Integrated Approach Unlike Prior Work: People (User: participant of the discussion) AND Content (Text: topic of interest) AND Network (Community: group around the topic). KEY POINT: capture user node diversity. Image sources: tupper-lake.com/.../uploads/Community.jpg ; http://www.iconarchive.com/show/people-icons-by-aha-soft/user-icon.html 53
  54. 54. @hemant_pt Group Engagement Model: Discussion Divergence —  Candidate Group: detect in the interaction network —  Group Discussion Divergence: Jensen-Shannon divergence of the topic distributions of group members’ tweets, D(g) = H(Bg) − (1/|Tg|) Σ_{t ∈ Tg} H(Bt), where H(·) = Shannon entropy, Bt = latent topic distribution of tweet t among all members’ tweets Tg, and Bg = (1/|Tg|) Σ_{t ∈ Tg} Bt is the mean topic distribution of group g 54
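A small sketch of the divergence measure, assuming it is the generalized Jensen-Shannon divergence H(Bg) minus the mean per-tweet entropy as reconstructed above; each row of B is one tweet's latent topic distribution (summing to 1).

```python
import numpy as np
from scipy.stats import entropy

def group_divergence(B):
    """Generalized Jensen-Shannon divergence of a group's tweet-topic rows."""
    B = np.asarray(B, dtype=float)
    mean_topics = B.mean(axis=0)                 # Bg: mean topic distribution
    return entropy(mean_topics) - np.mean([entropy(row) for row in B])
```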
  55. 55. @hemant_pt Lessons 1.  The content-divergence-based measure helps explain why groups collectively diverge —  Less-diverging groups write more social and future-action-related content 2.  Emerging events such as disasters correlate more strongly with social identity-driven features Ø  Role of social context 55
  56. 56. @hemant_pt Outline —  Citizen Sensor Communities & Organizations —  Cooperative System Design Challenges —  Awareness: tackle via Engagement Modeling —  Articulation: tackle via Intent Mining —  Contributions —  Problem 1. Conversation Classification using Offline Theories —  Problem 2. Intent Classification —  Problem 3. Engagement Modeling —  Applications —  Limitations & Future Work 56
  57. 57. @hemant_pt Application-1: Filter Content for Disaster Response (Intent-Classifiers as a Service), connecting CITIZEN Sensors to RESPONSE Organizations during a DISASTER event. Example tweets: [SEEKING] “Me and @CeceVancePR are coordinating a clothing/food drive for families affected by Hurricane Sandy. If you would like to donate, DM us”; [OFFERING] “Does anyone know how to donate clothes to hurricane #Sandy victims?” 57
  58. 58. @hemant_pt Broader Impact: Classifier Model integrated by Crisis Mapping Pioneer 58
  59. 59. @hemant_pt DISASTER Event Application-2: “We TRUST people!” User engagement tool CITIZEN Sensors RESPONSE Organizations Tool to mine Important users 59
  60. 60. @hemant_pt Broader Impact: Winner of Int’l Challenge: UN ITU Young Innovators 2014 60
  61. 61. @hemant_pt Articulation ENGAGEMENT MODELING INTENT MINING COOPERATIVE SYSTEM 61 ORGANIZATIONS   CITIZEN  SENSOR  COMMUNITIES   Awareness Q1. Who to engage first? Org. Actor Q2. What are Resource needs & availabilities? Org. Actor
  62. 62. @hemant_pt Limitations & Future Work —  Cooperative System —  CSCW application specific to the crisis domain Ø  How to create a full What-Where-When-Who knowledge base —  Intent Mining —  Non-cooperation-assistive intent classes and the temporal drift of intent were not considered Ø  How to mine actor-level intent beyond the document level —  Group Engagement —  Reliable prioritized groups based on correlation, not causality —  Interplay of offline and online interactions beyond the scope Ø  How to incorporate intent in the group divergence —  Bipartite Intent Graph Matching —  Reducing the time complexity of Seeking vs. Offering matching 62
  63. 63. @hemant_pt Conclusion Prior knowledge, and interplay of features of users, their content, and network efficiently model Intent & Engagement for cooperation between citizen sensors and organizations in the online social communities. 63
  64. 64. @hemant_pt Thanks to the Committee Members 64 [Left to Right] Prof. Amit Sheth, (advisor, WSU), Prof. Guozhu Dong (WSU), Prof. Srinivasan Parthasarathy (OSU), Prof. TK Prasad (WSU), Dr. Patrick Meier (QCRI), Prof. Valerie Shalin (WSU) Computer Science Social Science
  65. 65. @hemant_pt Acknowledgement, Thanks and Questions J —  NSF SoCS grant IIS-1111182 to support this work —  Interdisciplinary Mentors especially Prof. John Flach (WSU), Drs. Carlos Castillo (QCRI), Fernando Diaz (Microsoft), Meena Nagarajan (IBM) —  Kno.e.sis team especially Andrew Hampton from Psychology dept. and Shreyansh and Tanvi from CSE at Wright State, as well as Yiye Ruan (now Google) & David Fuhry at the Data Mining Lab, Ohio State University —  Colleagues: Digital Volunteers from the CrisisMappers network, StandBy Task Force, InCrisisRelief.org, info4Disasters, Humanity Road, Ushahidi, etc. and the subject matter experts at UN FPA 65
  66. 66. @hemant_pt Ambiguity Sparsity Diversity Scalability •  Mutual Influence in Sparse Friendship Network [AAAI ICWSM’12] •  User Summarization with Sparse Profile Metadata [ASE SocialInfo’12] •  Matching intent as task of Information Retrieval [FM’14] •  Knowledge-aware Bi-partite Matching [In preparation] •  Short-Text Document Intent Mining [FM’14, JCSCW’14] •  Actor-Intent Mining Complexity [In preparation] •  Modeling Group Using Diverse Social Identity & Cohesion [AAAI ICWSM’14] •  Modeling Diverse User- Engagement [SOME WWW’11, ACM WebSci’12] (Interpretation) (users) (behaviors) 66 Other works
