SlideShare a Scribd company logo
Who will RT this?
Automatically Identifying and Engaging Strangers
on Twitter to Spread Information

Kyumin Lee, Jalal Mahmud, Jilin Chen, Michelle X. Zhou, Jeffrey Nichols
Utah State University & IBM Research – Almaden
jwnichols@us.ibm.com
@jwnichls

@jwnichls

1
Public Social Media Contains a Wealth
of Information about Individuals…

@jwnichls

2
Public Social Media Contains a Wealth
of Information about Individuals…
Can we harness this information
for something useful?
•
•

Collect Information

•

@jwnichls

Identify people to recruit to do various tasks
Spread Information

3
Information Collection: TSA Tracker
Crowdsourcing airport security wait times through Twitter

Step #1. Watch for people
tweeting about being
in airport

http://tsatracker.org/
@tsatracker, @tsatracking

Step #2. Ask nicely if they
would share wait time to
help others
Step #3. Collect responses
and share relevant data
on web site

Step #4. Say thank you!

@jwnichls

Key Result:
Would people respond to questions
from strangers? Yes…around 40% of
the time
[Nichols et al. CSCW 2012, Mahmud et al. IUI 2013]

4
Today: Information Spreading

• Relevant marketing campaign messages
• Alerts and SOS messages in an emergency
• Etc.
@jwnichls

5
Today: Information Spreading

Challenge: Low percentage of people respond to this task
• Can we predict who will retweet and direct requests only to them?
• Can we predict who will retweet more quickly?
@jwnichls

6
Our Process

@jwnichls

1. Data Collection
2. Feature
Extraction
3. Feature Selection
4. Model Building
5. Evaluation

7
Ground-Truth Data Collection
Public Safety (location-based)

Bird Flu (topic-based)

•

•

•
•
•

Randomly selected users who
tweeted from the San Francisco bay
area (via geo-tags)
Contacted 1,902 users
52 (2.8%) retweeted our message
Message reached a total of 18,670
followers

@jwnichls

•
•
•

Randomly selected users who posted
one of the following words in at least
one tweet: “bird flu”, “H5N1” and
“avian influenza”
Contacted 1,859 users
155 (8.4%) retweeted our message
Message reached a total of 184,325
followers

8
In total:

Resulting Data

• Contacted 3,761 strangers
• 207 positive examples, 3554
negative examples

For each user we contacted,
we collected:
• Twitter profile (screen name,
tweet count, etc.)
• People they followed,
followers,
• Up to 200 recent messages
• Ground truth (“retweeter” or
“non-retweeter”)

@jwnichls

9
Feature Extraction
Feature Categories
•
•
•
•
•
•

Profile Features
Social Network Features
Personality Features
Activity Features
Past Retweeting Features
Readiness Features

@jwnichls

10
Feature Extraction
Feature Categories
•
•
•
•
•
•

Profile Features
Social Network Features
Personality Features
Activity Features
Past Retweeting Features
Readiness Features

@jwnichls

Profile Features
• longevity (age) of an account
• length of screen name
• whether the user profile has a
description
• length of the description
• whether the user profile has a
URL

Social Network Features
• number of users following
(friends)
• number of followers
• and the ratio of number of
friends to number of followers
11
Feature Extraction
Feature Categories
•
•
•
•
•
•

Profile Features
Social Network Features
Personality Features
Activity Features
Past Retweeting Features
Readiness Features

@jwnichls

Users’ word usage has been found to predict
their personality
•Linguistic Inquiry and Word Count (LIWC)
dictionary
•Personality features derived from LIWC
categories [Yarkoni 2010, Mahmud 2013]

12
Feature Extraction
Feature Categories
•
•
•
•
•
•

Profile Features
Social Network Features
Personality Features
Activity Features
Past Retweeting Features
Readiness Features

•
•
•
•
•
•
•
•
•
•

@jwnichls

Number of status messages
Number of direct mentions (e.g., @johny)
per status message
Number of URLs per status message
Number of hashtags per status message
Number of status messages per day
during her entire
Account life (= total number of posted
status messages / longevity)
Number of status messages per day
during last one month
Number of direct mentions per day during
last one month
Number of URLs per day during last one
month
Number of hashtags per day during last
one month

13
Feature Extraction
Feature Categories
•
•
•
•
•
•

Profile Features
Social Network Features
Personality Features
Activity Features
Past Retweeting Features
Readiness Features

@jwnichls

Past Retweeting Behavior
•
•
•

Number of retweets per status
message: R/N
Average number of retweets per day
Fraction of retweets for which original
messages are posted by strangers who
are not in her social network

Readiness Based on Previous Activity
•
•
•
•
•
•

Tweeting Likelihood (Day)
Tweeting Likelihood (Hour)
Entropy of Tweeting Likelihood (Day)
Entropy of Tweeting Likelihood (Hour)
Tweeting Steadiness
Tweeting Inactivity

14
Training and Test Sets:

Predicting
Retweeters

• Each dataset (public safety
and bird flu) was randomly
split to training set (2/3 data)
and testing set (1/3 data)

5 Predictive Models
• Random Forest, Naïve Bayes,
Logistic Regression, SMO
(SVM) and AdaboostM1

Handing Class Imbalance
• Used both over-sampling
(SMOTE) and weighting
approaches (cost-sensitive
approach)
@jwnichls

15
Feature Selection
Computed χ2 value for each feature in training
Feature
Group
Profile
Social-network

Significant Features (bolded is common to both
data sets)
the longevity of the account

Feature
Group

the length of description

Profile

has description in profile

|following|
ratio of number of friends to number of followers

|URLs| per day
|direct mentions| per day

|URLs| per day
|direct mentions| per day
Activity

|hashtags| per day

Activity

|URLs| per status message

|hashtags| per day

|direct mentions| per status message

|status messages|

|hashtags| per status message

|status messages| per day during entire account life
|status messages| per day during last one month
Past
Retweeting
Readiness

Personality

Significant Features (bolded is common to both data
sets)

Past
Retweeting

|retweets| per status message
|retweets| per day

Readiness

Tweeting Likelihood of the Day
Tweeting Likelihood of the Day (Entropy)
7 LIWC features: Inclusive, Achievement, Humans,
Time, Sadness, Articles, Nonfluencies
1 Facet feature: Modesty

21 Features Selected by χ2 in Publish Safety Dataset

Personality

|retweets| per status message
|retweets| per day
|URLs| per retweet message
Tweeting Likelihood of the Hour (Entropy)
34 LIWC features: Inclusive, Total Pronouns, 1st Person
Plural, 2nd Person, 3rd Person, Social Processes, Positive
Emotions, Numbers, Other References, Occupation, Affect,
School, Anxiety, Hearing, Certainty, SZensory Processes,
Death, Body States, Positive Feelings, Leisure, Optimism,
Negation, Physical States, Communication
8 Facet features: Liberalism, Assertiveness, Achievement
Striving, Self-Discipline, Gregariousness, Cheerfulness,
Activity Level, Intellect
2 Big5 features: Conscientiousness, Openness

46 Features Selected by χ2 in Bird Flu Dataset

Activity, personality, readiness and past retweeting feature groups have more significant power.
Six significant features (bolded names) are common to both sets.
@jwnichls

16
Evaluating Retweeter Prediction
Only the significant features are used for prediction
Classifier

AUC

F1

F1 of
Retweeter

0.638

0.958

0

Naïve Bayes

0.619

0.939

0.172

Logistic

0.640

0.958

0

SMO

0.500

0.96

0

AdaBoostM1

0.548

0.962

0.1

0.606

0.916

0.119

Naïve Bayes

0.637

0.923

0.132

Logistic

0.664

0.833

0.091

SMO

0.626

0.813

0.091

AdaBoostM1

0.633

0.933

0.129

Cost-Sensitive (Weighting, showing the best results in each model)
Random Forest
Naïve Bayes
Logistic
SMO
AdaBoostM1

0.692
0.619
0.623
0.633
0.678

F1 of
Retweeter

Random Forest

0.707

0.877

0.066

Naïve Bayes

0.670

0.834

0.222

Logistic

0.751

0.878

0.067

SMO

0.500

0.876

0

AdaBoostM1

0.627

0.878

0.067

SMOTE

SMOTE
Random Forest

F1

AUC
Basic

Basic
Random Forest

Classifier

0.954
0.93
0.938
0.892
0.956

Random Forest

0.707

0.819

0.236

Naïve Bayes

0.679

0.724

0.231

Logistic

0.76

0.733

0.258

SMO

0.729

0.712

0.278

AdaBoostM1

0.709

0.837

0.292

Cost-Sensitive (Weighting, showing the best results in each model)
Random Forest

0.785

0.815

0.296

0.147

Naïve Bayes

0.670

0.767

0.24

0.042
0.123
0.133

Logistic

0.735

0.742

0.243

SMO

0.676

0.738

0.256

AdaBoostM1

0.669

0.87

0.031

0.125

Prediction accuracy (Public Safety)

Prediction accuracy (Bird Flu)

We use Random Forest for all following experiments.
@jwnichls

17
Comparison with Two Baselines
Baselines
Random people contact
• Randomly select and
ask a sub-set of
qualified candidates

Popular people contact
• Sort candidates in our
test set by their
follower count in the
descending order

@jwnichls

Retweeting Rate in Testing Set

Approach

Public Safety

Bird flu

Random People Contact

2.6%

8.3%

Popular People Contact

3.1%

8.5%

Our Prediction Approach

13.3%

19.7%

Comparison of retweeting rates

18
Incorporating Time Constraints
•
•

Can we predict when a person will retweet given a request?
We use an exponential distribution model to estimate a user’s retweeting
wait time with a probability.
•

•

Assume that retweeting events follow a poisson process.

1/λ = the average wait time for a user based on prior retweeting wait time
When a person’s average wait time 1/λ = 180 min,
retweeting probability within 200 minutes is
larger than 0.6

@jwnichls

19
Evaluation with Time Constraints
•
•
•

Compare two baselines with our prediction approach and our prediction approach
with wait-time model
For the wait-time model, if a user has at least 70% probability to retweet within a
certain time window (say, 6 hours), we will contact the user.
We experimented with different time windows such as 6, 12, 18 or 24 hours. The
below table shows the averaged retweeting rate

Approach
Random People Contact
Popular People Contact
Our Prediction Approach
Our Prediction Approach + WaitTime Model

Average Retweeting Rate in Testing Set
under Time Constraints
Public Safety
2.2%
2.7%
13.3%

Bird flu
6.5%
6.4%
13.6%

19.3%

14.7%

Comparison of retweeting rates with time constraints

@jwnichls

20
Evaluating in Terms of Benefit & Cost
Benefit: Number of people exposed to message
Cost (N): the number of people contacted
K: the number of retweeters
(normalized benefit)

Unit-Info-Reach-Per-Person
Public Safety
Bird flu

Approach
Random People Contact

6

85

Popular People Contact

11

116

Our Prediction Approach

106

135

Our Prediction Approach +
Wait-Time Model

153

155

Comparison of information reach

@jwnichls

21
Live Experiment
•
•
•

To validate the effectiveness of our approach in a live setting, we used our
recommender system to test our approach against the two baselines
Randomly selected 426 candidates who had recently tweeted about “bird
flu” in July 2013
Each approach selected top 100 candidates based on its criteria

Approach

Retweeting Rate

Approach

Average
Retweeting Rate

Random People Contact

4%

Random People Contact

4%

Popular People Contact

9%

Popular People Contact

8.7%

Our Prediction Approach

19%

Our Prediction Approach
Our Prediction Approach + Wait
time model

18%

Comparison of retweeting rates in live experiment

@jwnichls

18.5%

Comparison of retweeting rates in live experiment
with time constraints

22
To wrap up…
•

•

We have also described a time estimation model

•

In the experiments, our approaches doubled the
retweeting rates over the two baselines

•

With our time estimation model, our approach
outperformed other approaches significantly

•

@jwnichls

We have presented a feature-based prediction
model that can automatically identify the right
individuals at the right time on Twitter

Overall, our approach effectively identifies qualified
candidates for retweeting a message within a given
time window
23
Thanks!
For more information, contact:
Jeffrey Nichols
jwnichols@us.ibm.com
@jwnichls
@jwnichls

24
@jwnichls

25
Why Retweet a Stranger’s Request?
We randomly selected 50 people who retweeted and asked
them why they chose to retweet (33 replied)
Main reasons to retweet our requested message
• Trustworthiness of the content
“Because it contained a link to a significant report from a reputable media news
source”

• Content relevance
“Because it happened in my neighborhood”

• Content value
“my followers should know this or they may think this info is valuable”

@jwnichls

26
Real-Time Retweeter Recommendation

The interface of our retweeter recommendation system: (a) left panel:
system-recommended candidates, and (b) right panel: a user can edit
and compose a retweeting request.

@jwnichls

27

More Related Content

Similar to Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
GUANGYUAN PIAO
 
Saner17 sharma
Saner17 sharmaSaner17 sharma
Saner17 sharma
Abhishek Sharma
 
Collective Spammer Detection in Evolving Multi-Relational Social Networks
Collective Spammer Detection in Evolving Multi-Relational Social NetworksCollective Spammer Detection in Evolving Multi-Relational Social Networks
Collective Spammer Detection in Evolving Multi-Relational Social Networks
Turi, Inc.
 
CodeLess Machine Learning
CodeLess Machine LearningCodeLess Machine Learning
CodeLess Machine Learning
Sharjeel Imtiaz
 
User Interests Identification From Twitter using Hierarchical Knowledge Base
User Interests Identification From Twitter using Hierarchical Knowledge BaseUser Interests Identification From Twitter using Hierarchical Knowledge Base
User Interests Identification From Twitter using Hierarchical Knowledge Base
Artificial Intelligence Institute at UofSC
 
DIE 20130724
DIE 20130724DIE 20130724
DIE 20130724
Tokyo Tech
 
Social Relation Based Scalable Semantic Search Refinement
Social Relation Based Scalable Semantic Search RefinementSocial Relation Based Scalable Semantic Search Refinement
Social Relation Based Scalable Semantic Search Refinement
Yi Zeng
 
System U: Computational Discovery of Personality Traits from Social Media for...
System U: Computational Discovery of Personality Traits from Social Media for...System U: Computational Discovery of Personality Traits from Social Media for...
System U: Computational Discovery of Personality Traits from Social Media for...
Michelle Zhou
 
Machine Learning for Data Extraction
Machine Learning for Data ExtractionMachine Learning for Data Extraction
Machine Learning for Data Extraction
Dasha Herrmannova
 
Revisiting Plato's Cave, Bruce Heterick, JSTOR
Revisiting Plato's Cave, Bruce Heterick, JSTORRevisiting Plato's Cave, Bruce Heterick, JSTOR
Revisiting Plato's Cave, Bruce Heterick, JSTOR
Charleston Conference
 
microposts2015presentation-150518124457-lva1-app6892.pdf
microposts2015presentation-150518124457-lva1-app6892.pdfmicroposts2015presentation-150518124457-lva1-app6892.pdf
microposts2015presentation-150518124457-lva1-app6892.pdf
SunnySam26
 
Microposts2015 - Social Spam Detection on Twitter
Microposts2015 - Social Spam Detection on TwitterMicroposts2015 - Social Spam Detection on Twitter
Microposts2015 - Social Spam Detection on Twitter
azubiaga
 
User Interests Identification From Twitter using Hierarchical Knowledge Base
User Interests Identification From Twitter using Hierarchical Knowledge BaseUser Interests Identification From Twitter using Hierarchical Knowledge Base
User Interests Identification From Twitter using Hierarchical Knowledge Base
Pavan Kapanipathi
 
[CS570] Machine Learning Team Project (I know what items really are)
[CS570] Machine Learning Team Project (I know what items really are)[CS570] Machine Learning Team Project (I know what items really are)
[CS570] Machine Learning Team Project (I know what items really are)
Kunwoo Park
 
Learning in the wild: Predicting the formation of ties in 'Ask' subreddit com...
Learning in the wild: Predicting the formation of ties in 'Ask' subreddit com...Learning in the wild: Predicting the formation of ties in 'Ask' subreddit com...
Learning in the wild: Predicting the formation of ties in 'Ask' subreddit com...
University of Groningen (The Netherlands)
 
Predicting user demographics in social networks - Invited Talk at University ...
Predicting user demographics in social networks - Invited Talk at University ...Predicting user demographics in social networks - Invited Talk at University ...
Predicting user demographics in social networks - Invited Talk at University ...
Nikolaos Aletras
 
An Ensemble Model for Cross-Domain Polarity Classification on Twitter
An Ensemble Model for Cross-Domain Polarity Classification on TwitterAn Ensemble Model for Cross-Domain Polarity Classification on Twitter
An Ensemble Model for Cross-Domain Polarity Classification on Twitter
Symeon Papadopoulos
 
Serving predictive models with Redis
Serving predictive models with RedisServing predictive models with Redis
Serving predictive models with Redis
Tague Griffith
 
Lecture4 Social Web
Lecture4 Social Web Lecture4 Social Web
Lecture4 Social Web
Marieke van Erp
 
MediaEval 2017 Retrieving Diverse Social Images Task (Overview)
MediaEval 2017 Retrieving Diverse Social Images Task (Overview)MediaEval 2017 Retrieving Diverse Social Images Task (Overview)
MediaEval 2017 Retrieving Diverse Social Images Task (Overview)
multimediaeval
 

Similar to Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information (20)

ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
 
Saner17 sharma
Saner17 sharmaSaner17 sharma
Saner17 sharma
 
Collective Spammer Detection in Evolving Multi-Relational Social Networks
Collective Spammer Detection in Evolving Multi-Relational Social NetworksCollective Spammer Detection in Evolving Multi-Relational Social Networks
Collective Spammer Detection in Evolving Multi-Relational Social Networks
 
CodeLess Machine Learning
CodeLess Machine LearningCodeLess Machine Learning
CodeLess Machine Learning
 
User Interests Identification From Twitter using Hierarchical Knowledge Base
User Interests Identification From Twitter using Hierarchical Knowledge BaseUser Interests Identification From Twitter using Hierarchical Knowledge Base
User Interests Identification From Twitter using Hierarchical Knowledge Base
 
DIE 20130724
DIE 20130724DIE 20130724
DIE 20130724
 
Social Relation Based Scalable Semantic Search Refinement
Social Relation Based Scalable Semantic Search RefinementSocial Relation Based Scalable Semantic Search Refinement
Social Relation Based Scalable Semantic Search Refinement
 
System U: Computational Discovery of Personality Traits from Social Media for...
System U: Computational Discovery of Personality Traits from Social Media for...System U: Computational Discovery of Personality Traits from Social Media for...
System U: Computational Discovery of Personality Traits from Social Media for...
 
Machine Learning for Data Extraction
Machine Learning for Data ExtractionMachine Learning for Data Extraction
Machine Learning for Data Extraction
 
Revisiting Plato's Cave, Bruce Heterick, JSTOR
Revisiting Plato's Cave, Bruce Heterick, JSTORRevisiting Plato's Cave, Bruce Heterick, JSTOR
Revisiting Plato's Cave, Bruce Heterick, JSTOR
 
microposts2015presentation-150518124457-lva1-app6892.pdf
microposts2015presentation-150518124457-lva1-app6892.pdfmicroposts2015presentation-150518124457-lva1-app6892.pdf
microposts2015presentation-150518124457-lva1-app6892.pdf
 
Microposts2015 - Social Spam Detection on Twitter
Microposts2015 - Social Spam Detection on TwitterMicroposts2015 - Social Spam Detection on Twitter
Microposts2015 - Social Spam Detection on Twitter
 
User Interests Identification From Twitter using Hierarchical Knowledge Base
User Interests Identification From Twitter using Hierarchical Knowledge BaseUser Interests Identification From Twitter using Hierarchical Knowledge Base
User Interests Identification From Twitter using Hierarchical Knowledge Base
 
[CS570] Machine Learning Team Project (I know what items really are)
[CS570] Machine Learning Team Project (I know what items really are)[CS570] Machine Learning Team Project (I know what items really are)
[CS570] Machine Learning Team Project (I know what items really are)
 
Learning in the wild: Predicting the formation of ties in 'Ask' subreddit com...
Learning in the wild: Predicting the formation of ties in 'Ask' subreddit com...Learning in the wild: Predicting the formation of ties in 'Ask' subreddit com...
Learning in the wild: Predicting the formation of ties in 'Ask' subreddit com...
 
Predicting user demographics in social networks - Invited Talk at University ...
Predicting user demographics in social networks - Invited Talk at University ...Predicting user demographics in social networks - Invited Talk at University ...
Predicting user demographics in social networks - Invited Talk at University ...
 
An Ensemble Model for Cross-Domain Polarity Classification on Twitter
An Ensemble Model for Cross-Domain Polarity Classification on TwitterAn Ensemble Model for Cross-Domain Polarity Classification on Twitter
An Ensemble Model for Cross-Domain Polarity Classification on Twitter
 
Serving predictive models with Redis
Serving predictive models with RedisServing predictive models with Redis
Serving predictive models with Redis
 
Lecture4 Social Web
Lecture4 Social Web Lecture4 Social Web
Lecture4 Social Web
 
MediaEval 2017 Retrieving Diverse Social Images Task (Overview)
MediaEval 2017 Retrieving Diverse Social Images Task (Overview)MediaEval 2017 Retrieving Diverse Social Images Task (Overview)
MediaEval 2017 Retrieving Diverse Social Images Task (Overview)
 

Recently uploaded

20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
TIPNGVN2
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 

Recently uploaded (20)

20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 

Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

  • 1. Who will RT this? Automatically Identifying and Engaging Strangers on Twitter to Spread Information Kyumin Lee, Jalal Mahmud, Jilin Chen, Michelle X. Zhou, Jeffrey Nichols Utah State University & IBM Research – Almaden jwnichols@us.ibm.com @jwnichls @jwnichls 1
  • 2. Public Social Media Contains a Wealth of Information about Individuals… @jwnichls 2
  • 3. Public Social Media Contains a Wealth of Information about Individuals… Can we harness this information for something useful? • • Collect Information • @jwnichls Identify people to recruit to do various tasks Spread Information 3
  • 4. Information Collection: TSA Tracker Crowdsourcing airport security wait times through Twitter Step #1. Watch for people tweeting about being in airport http://tsatracker.org/ @tsatracker, @tsatracking Step #2. Ask nicely if they would share wait time to help others Step #3. Collect responses and share relevant data on web site Step #4. Say thank you! @jwnichls Key Result: Would people respond to questions from strangers? Yes…around 40% of the time [Nichols et al. CSCW 2012, Mahmud et al. IUI 2013] 4
  • 5. Today: Information Spreading • Relevant marketing campaign messages • Alerts and SOS messages in an emergency • Etc. @jwnichls 5
  • 6. Today: Information Spreading Challenge: Low percentage of people respond to this task • Can we predict who will retweet and direct requests only to them? • Can we predict who will retweet more quickly? @jwnichls 6
  • 7. Our Process @jwnichls 1. Data Collection 2. Feature Extraction 3. Feature Selection 4. Model Building 5. Evaluation 7
  • 8. Ground-Truth Data Collection Public Safety (location-based) Bird Flu (topic-based) • • • • • Randomly selected users who tweeted from the San Francisco bay area (via geo-tags) Contacted 1,902 users 52 (2.8%) retweeted our message Message reached a total of 18,670 followers @jwnichls • • • Randomly selected users who posted one of the following words in at least one tweet: “bird flu”, “H5N1” and “avian influenza” Contacted 1,859 users 155 (8.4%) retweeted our message Message reached a total of 184,325 followers 8
  • 9. In total: Resulting Data • Contacted 3,761 strangers • 207 positive examples, 3554 negative examples For each user we contacted, we collected: • Twitter profile (screen name, tweet count, etc.) • People they followed, followers, • Up to 200 recent messages • Ground truth (“retweeter” or “non-retweeter”) @jwnichls 9
  • 10. Feature Extraction Feature Categories • • • • • • Profile Features Social Network Features Personality Features Activity Features Past Retweeting Features Readiness Features @jwnichls 10
  • 11. Feature Extraction Feature Categories • • • • • • Profile Features Social Network Features Personality Features Activity Features Past Retweeting Features Readiness Features @jwnichls Profile Features • longevity (age) of an account • length of screen name • whether the user profile has a description • length of the description • whether the user profile has a URL Social Network Features • number of users following (friends) • number of followers • and the ratio of number of friends to number of followers 11
  • 12. Feature Extraction Feature Categories • • • • • • Profile Features Social Network Features Personality Features Activity Features Past Retweeting Features Readiness Features @jwnichls Users’ word usage has been found to predict their personality •Linguistic Inquiry and Word Count (LIWC) dictionary •Personality features derived from LIWC categories [Yarkoni 2010, Mahmud 2013] 12
  • 13. Feature Extraction Feature Categories • • • • • • Profile Features Social Network Features Personality Features Activity Features Past Retweeting Features Readiness Features • • • • • • • • • • @jwnichls Number of status messages Number of direct mentions (e.g., @johny) per status message Number of URLs per status message Number of hashtags per status message Number of status messages per day during her entire Account life (= total number of posted status messages / longevity) Number of status messages per day during last one month Number of direct mentions per day during last one month Number of URLs per day during last one month Number of hashtags per day during last one month 13
  • 14. Feature Extraction Feature Categories • • • • • • Profile Features Social Network Features Personality Features Activity Features Past Retweeting Features Readiness Features @jwnichls Past Retweeting Behavior • • • Number of retweets per status message: R/N Average number of retweets per day Fraction of retweets for which original messages are posted by strangers who are not in her social network Readiness Based on Previous Activity • • • • • • Tweeting Likelihood (Day) Tweeting Likelihood (Hour) Entropy of Tweeting Likelihood (Day) Entropy of Tweeting Likelihood (Hour) Tweeting Steadiness Tweeting Inactivity 14
  • 15. Training and Test Sets: Predicting Retweeters • Each dataset (public safety and bird flu) was randomly split to training set (2/3 data) and testing set (1/3 data) 5 Predictive Models • Random Forest, Naïve Bayes, Logistic Regression, SMO (SVM) and AdaboostM1 Handing Class Imbalance • Used both over-sampling (SMOTE) and weighting approaches (cost-sensitive approach) @jwnichls 15
  • 16. Feature Selection Computed χ2 value for each feature in training Feature Group Profile Social-network Significant Features (bolded is common to both data sets) the longevity of the account Feature Group the length of description Profile has description in profile |following| ratio of number of friends to number of followers |URLs| per day |direct mentions| per day |URLs| per day |direct mentions| per day Activity |hashtags| per day Activity |URLs| per status message |hashtags| per day |direct mentions| per status message |status messages| |hashtags| per status message |status messages| per day during entire account life |status messages| per day during last one month Past Retweeting Readiness Personality Significant Features (bolded is common to both data sets) Past Retweeting |retweets| per status message |retweets| per day Readiness Tweeting Likelihood of the Day Tweeting Likelihood of the Day (Entropy) 7 LIWC features: Inclusive, Achievement, Humans, Time, Sadness, Articles, Nonfluencies 1 Facet feature: Modesty 21 Features Selected by χ2 in Publish Safety Dataset Personality |retweets| per status message |retweets| per day |URLs| per retweet message Tweeting Likelihood of the Hour (Entropy) 34 LIWC features: Inclusive, Total Pronouns, 1st Person Plural, 2nd Person, 3rd Person, Social Processes, Positive Emotions, Numbers, Other References, Occupation, Affect, School, Anxiety, Hearing, Certainty, SZensory Processes, Death, Body States, Positive Feelings, Leisure, Optimism, Negation, Physical States, Communication 8 Facet features: Liberalism, Assertiveness, Achievement Striving, Self-Discipline, Gregariousness, Cheerfulness, Activity Level, Intellect 2 Big5 features: Conscientiousness, Openness 46 Features Selected by χ2 in Bird Flu Dataset Activity, personality, readiness and past retweeting feature groups have more significant power. Six significant features (bolded names) are common to both sets. @jwnichls 16
  • 17. Evaluating Retweeter Prediction Only the significant features are used for prediction Classifier AUC F1 F1 of Retweeter 0.638 0.958 0 Naïve Bayes 0.619 0.939 0.172 Logistic 0.640 0.958 0 SMO 0.500 0.96 0 AdaBoostM1 0.548 0.962 0.1 0.606 0.916 0.119 Naïve Bayes 0.637 0.923 0.132 Logistic 0.664 0.833 0.091 SMO 0.626 0.813 0.091 AdaBoostM1 0.633 0.933 0.129 Cost-Sensitive (Weighting, showing the best results in each model) Random Forest Naïve Bayes Logistic SMO AdaBoostM1 0.692 0.619 0.623 0.633 0.678 F1 of Retweeter Random Forest 0.707 0.877 0.066 Naïve Bayes 0.670 0.834 0.222 Logistic 0.751 0.878 0.067 SMO 0.500 0.876 0 AdaBoostM1 0.627 0.878 0.067 SMOTE SMOTE Random Forest F1 AUC Basic Basic Random Forest Classifier 0.954 0.93 0.938 0.892 0.956 Random Forest 0.707 0.819 0.236 Naïve Bayes 0.679 0.724 0.231 Logistic 0.76 0.733 0.258 SMO 0.729 0.712 0.278 AdaBoostM1 0.709 0.837 0.292 Cost-Sensitive (Weighting, showing the best results in each model) Random Forest 0.785 0.815 0.296 0.147 Naïve Bayes 0.670 0.767 0.24 0.042 0.123 0.133 Logistic 0.735 0.742 0.243 SMO 0.676 0.738 0.256 AdaBoostM1 0.669 0.87 0.031 0.125 Prediction accuracy (Public Safety) Prediction accuracy (Bird Flu) We use Random Forest for all following experiments. @jwnichls 17
  • 18. Comparison with Two Baselines Baselines Random people contact • Randomly select and ask a sub-set of qualified candidates Popular people contact • Sort candidates in our test set by their follower count in the descending order @jwnichls Retweeting Rate in Testing Set Approach Public Safety Bird flu Random People Contact 2.6% 8.3% Popular People Contact 3.1% 8.5% Our Prediction Approach 13.3% 19.7% Comparison of retweeting rates 18
  • 19. Incorporating Time Constraints • • Can we predict when a person will retweet given a request? We use an exponential distribution model to estimate a user’s retweeting wait time with a probability. • • Assume that retweeting events follow a poisson process. 1/λ = the average wait time for a user based on prior retweeting wait time When a person’s average wait time 1/λ = 180 min, retweeting probability within 200 minutes is larger than 0.6 @jwnichls 19
  • 20. Evaluation with Time Constraints • • • Compare two baselines with our prediction approach and our prediction approach with wait-time model For the wait-time model, if a user has at least 70% probability to retweet within a certain time window (say, 6 hours), we will contact the user. We experimented with different time windows such as 6, 12, 18 or 24 hours. The below table shows the averaged retweeting rate Approach Random People Contact Popular People Contact Our Prediction Approach Our Prediction Approach + WaitTime Model Average Retweeting Rate in Testing Set under Time Constraints Public Safety 2.2% 2.7% 13.3% Bird flu 6.5% 6.4% 13.6% 19.3% 14.7% Comparison of retweeting rates with time constraints @jwnichls 20
  • 21. Evaluating in Terms of Benefit & Cost Benefit: Number of people exposed to message Cost (N): the number of people contacted K: the number of retweeters (normalized benefit) Unit-Info-Reach-Per-Person Public Safety Bird flu Approach Random People Contact 6 85 Popular People Contact 11 116 Our Prediction Approach 106 135 Our Prediction Approach + Wait-Time Model 153 155 Comparison of information reach @jwnichls 21
  • 22. Live Experiment • • • To validate the effectiveness of our approach in a live setting, we used our recommender system to test our approach against the two baselines Randomly selected 426 candidates who had recently tweeted about “bird flu” in July 2013 Each approach selected top 100 candidates based on its criteria Approach Retweeting Rate Approach Average Retweeting Rate Random People Contact 4% Random People Contact 4% Popular People Contact 9% Popular People Contact 8.7% Our Prediction Approach 19% Our Prediction Approach Our Prediction Approach + Wait time model 18% Comparison of retweeting rates in live experiment @jwnichls 18.5% Comparison of retweeting rates in live experiment with time constraints 22
  • 23. To wrap up… • • We have also described a time estimation model • In the experiments, our approaches doubled the retweeting rates over the two baselines • With our time estimation model, our approach outperformed other approaches significantly • @jwnichls We have presented a feature-based prediction model that can automatically identify the right individuals at the right time on Twitter Overall, our approach effectively identifies qualified candidates for retweeting a message within a given time window 23
  • 24. Thanks! For more information, contact: Jeffrey Nichols jwnichols@us.ibm.com @jwnichls @jwnichls 24
  • 26. Why Retweet a Stranger’s Request? We randomly selected 50 people who retweeted and asked them why they chose to retweet (33 replied) Main reasons to retweet our requested message • Trustworthiness of the content “Because it contained a link to a significant report from a reputable media news source” • Content relevance “Because it happened in my neighborhood” • Content value “my followers should know this or they may think this info is valuable” @jwnichls 26
  • 27. Real-Time Retweeter Recommendation The interface of our retweeter recommendation system: (a) left panel: system-recommended candidates, and (b) right panel: a user can edit and compose a retweeting request. @jwnichls 27