In this paper, using a large amount of data collected from Twitter, the blogosphere, social networks, and news sources, we perform preliminary research to investigate whether human behavior in the real world can be understood by analyzing social media data. The goal of this research is twofold: (1) determining the relative effectiveness of a social media lens in analyzing and predicting real-world collective behavior, and (2) exploring the domains and situations in which social media can be a predictor of real-world behavior. We develop a four-step model: community selection, data collection, online behavior analysis, and behavior prediction. The results of this study show that in most cases social media is a good tool for estimating attitudes, but that further research is needed before it can be used to predict social behavior.
Real-World Behavior Analysis through a Social Media Lens
1. Real-World Behavior Analysis
through a Social Media Lens
Mohammad-Ali Abbasi, Huan Liu
Computer Science and Engineering, Arizona State University
Sun-Ki Chai, Kiran Sagoo
Department of Sociology, University of Hawai`i
Ali2@asu.edu
Data Mining and
Machine Learning Lab
2. Real-World Behavior Analysis
through a Social Media Lens
Real world Events/Behavior
6. Any correlation between social media numbers and
election results?
Mitt Romney | Ron Paul | Newt Gingrich | Rick Santorum | Barack Obama
1,520,000 | 900,000 | 295,000 | 173,000 | 25,500,000
370,000 | 260,000 | 1,447,000 | 160,000 | 12,920,000
Do we observe the same difference in the votes?
Number of States carried?
http://en.wikipedia.org/wiki/Republican_Party_presidential_primaries,_2012
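One rough way to start probing this question with only the numbers on the slide is to check whether the two rows of counts even rank the candidates the same way. The sketch below computes a Spearman rank correlation between the two rows for the four Republican candidates (Barack Obama is left out because he did not compete in the Republican primaries). The slide does not label the rows, so `row_1` and `row_2` are placeholder names; comparing against votes or states carried would need election data that is not on the slide.

```python
# Counts copied from the slide. The slide does not label the two rows,
# so "row_1" and "row_2" are placeholder names, not platform names.
candidates = ["Mitt Romney", "Ron Paul", "Newt Gingrich", "Rick Santorum"]
row_1 = [1_520_000, 900_000, 295_000, 173_000]
row_2 = [370_000, 260_000, 1_447_000, 160_000]


def ranks(values):
    """Rank values from largest (rank 1) to smallest; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for tie-free data."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


rho = spearman(row_1, row_2)
for name, r1, r2 in zip(candidates, ranks(row_1), ranks(row_2)):
    print(f"{name}: rank {r1} in row 1, rank {r2} in row 2")
print(f"Spearman rho = {rho:.2f}")
```

If the two social media counts disagree with each other about the ordering of candidates, that is already a hint that neither can be read directly as a vote forecast.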
7. Objectives of the research
• Studying the correlation between real-world
collective behavior and social media data
• Determining the relative effectiveness of a social
media lens in analyzing and predicting real-world
collective behavior
• Exploring the domains and situations in which
social media can be a predictor of real-world
behavior
8. Data collection
Active methods
• Surveys
• Experiments
• Field Study
• Expensive
• Time consuming
• Maybe dangerous
Passive methods
• We can passively observe people’s activities
(By observing and analyzing)
• Behavior
• Belongings
• Documents, …
Social Media
• People leave many clues about themselves
• Their interactions reveal much about people
9. Snooping
Experimental psychology suggests that a person
may be understood by what happens around him
• Does what's on your desk reveal what's on your
mind?
• Do those pictures on your walls tell true tales
about your character?
10. Using online data for opinion polling
• From Tweets to Polls: Linking Text
Sentiment to Public Opinion Time Series
• O'Connor et al. analyzed sentiment polarity
of tweets and found a correlation of 80% with
results from public opinion polls
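The idea of linking tweet sentiment to poll results can be sketched in a few lines. This is a toy illustration under stated assumptions, not O'Connor et al.'s pipeline: the word lists, the example tweets, and the poll values are all made up, and the per-day score is a simple mean of per-tweet polarity.

```python
import math

# Hypothetical mini-lexicon; a real study would use a published sentiment lexicon.
POSITIVE = {"good", "great", "hope", "win"}
NEGATIVE = {"bad", "fear", "lose", "crisis"}


def polarity(text):
    """Return +1, -1, or 0 depending on which lexicon has more hits in the text."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos > neg) - (pos < neg)


def daily_sentiment(tweets_by_day):
    """Mean polarity of the tweets posted on each day."""
    return [sum(polarity(t) for t in day) / len(day) for day in tweets_by_day]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented data: four days of tweets and a poll value for each day.
tweets_by_day = [
    ["bad crisis", "fear of crisis", "good news"],
    ["great win", "bad day"],
    ["good hope", "great win", "lose"],
    ["great hope", "good win"],
]
poll = [41.0, 43.0, 44.0, 47.0]

sentiment = daily_sentiment(tweets_by_day)
print("daily sentiment:", [round(s, 2) for s in sentiment])
print("correlation with poll:", round(pearson(sentiment, poll), 3))
```

With real data the daily series is noisy, so some smoothing over several days would usually be applied before computing the correlation.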
11. Some Existing Work
• Stock market prediction using data collected
from Twitter
• Box-office revenue prediction for movies
• Analyzing the Arab Spring using social media
Most of the work in the field can be classified into two categories:
• Behavior Analysis and finding a correlation
• Behavior prediction
12. Our approach: A four-step model
Find equivalent groups in Real-World & Social Media
Collect Related Online Data from Social Media
Analyze Online Data (Behavior)
Analyze the Real-World Behavior & find correlation
13. Experimental settings
Find a Group in real world and Social Media
• Select based on more stable characteristics: race, religion, primary language, and country/region of origin
Collect Related Online Data from Social Media
• Arab-Spring movement
• Information Retrieval techniques
• Twitter to collect 35 million tweets related to Arab Spring
• Collect more than 1 million blogposts
• 135,000 popular Facebook pages to collect data on posts, comments and like behavior on Facebook
• The data on real-world events has been collected from Reuters.com
Analyze Online Data (Behavior)
• Sentiment polarity analysis
• Statistical methods
Analyze the Real-World Behavior
• Correlational analysis
• Multivariate regression analysis
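The slide names multivariate regression analysis without giving details. As a rough illustration only, the sketch below fits an ordinary least-squares model with two predictors using the normal equations. The choice of predictors (weekly tweet volume in thousands and mean sentiment), the target (number of reported real-world events), and all values are assumptions invented for the example; the target is generated from a known linear rule so the fit can be checked.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(features, target):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(f) for f in features]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical weekly data: (tweet volume in thousands, mean sentiment).
features = [(10, 0.2), (20, -0.1), (30, 0.4), (40, 0.0), (50, -0.3)]
# Invented target built as 2 + 0.5 * volume + 3 * sentiment.
target = [2 + 0.5 * volume + 3 * mood for volume, mood in features]

coefficients = fit_ols(features, target)
print("intercept, volume, sentiment:", [round(c, 3) for c in coefficients])
```

On real data the fit would not be exact, and the residuals and the significance of each coefficient would need to be examined before drawing conclusions.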
14. Correlation between online and real events
[Figure annotation: time at which the real-world event happened]
15. Observations
[Figure annotation: time at which the real-world event happened]
16. Observations
• There could be correlations between real-world events
and online discussions. However,
– Correlation does not amount to prediction
– Poor results for small events
• Many real-world events are left uncovered
– Influence and cascade effects cause too much irrelevant
discussion in social media
• What we have experimented with
– Finding influential people
– Analyzing mood over the network
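The point that correlation does not amount to prediction can be made concrete with a lagged correlation. In the sketch below, which uses invented daily counts, online posts spike one day after each real-world event. The two series are strongly correlated, but only when the posts are shifted back in time, which means the posts react to events rather than anticipate them. The series, the lag range, and the function names are assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(events, posts, lag):
    """Correlate events[t] with posts[t + lag]; lag > 0 means posts trail events."""
    if lag >= 0:
        a, b = events[: len(events) - lag], posts[lag:]
    else:
        a, b = events[-lag:], posts[: len(posts) + lag]
    return pearson(a, b)


# Invented daily counts: posts follow each event by one day.
events = [0, 0, 5, 0, 0, 0, 8, 0, 0, 0]
posts = [1, 1, 1, 6, 1, 1, 1, 9, 1, 1]

by_lag = {lag: lagged_correlation(events, posts, lag) for lag in range(-2, 3)}
best_lag = max(by_lag, key=by_lag.get)
for lag, value in by_lag.items():
    print(f"lag {lag:+d}: r = {value:.2f}")
print("strongest correlation at lag", best_lag)
```

A predictor would need the strongest correlation at a negative lag, that is, online activity rising before the event.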
17. What are people concerned about
18. Challenges
• Finding Relevant Communities
– Analysis of Arab Spring tweets shows that 75 percent
of the 1 million clicks on Libya-related tweets and 89
percent of the 3 million clicks on Egypt-related
tweets came from outside the Arab world [1]
– The fallacy of millions of followers
[1] http://www.stripes.com/blogs/stripes-central/stripes-central-1.8040/researchers-skeptical-dod-can-use-social-media-to-predict-future-conflict-1.15529
19. Challenges
• Data Collection
– Sufficient coverage of the data
– Source of data is unknown
– Spam
– Paid social media content
• Online behavior Analysis
– Unstructured, noisy text data
– Language ambiguity
20. Observations
Real-World Behavior Prediction
– Stark difference between clicking online and taking
real risks in the street
21. Conclusions
• Social media helps us understand real-world
events, but it cannot be the sole source
• More research and development is needed to make
social media a reliable source for behavior analysis
• Social event prediction using social media remains
an open problem. More interdisciplinary research
should be promoted.
22. Thanks!
Acknowledgments:
This work is, in part, sponsored by ONR and AFOSR
grants. We are grateful for the comments from anonymous
reviewers and members of the DMML lab at ASU.
Mohammad-Ali Abbasi
ali2@asu.edu
Editor's Notes
Let's see what we mean by a social media lens. The gap is between our analysis based on SM data and the analysis based on real-world data. Is there any way we can get to real-world analysis by using SM data? If so, there will be many interesting applications for social scientists, politicians, opinion miners, market researchers, …
Social events, Arab Spring
To what extent can we predict the election results? How accurate is our prediction? Opinion Mining and
Finance and markets are another interesting domain… Stock market prediction. Rise and fall of the stock market. The best scenario would be “predicting the stock market”.
Can we predict the GOP candidate using social media data, e.g., numbers from Facebook and Twitter? Why not? Not all American voters are on Facebook and have liked their candidate. Not all of those on Facebook who liked a candidate are allowed to vote. Not even all eligible voters on Facebook who liked a specific candidate are going to vote for him! In this research we want to investigate the correlation between results from SM and RW data: same results, opposite, or vague or hard to discover.
Active investigation. Passive investigation. We are looking for clues to discover the next collective behavior.
Does having a messy desk mean being disorganized or busy? Does what’s on your desk reveal what’s on your mind? Do those pictures on your walls tell true tales about you? And is your favorite outfit about to give you away? For the last ten years psychologist Sam Gosling has been studying how people project (and protect) their inner selves. By exploring our private worlds (desks, bedrooms, even our clothes and our cars), he shows not only how we showcase our personalities in unexpected and unplanned ways, but also how we create personality in the first place, communicate it to others, and interpret the world around us. Gosling, one of the field’s most innovative researchers, dispatches teams of scientific snoops to poke around dorm rooms and offices, to see what can be learned about people simply from looking at their stuff. What he has discovered is astonishing: when it comes to the most essential components of our personalities, from friendliness to flexibility, the things we own and the way we arrange them often say more about us than even our most intimate conversations. If you know what to look for, you can figure out how reliable a new boyfriend is by peeking into his medicine cabinet or whether an employee is committed to her job by analyzing her cubicle. Bottom line: The insights we gain can boost our understanding of ourselves and sharpen our perceptions of others. Packed with original research and fascinating stories, Snoop is a captivating guidebook to our not-so-secret lives.
Ali: I think there is a recent paper about the negative result of the 2nd bullet (Box Office). We know that the first bullet is not really prediction
To investigate this we propose a 4-step model. Finding a good population in social media is the first step. We need to have some representative groups both in the real world and in online social media (find a good map). We need to collect data from Egyptians, not from Americans here tweeting from Starbucks!
Frequency of words and sentences related to the event. Unigram, bigram, and n-gram analysis. Hashtag analysis.
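The n-gram and hashtag counting mentioned in this note can be sketched with the standard library. The tokenizer, the function names, and the example posts are assumptions made for illustration; the posts are invented and are not taken from the collected data.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; hashtags keep their leading '#'."""
    return re.findall(r"#?\w+", text.lower())


def ngram_counts(texts, n):
    """Count n-grams (as tuples of tokens) across all texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip over n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


def hashtag_counts(texts):
    """Count hashtags across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t.startswith("#"))
    return counts


# Invented example posts.
posts = [
    "Protest in Tahrir Square today #egypt #jan25",
    "Huge protest in Tahrir Square #egypt",
    "Thinking of friends in Cairo #egypt",
]

print(ngram_counts(posts, 1).most_common(3))
print(ngram_counts(posts, 2).most_common(3))
print(hashtag_counts(posts).most_common())
```

Tracking how these counts change from day to day gives a simple volume signal that can then be compared with the timeline of real-world events.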
An event suddenly happened and then created lots of discussion in social media. There is a correlation between real-world events and social media conversations; we can observe it especially for big (nation-wide) events. But this is not all: there are many more events in the real world without SM coverage, and many more that are not necessarily covered by SM.
It is challenging to find communities or groups that even partially represent a real-world group. For most political events, especially in non-democratic countries, it is extremely difficult to find representative real-world groups: people may not have access to social media; people do not want to express their true opinions in social media; and there are many paid spammers in social media, especially for political events.