Big Social Data Analysis: Using location & 
Twitter to explore the tragic aftermath of the 
Sandy Hook Elementary School shooting 
Richard Heimann 
Chief Data Scientist at L-3 Data Tactics 
Adjunct Professor at UMBC 
Keegan Hines 
Data Scientist at L-3 Data Tactics 
Richard Heimann © 2013
Big Social Data Analysis: A Case Study in Newtown 
Counting, counting, counting:
Why do we count? How do we count?
What is measurement? What are latent constructs?
Traditional Data, Nontraditional Data | Sample vs. Population | Model Organism | Good Data, Bad Data
Case Study:
Analyzing the discussion following the tragic events in Newtown, CT.
Counting, counting, counting… 
How do we Count? 
Notice all these categories are counting. We count everything all the time. 
Analytics in Perspective: An Inquiry into Modes of Inquiry 
http://datatactics.blogspot.com/2013/07/analytics-in-perspective-inquiry-into.html 
Counting and counting… 
A: 75% of Americans favor some level of gun control. 
A{spatial}: Americans in the northeast favor aggressive gun control by 3:1 
over the south and midwest. 
B: Most Americans favor some level of gun control. 
B{spatial}: Americans in the northeast favor aggressive gun control at more
than double the rate of Americans in the south and midwest.
C: Americans favor gun control. 
C{spatial}: Americans in the northeast favor aggressive gun control over the 
south and midwest. 
D: All Americans favor gun control. 
Counting and counting… 
D: All Americans favor gun control. 
- Many 
- Much 
- Some 
- Numerous 
- A little, A lot 
- Often 
- Always 
- Rarely 
Why Quantitative Analysis? 
Why is quantitative data analysis so important?
“…the alternative to good statistics is not ‘no statistics,’ it’s bad statistics. People who argue
against statistical reasoning often end up backing up their arguments with whatever numbers they
have at their command, over- or under-adjusting in their eagerness to avoid anything systematic.”
Bill James
E.g. Quasi-Geo-qualitative Analysis? 
“Here the different Tribes meet in Friendship and collect Stone for Pipes.”
“Yanktons a Band of Sioux - 1000 Souls”
F. Ratzel, C. Wissler, & C. Sauer: Culture Area Research and Mapping (1850’s)
Maps Descriptive of London Poverty (1899)
“No. 34 is occupied by the widow of a boatman. He committed suicide and left her with
eleven children. Some have died, and she has five here now, two of whom go to work,
and three to school. She makes sailor jackets, but is nearly blind. Struggles hard for her
children…”
Measurement… 
What is measurement? 
Goal: all measurement aims to arrange items on a
continuum (observed or unobserved).
You can see a lot by observing… 
Yogi Berra 
What is measurement - real world? 
Q: How can we measure something that is unobserved, or for which there is no direct measure?
A: Use a statistical model to measure the relationships between the observable variables and the unobserved (or “latent”) quantity.
What is measurement - real world? 
Estimating unobservable quantities: E.g. “topic or theme” 
Twitter Word1 Word2 Word3 topic 
@TheDude 1 1 0 
@WalterSobchak 0 1 1 
@TheBigLebowski 1 0 1 
… … … … 
@Donny 1 1 0 
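One way to see how observed word indicators can hint at an unobserved theme is to compare users' rows directly. The sketch below is illustrative only: it rebuilds the table above as a matrix and computes cosine similarity between rows, which is a simple stand-in and not the topic model used later in the deck. The object names (`dtm`, `cosine_sim`) are assumptions.

```r
# Word-indicator table from the slide (1 = the user used the word).
dtm <- matrix(
  c(1, 1, 0,
    0, 1, 1,
    1, 0, 1,
    1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("@TheDude", "@WalterSobchak", "@TheBigLebowski", "@Donny"),
    c("Word1", "Word2", "Word3")
  )
)

# Cosine similarity between every pair of users' word vectors.
row_norms  <- sqrt(rowSums(dtm^2))
cosine_sim <- (dtm %*% t(dtm)) / (row_norms %o% row_norms)

print(round(cosine_sim, 2))
```

Users with identical word usage (here @TheDude and @Donny) score 1, which is the kind of regularity a latent "topic" is meant to summarize.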
Traditional Data, Nontraditional Data… 
Traditional Approaches to SocSci Inquiry 
For example, gun control:
Surveys - that is, ask people what their position is on gun control.
…but whom? How many? Your friends? Family? People in your neighborhood? This is expensive.
Polls - similar to the above, but often offering multiple-choice answers.
…but how do you construct the questions? How many questions? Same issues as above. This is expensive.
Legislature - count votes and federal/state funding.
…but what are we measuring? Lobbyist or American valence?
Gun Sales/Deaths - that is, count the number of gun sales and/or deaths.
…but are these normalized values? Are existing gun control laws controlled for?
Nontraditional Approaches to SocSci Inquiry 
Text is not only big, but is growing at an increasing rate. Twitter was
launched March 21, 2006, and it took 3 years, 2 months and 1 day to
reach 1 billion tweets. Twitter users now send one billion tweets every 2.5
days.
People are highly opinionated. 
It’s inexpensive:
> library(twitteR)
> n <- 1500  # tweets to request; illustrative value, not from the original slide
> guncontrol <- searchTwitter("#guncontrol", n = n, cainfo = "cacert.pem")
It’s comprehensive:
Dec2012: ~30 days, ~210M tweets, ~40,334,000 users;
#guncontrol ~14,500 tweets, ~10,200 users.
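To put the figures above in proportion, a few lines of arithmetic help. This is a sketch that only restates the approximate numbers quoted on the slide; the variable names are made up for illustration, and the end date used for the first billion tweets is an assumption derived from "3 years, 2 months and 1 day" after March 21, 2006.

```r
# Approximate figures quoted on the slide (December 2012).
total_tweets <- 210e6
total_users  <- 40334000
tag_tweets   <- 14500
tag_users    <- 10200
days         <- 30

tweets_per_day      <- total_tweets / days
tag_share_pct       <- 100 * tag_tweets / total_tweets
tag_tweets_per_user <- tag_tweets / tag_users

# Assumed end date: March 21, 2006 plus 3 years, 2 months and 1 day.
days_to_first_billion <- as.numeric(as.Date("2009-05-22") - as.Date("2006-03-21"))
speedup               <- days_to_first_billion / 2.5

cat(sprintf("Tweets per day: %.0f\n", tweets_per_day))
cat(sprintf("#guncontrol share of all tweets: %.4f%%\n", tag_share_pct))
cat(sprintf("#guncontrol tweets per user: %.2f\n", tag_tweets_per_user))
cat(sprintf("Days to first billion tweets: %.0f (vs. 2.5 days now, about %.0fx faster)\n",
            days_to_first_billion, speedup))
```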
Population vs. Sample 
Population: The entire group under study.
Populations (N): often so large that we cannot examine the entire group. Samples are selected to represent the population.
Sample (n): Samples help answer questions about the population.
Nontraditional data allows n -> N
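The n -> N idea can be illustrated with a small simulation. The sketch below assumes a made-up population of 10,000 yes/no opinions (the 0.6 probability and the sample sizes are arbitrary choices, not figures from the deck) and compares sample estimates with the population value as n grows to N.

```r
set.seed(42)  # arbitrary seed so the run is repeatable

# Hypothetical population: 1 = favors some measure, 0 = does not.
N          <- 10000
population <- rbinom(N, size = 1, prob = 0.6)
true_share <- mean(population)

# Draw samples of increasing size (without replacement) and compare.
sample_sizes <- c(100, 1000, 5000, N)
estimates    <- sapply(sample_sizes, function(n) mean(sample(population, n)))
abs_error    <- abs(estimates - true_share)

print(data.frame(n = sample_sizes, estimate = estimates, abs_error = abs_error))
```

When n equals N the sample is the population, so the sampling error vanishes; whether the population on Twitter matches the population of interest is a separate question.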
Twitter as a model organism: 
Good data, Bad data 
What is Good Data?
Is “garbage in, garbage out” a statement we ought to take seriously?
+ Data collected in targeted, rigorous ways - aka AUTHORITATIVE!
E.g. Census data, Surveys, Polls.
- Data tends to be narrow in scope, both geographically and temporally, and is measured infrequently, if ever again.
The linchpin is “when the data is available.” The vast majority of data relating to emerging questions in business, politics and social science simply does not exist.
What is Bad Data?
Social Media is millions of conversations happening continuously and concurrently with varying degrees of decay and magnitude: lots of signal and lots of noise. (+) We can ask a variety of questions of it.
Good data, Bad data 
The opposite of good data is not bad data, it is no data.
The point: Good data and Bad data do not exist; there is just Data and NO Data.
Good data, Bad data & A model organism 
Question: Is Twitter a model organism?
Reality: We live in an imperfect world producing imperfect data - our job is to work with it.
Case Study
Big Social Data Analysis: A Case Study in Newtown 
General ProControl AgainstControl 
#PrayForNewtown: emotional hashtag.
#gunfail: pro gun control.
#gunrights: anti gun control.
#NRA: vague, broad.
#p2: refers to Progressives 2.0, the resource for progressives on social media. Progressivism is a political philosophy that prioritizes diversity and empowerment through social activism.
#2ndAmendment: anti gun control.
#CTshooting: vague, broad.
#gunsense: seems to be pro gun control.
#2a: 2ndAmendment; anti gun control.
#guncontrol: vague, broad.
#NowIsTheTime: seems to be pro gun control.
#tcot: Top Conservatives on Twitter; seems to be against gun control.
#newtown: emotional hashtag.
Big Social Data Analysis: A Case Study in Newtown 
> colnames(fulldf) 
[1] "ORIG_FILE" "TWEET_ID" "TIMESTAMP" "SCREEN_NAM" "TRUE_NAME" 
[6] "GENDER" "LOCATION" "LONG" "LAT" "HASHTAGS" 
[11] "GT_COUNT" "PT_COUNT" "AT_COUNT" "LANG" "TEXT" 
[16] "NAME" "trimmedTweets" "Topic" "TimeStamp_Day" "IRT_score" 
Big Social Data Analysis: A Case Study in Newtown 
Bag of Words - the order of words doesn’t matter, we’re 
simply interested in which words were used 
Tweet: "Forget it, Donny, you're out of your element!"
Bag of words: donny, element, forget, it, of, out, you're, your
Tweet: "Life does not start and stop at your convenience"
Bag of words: and, at, convenience, does, life, not, start, stop, your
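The bag-of-words step can be sketched in a few lines of base R. This is a minimal illustration, not the preprocessing actually used for the case study; the function name `bag_of_words` and the choice to keep apostrophes inside words are assumptions.

```r
# Turn a text into its set of distinct lowercase tokens; word order is discarded.
bag_of_words <- function(text) {
  text   <- tolower(text)
  text   <- gsub("[^a-z0-9' ]", " ", text)   # replace punctuation with spaces, keep apostrophes
  tokens <- strsplit(text, " +")[[1]]
  tokens <- tokens[tokens != ""]
  sort(unique(tokens), method = "radix")     # radix sort is locale-independent
}

bag1 <- bag_of_words("Forget it, Donny, you're out of your element!")
bag2 <- bag_of_words("Life does not start and stop at your convenience")

print(bag1)
print(bag2)
```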
Big Social Data Analysis: A Case Study in Newtown 
Stopwords - remove words which are so common as to be 
uninformative (e.g. pronouns, articles) 
Tweet: "Forget it, Donny, you're out of your element!"
After stopword removal: donny, element, forget, out
Tweet: "Life does not start and stop at your convenience"
After stopword removal: convenience, does, life, start, stop
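Stopword removal amounts to filtering tokens against a list. In the sketch below the stopword list is hypothetical, chosen only so that the output reproduces the slide's example; a real analysis would typically use a published list.

```r
# Hypothetical stopword list, picked to reproduce the slide's example.
stopwords <- c("and", "at", "it", "not", "of", "you're", "your")

# Keep only the tokens that are not in the stopword list.
remove_stopwords <- function(tokens, stopwords) {
  tokens[!(tokens %in% stopwords)]
}

tweet1 <- c("donny", "element", "forget", "it", "of", "out", "you're", "your")
tweet2 <- c("and", "at", "convenience", "does", "life", "not", "start", "stop", "your")

kept1 <- remove_stopwords(tweet1, stopwords)
kept2 <- remove_stopwords(tweet2, stopwords)

print(kept1)
print(kept2)
```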
Big Social Data Analysis: A Case Study in Newtown 
Stemming - the same verb with different conjugations and 
tenses should be represented in just one way 
Text: "I run, he runs, we enjoy running."
After stopword removal and stemming: run, run, enjoy, run
Text: "You mark that frame an 8, and you're entering a world of pain."
After stopword removal and stemming: mark, frame, 8, enter, world, pain
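A toy suffix-stripping stemmer shows the idea. This is a deliberately naive sketch with three hand-written rules, assumed here for illustration only; it will mangle many English words, and real work would normally rely on an established stemmer such as Porter's algorithm.

```r
# Naive stemmer: strips "-ing", collapses a doubled final consonant, strips a final "-s".
naive_stem <- function(words) {
  words <- tolower(words)
  words <- sub("^(.{3,})ing$", "\\1", words, perl = TRUE)               # running -> runn
  words <- sub("([b-df-hj-np-tv-z])\\1$", "\\1", words, perl = TRUE)    # runn -> run
  words <- sub("^(.{2,}[^s])s$", "\\1", words, perl = TRUE)             # runs -> run
  words
}

stems1 <- naive_stem(c("run", "runs", "enjoy", "running"))
stems2 <- naive_stem(c("mark", "frame", "8", "entering", "world", "pain"))

print(stems1)
print(stems2)
```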
Big Social Data Analysis: A Case Study in Newtown 
Topic Model - some tweets discuss similar things, they 
ought to be grouped together 
"Smokey, this is not 'Nam. This is bowling. There are rules."
"How come you don't roll on Saturday, Walter?"
"I don't roll on Shabbos!"
"Well, sir, it's this rug I had. It really tied the room together."
"I need to see you. I'm the one who took your rug."
"Walter, he peed on my rug!"
Big Social Data Analysis: A Case Study in Newtown 
Idea: Posit a number of latent “topics,” then estimate the 
relationship between words-in-topics, and topics-in-tweets. 
[Figure: a documents-by-terms matrix (documents, aka tweets: Tweet 1, Tweet 2, …, Tweet n; columns are words/terms) linked to the latent topics “Bowling” and “Rug”. Terms shown: bowling, 'nam, smokey, shabbos, rug, peed, saturday, rules, roll, need, took, walter, how, come.]
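The two relationships named above, words-in-topics and topics-in-tweets, can be written as two small probability matrices whose product gives each tweet's expected word distribution. The sketch below uses invented numbers and invented object names (`beta`, `theta`); it illustrates the structure a topic model estimates and does not fit anything.

```r
# Words-in-topics: each row is a topic's probability distribution over terms (made-up values).
beta <- rbind(
  Bowling = c(bowling = 0.5, roll = 0.3, rules = 0.2, rug = 0.0, peed = 0.0),
  Rug     = c(bowling = 0.0, roll = 0.0, rules = 0.1, rug = 0.6, peed = 0.3)
)

# Topics-in-tweets: each row is a tweet's mixture over topics (made-up values).
theta <- rbind(
  tweet1 = c(Bowling = 0.9, Rug = 0.1),
  tweet2 = c(Bowling = 0.2, Rug = 0.8)
)

# Expected word probabilities per tweet implied by the two matrices.
expected_word_probs <- theta %*% beta

# Dominant topic per tweet.
dominant_topic <- colnames(theta)[apply(theta, 1, which.max)]

print(round(expected_word_probs, 2))
print(dominant_topic)
```

Fitting a topic model runs this logic in reverse: given only the observed word counts, it estimates both matrices.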
Big Social Data Analysis: A Case Study in Newtown 
With the twitter data, we’ll do a Topic model with 3 topics. 
Topic # 1
This topic seems to capture discussions of gun control and gun rights as this political issue emerged in the conversation.
Top Words: #tcot, #tlot, #nra, #2ndamendment, #p2, gun, #gun, control, #newtown, #guncontrol, #2a, right, america, armed, obama, assault
Example Tweets:
wow, this shooting shit needs to stop. #guncontrol now.
oh good. #obamao has put #biden in charge of #guncontrol. that makes me feel all better about my rights and liberties.
gun control is like trying to reduce drunk driving by making it harder for sober people to buy cars! #tcot #pjnet
Big Social Data Analysis: A Case Study in Newtown 
With the twitter data, we’ll do a Topic model with 3 topics. 
Topic # 2
This topic seems to capture general chatter, commonly used words, and spam tweets.
Top Words: obama, senate, #tcot, #tlot, video, house, tell, like, free, fiscal, tax, news, need, via
Example Tweets:
help @newhampshirecr reach 600 followers! only 7 more to go! #nhcr #nhpolitics #tcot
rt @msegieda: @foxnews writes about @fracknation's premiere tonight on @axstv 9 pm et http://t.co/paafwgux #fracking #tcot #tlot #tp
michigan man, dog rescued after ice breaks: http://t.co/cc983uyv #tcot
Big Social Data Analysis: A Case Study in Newtown 
With the twitter data, we’ll do a Topic model with 3 topics. 
Topic # 3
This topic seems to capture descriptions of the tragedy as well as expressions of sympathy and sadness.
Top Words: #prayfornewtown, #newtown, #ctshooting, children, families, school, prayers, thoughts, victims, sad, tragedy, kids, god, little, today, rip, lanza
Example Tweets:
my heart goes out to everyone affected by the shooting at #sandycook elementary. a senseless tragedy. i can't imagine your pain. #newtown
dear god, please protect our babies from the monsters that live among us. my heart is breaking for those in #newtown
thoughts are with the students of #sandyhook #newtown sad situation
Big Social Data Analysis: A Case Study in Newtown 
• With a topic model, we can extract topics that make intuitive sense
• But what about the usage patterns of these topics?
• Are there interesting temporal, ideological, or geographical trends/patterns?
Big Social Data Analysis: A Case Study in Newtown 
[Figure: tweet count by date (x-axis: Time/Date, y-axis: Count), one series per topic: GunControl, Mixed, Sympathy. Annotated events:]
12/14: Newtown shootings.
12/16: Obama openly wept as he addressed the nation in the hours after the attack, and stated that now was the time for "meaningful action" on gun violence.
12/19: "In the coming weeks, I will use whatever power this office holds to engage my fellow citizens, from law enforcement to mental health professionals to parents and educators, in an effort aimed at preventing more tragedies like this," Obama said. "Because what choice do we have? We can't accept events like this as routine."
1/19: Obama presents gun control agenda; includes 23 executive orders.
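A count-by-day-and-topic series like the one charted here can be produced with a cross-tabulation. The sketch below uses a toy data frame with invented rows; only the column names `TimeStamp_Day` and `Topic` are taken from the `colnames(fulldf)` listing, and the date format is an assumption.

```r
# Toy stand-in for fulldf: six invented tweets with a day and a topic label.
fulldf <- data.frame(
  TimeStamp_Day = c("2012-12-14", "2012-12-14", "2012-12-14",
                    "2012-12-16", "2012-12-16", "2012-12-19"),
  Topic = c("Sympathy", "Sympathy", "Mixed",
            "GunControl", "Sympathy", "GunControl"),
  stringsAsFactors = FALSE
)

# Rows are days, columns are topics, cells are tweet counts.
counts <- table(fulldf$TimeStamp_Day, fulldf$Topic)

print(counts)
```

Each column of `counts` is then one line in a time-series plot.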
Big Social Data Analysis: A Case Study in Newtown 
Big Social Data Analysis: A Case Study in Newtown 
Everyone 
Big Social Data Analysis: A Case Study in Newtown 
Everyone by Topic 
GunControl: 2,733; 34.7%
Mixed: 2,637; 33.5%
Sympathy: 2,505; 31.8%
Big Social Data Analysis: A Case Study in Newtown 
Red: 2,179 
Pink: 434 
Purple: 887 
Light Blue: 161 
Blue: 4,214
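The percentages on the "Everyone by Topic" slide follow directly from the counts, and the topic counts and color counts add up to the same total. The sketch below only redoes that arithmetic; the vector names are assumptions, and the color labels are used as given since the slides do not define them.

```r
# Counts as printed on the slides.
topic_counts <- c(GunControl = 2733, Mixed = 2637, Sympathy = 2505)
geo_counts   <- c(Red = 2179, Pink = 434, Purple = 887, "Light Blue" = 161, Blue = 4214)

# Shares of the total, in percent, rounded to one decimal.
topic_pct <- round(100 * topic_counts / sum(topic_counts), 1)
geo_pct   <- round(100 * geo_counts / sum(geo_counts), 1)

print(topic_pct)   # 34.7, 33.5, 31.8, matching the slide
print(geo_pct)
print(c(topic_total = sum(topic_counts), geo_total = sum(geo_counts)))
```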
Topics 
[Figure: topic shares (GunControl, Mixed, Sympathy) by geography (Red, Pink, Purple, Light Blue, Blue). Values that survive from the chart, in topic order: 10%, 9.2%, 8.5% and 18.3%, 17.2%, 18%.]
Spatially Explicit Theory 
Proximate casualty hypothesis (Gartner, Segura, and Wilkening 1997)
Time and space provide new insight on the multiple processes underlying opinion change in today’s complex information environment.
A case study of the “proximate casualties” hypothesis, the idea that popular support for American wars is undermined at the individual level more by the deaths of American personnel from nearby areas than by the deaths of those from far away.
http://jcr.sagepub.com/content/41/5/669.abstract
Topics 
[Figure: topic shares (GunControl, Mixed, Sympathy) by geography: East North Central, East South Central, Middle Atlantic, Mountain, New England, Pacific, South Atlantic, West North Central, West South Central.]
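Topic shares within each region are row-normalized counts. The sketch below assumes a toy data frame with invented rows and a hypothetical `Region` column (the original data frame lists LOCATION, LONG and LAT, but no region field), so it shows the computation only, not the deck's results.

```r
# Toy data: six invented tweets with a region and a topic label.
tweets <- data.frame(
  Region = c("New England", "New England", "New England",
             "Pacific", "Pacific", "Mountain"),
  Topic  = c("Sympathy", "Sympathy", "GunControl",
             "GunControl", "Mixed", "Mixed"),
  stringsAsFactors = FALSE
)

# Count tweets per region and topic, then convert each row to shares.
counts <- table(tweets$Region, tweets$Topic)
shares <- prop.table(counts, margin = 1)

print(round(100 * shares, 1))
```

Normalizing by row makes regions with very different tweet volumes comparable.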
Summary 
Counting, counting, counting:
Why do we count? How do we count? Contrasting quantitative counting vs. qualitative counting.
What is measurement? What are latent constructs?
Traditional vs. Nontraditional approaches to SocSci Inquiry. Population vs. Sample. Model Organisms. Good data, Bad data.
Case Study in Newtown using Twitter and topic modeling to explore temporal, ideological, and geographical elements.
Thank you… Questions? 
Richard Heimann: @rheimann
https://twitter.com/rheimann
rheimann@umbc.edu
Data Tactics Big Data Insights [blog]:
http://datatactics.blogspot.com
Keegan Hines: @keeghin
https://twitter.com/keeghin

More Related Content

What's hot

Frontiers of Computational Journalism week 1 - Introduction and High Dimensio...
Frontiers of Computational Journalism week 1 - Introduction and High Dimensio...Frontiers of Computational Journalism week 1 - Introduction and High Dimensio...
Frontiers of Computational Journalism week 1 - Introduction and High Dimensio...Jonathan Stray
 
Frontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisFrontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisJonathan Stray
 
Frontiers of Computational Journalism week 3 - Information Filter Design
Frontiers of Computational Journalism week 3 - Information Filter DesignFrontiers of Computational Journalism week 3 - Information Filter Design
Frontiers of Computational Journalism week 3 - Information Filter DesignJonathan Stray
 
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...Jonathan Stray
 
Text analysis-semantic-search
Text analysis-semantic-searchText analysis-semantic-search
Text analysis-semantic-searchDiana Maynard
 
MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars...
MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars...MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars...
MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars...Micah Altman
 
Data Discovery and Visualization
Data Discovery and VisualizationData Discovery and Visualization
Data Discovery and VisualizationDr. Neil Brittliff
 
Adding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long wayAdding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long wayDiana Maynard
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataAndre Freitas
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Saeedeh Shekarpour
 
Open domain Question Answering System - Research project in NLP
Open domain  Question Answering System - Research project in NLPOpen domain  Question Answering System - Research project in NLP
Open domain Question Answering System - Research project in NLPGVS Chaitanya
 
Shared data and the future of libraries
Shared data and the future of librariesShared data and the future of libraries
Shared data and the future of librariesRegan Harper
 

What's hot (20)

Frontiers of Computational Journalism week 1 - Introduction and High Dimensio...
Frontiers of Computational Journalism week 1 - Introduction and High Dimensio...Frontiers of Computational Journalism week 1 - Introduction and High Dimensio...
Frontiers of Computational Journalism week 1 - Introduction and High Dimensio...
 
Frontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisFrontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text Analysis
 
Frontiers of Computational Journalism week 3 - Information Filter Design
Frontiers of Computational Journalism week 3 - Information Filter DesignFrontiers of Computational Journalism week 3 - Information Filter Design
Frontiers of Computational Journalism week 3 - Information Filter Design
 
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
 
BDACA - Lecture4
BDACA - Lecture4BDACA - Lecture4
BDACA - Lecture4
 
BD-ACA Week8a
BD-ACA Week8aBD-ACA Week8a
BD-ACA Week8a
 
BDACA - Lecture7
BDACA - Lecture7BDACA - Lecture7
BDACA - Lecture7
 
Text analysis-semantic-search
Text analysis-semantic-searchText analysis-semantic-search
Text analysis-semantic-search
 
MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars...
MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars...MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars...
MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars...
 
Lecture #01
Lecture #01Lecture #01
Lecture #01
 
Data Discovery and Visualization
Data Discovery and VisualizationData Discovery and Visualization
Data Discovery and Visualization
 
Adding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long wayAdding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long way
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big data
 
Recommender systems
Recommender systemsRecommender systems
Recommender systems
 
Lecture #03
Lecture #03Lecture #03
Lecture #03
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems
 
Open domain Question Answering System - Research project in NLP
Open domain  Question Answering System - Research project in NLPOpen domain  Question Answering System - Research project in NLP
Open domain Question Answering System - Research project in NLP
 
Shared data and the future of libraries
Shared data and the future of librariesShared data and the future of libraries
Shared data and the future of libraries
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Lecture #02
Lecture #02 Lecture #02
Lecture #02
 

Similar to Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in the Tragic Aftermath of the Sandy Hook School Shooting

Understanding The Big Data Mess
Understanding The Big Data MessUnderstanding The Big Data Mess
Understanding The Big Data MessCision
 
Public #transneeds deck
Public #transneeds deckPublic #transneeds deck
Public #transneeds deckianolsonsb
 
M-Brain business intelligence seminar Oct 7 2010 final
M-Brain business intelligence seminar Oct 7 2010 finalM-Brain business intelligence seminar Oct 7 2010 final
M-Brain business intelligence seminar Oct 7 2010 finalMark Linder
 
Social Media Ethics Presentation
Social Media Ethics PresentationSocial Media Ethics Presentation
Social Media Ethics PresentationPalmRyan
 
How To Write An Introduction For An Essay Ppt
How To Write An Introduction For An Essay PptHow To Write An Introduction For An Essay Ppt
How To Write An Introduction For An Essay PptEllen Blackburn
 
Big Data, Republicans and 2016
Big Data, Republicans and 2016Big Data, Republicans and 2016
Big Data, Republicans and 2016steveparkhurst
 
Argumentative Essay Cats Better Than Dogs
Argumentative Essay Cats Better Than DogsArgumentative Essay Cats Better Than Dogs
Argumentative Essay Cats Better Than DogsVanessa Perkins
 
Influence Strategies for Software Professionals
Influence Strategies for Software ProfessionalsInfluence Strategies for Software Professionals
Influence Strategies for Software ProfessionalsTechWell
 
Guerrero, manuel public-connection-civic-deliberation-salzburg-2015
Guerrero, manuel   public-connection-civic-deliberation-salzburg-2015Guerrero, manuel   public-connection-civic-deliberation-salzburg-2015
Guerrero, manuel public-connection-civic-deliberation-salzburg-2015Salzburg Global Seminar
 
Challenges of social media analysis in the real world
Challenges of social media analysis in the real worldChallenges of social media analysis in the real world
Challenges of social media analysis in the real worldDiana Maynard
 
Social Justice & Black Twitter
Social Justice & Black TwitterSocial Justice & Black Twitter
Social Justice & Black TwitterAyodele Odubela
 
Who is to Blame for the Government Shutdown
Who is to Blame for the Government ShutdownWho is to Blame for the Government Shutdown
Who is to Blame for the Government ShutdownIpsos Public Affairs
 
Data as a source is like any other source, march 2014
Data as a source is like any other source, march 2014Data as a source is like any other source, march 2014
Data as a source is like any other source, march 2014Hassel Fallas
 
Ed Snowden: hero or villain? And the implications for media and democracy
Ed Snowden: hero or villain? And the implications for media and democracyEd Snowden: hero or villain? And the implications for media and democracy
Ed Snowden: hero or villain? And the implications for media and democracyPOLIS LSE
 
Acoustic Spaces And Ambient Noise Levels
Acoustic Spaces And Ambient Noise LevelsAcoustic Spaces And Ambient Noise Levels
Acoustic Spaces And Ambient Noise LevelsStephanie Rivas
 
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...Farida Vis
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Micah Altman
 
Invasion Of Privacy In Canadian Media
Invasion Of Privacy In Canadian MediaInvasion Of Privacy In Canadian Media
Invasion Of Privacy In Canadian MediaKelly Ratkovic
 

Similar to Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in the Tragic Aftermath of the Sandy Hook School Shooting (20)

Understanding The Big Data Mess
Understanding The Big Data MessUnderstanding The Big Data Mess
Understanding The Big Data Mess
 
Public #transneeds deck
Public #transneeds deckPublic #transneeds deck
Public #transneeds deck
 
M-Brain business intelligence seminar Oct 7 2010 final
M-Brain business intelligence seminar Oct 7 2010 finalM-Brain business intelligence seminar Oct 7 2010 final
M-Brain business intelligence seminar Oct 7 2010 final
 
Social Media Ethics Presentation
Social Media Ethics PresentationSocial Media Ethics Presentation
Social Media Ethics Presentation
 
Guestlecture on #bigdata
Guestlecture on #bigdataGuestlecture on #bigdata
Guestlecture on #bigdata
 
How To Write An Introduction For An Essay Ppt
How To Write An Introduction For An Essay PptHow To Write An Introduction For An Essay Ppt
How To Write An Introduction For An Essay Ppt
 
Big Data, Republicans and 2016
Big Data, Republicans and 2016Big Data, Republicans and 2016
Big Data, Republicans and 2016
 
Argumentative Essay Cats Better Than Dogs
Argumentative Essay Cats Better Than DogsArgumentative Essay Cats Better Than Dogs
Argumentative Essay Cats Better Than Dogs
 
Influence Strategies for Software Professionals
Influence Strategies for Software ProfessionalsInfluence Strategies for Software Professionals
Influence Strategies for Software Professionals
 
Guerrero, manuel public-connection-civic-deliberation-salzburg-2015
Guerrero, manuel   public-connection-civic-deliberation-salzburg-2015Guerrero, manuel   public-connection-civic-deliberation-salzburg-2015
Guerrero, manuel public-connection-civic-deliberation-salzburg-2015
 
Challenges of social media analysis in the real world
Challenges of social media analysis in the real worldChallenges of social media analysis in the real world
Challenges of social media analysis in the real world
 
Social Justice & Black Twitter
Social Justice & Black TwitterSocial Justice & Black Twitter
Social Justice & Black Twitter
 
Who is to Blame for the Government Shutdown
Who is to Blame for the Government ShutdownWho is to Blame for the Government Shutdown
Who is to Blame for the Government Shutdown
 
Data as a source is like any other source, march 2014
Data as a source is like any other source, march 2014Data as a source is like any other source, march 2014
Data as a source is like any other source, march 2014
 
Ed Snowden: hero or villain? And the implications for media and democracy
Ed Snowden: hero or villain? And the implications for media and democracyEd Snowden: hero or villain? And the implications for media and democracy
Ed Snowden: hero or villain? And the implications for media and democracy
 
Acoustic Spaces And Ambient Noise Levels
Acoustic Spaces And Ambient Noise LevelsAcoustic Spaces And Ambient Noise Levels
Acoustic Spaces And Ambient Noise Levels
 
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...
 
Invasion Of Privacy In Canadian Media
Invasion Of Privacy In Canadian MediaInvasion Of Privacy In Canadian Media
Invasion Of Privacy In Canadian Media
 
Tech2Empower.v2
Tech2Empower.v2Tech2Empower.v2
Tech2Empower.v2
 

More from Rich Heimann

Guest Talk for Data Society's "INTRO TO DATA SCIENCE BOOT CAMP"
Guest Talk for Data Society's "INTRO TO DATA SCIENCE BOOT CAMP"Guest Talk for Data Society's "INTRO TO DATA SCIENCE BOOT CAMP"
Guest Talk for Data Society's "INTRO TO DATA SCIENCE BOOT CAMP"Rich Heimann
 
Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)Rich Heimann
 
Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)Rich Heimann
 
GES673 SP2014 Intro Lecture
GES673 SP2014 Intro LectureGES673 SP2014 Intro Lecture
GES673 SP2014 Intro LectureRich Heimann
 
Data Tactics Analytics Brown Bag (November 2013)
Data Tactics Analytics Brown Bag (November 2013)Data Tactics Analytics Brown Bag (November 2013)
Data Tactics Analytics Brown Bag (November 2013)Rich Heimann
 
Spatial Analysis; The Primitives at UMBC
Spatial Analysis; The Primitives at UMBCSpatial Analysis; The Primitives at UMBC
Spatial Analysis; The Primitives at UMBCRich Heimann
 
Spatial Analysis and Geomatics
Spatial Analysis and GeomaticsSpatial Analysis and Geomatics
Spatial Analysis and GeomaticsRich Heimann
 
Week 1 Lecture @ UMBC
Week 1 Lecture @ UMBCWeek 1 Lecture @ UMBC
Week 1 Lecture @ UMBCRich Heimann
 
Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)Rich Heimann
 

Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in the Tragic Aftermath of the Sandy Hook School Shooting

  • 1. Big Social Data Analysis: Using location & Twitter to explore the tragic aftermath of the Sandy Hook Elementary School shooting. Richard Heimann, Chief Data Scientist at L-3 Data Tactics, Adjunct Professor at UMBC. Keegan Hines, Data Scientist at L-3 Data Tactics. Richard Heimann © 2013
  • 2. Big Social Data Analysis: A Case Study in Newtown. Counting, counting, counting: Why do we count? How do we count? What is measurement? What are latent constructs? Traditional Data, Nontraditional Data | Sample vs. Population | Model Organism | Good Data, Bad Data. Case Study: Analyzing the discussion following the tragic events in Newtown, CT.
  • 3. Counting, counting, counting…
  • 4. How do we Count? Notice that all of these categories are forms of counting. We count everything, all the time. See "Analytics in Perspective: An Inquiry into Modes of Inquiry": http://datatactics.blogspot.com/2013/07/analytics-in-perspective-inquiry-into.html
  • 5. Counting and counting… A: 75% of Americans favor some level of gun control. A{spatial}: Americans in the northeast favor aggressive gun control by 3:1 over the south and midwest. B: Most Americans favor some level of gun control. B{spatial}: Americans in the northeast favor aggressive gun control at more than double the rate of Americans in the south and midwest. C: Americans favor gun control. C{spatial}: Americans in the northeast favor aggressive gun control over the south and midwest. D: All Americans favor gun control.
  • 6. Counting and counting… D: All Americans favor gun control. - Many - Much - Some - Numerous - A little, A lot - Often - Always - Rarely
  • 7. Why Quantitative Analysis? Why is quantitative data analysis so important? "…the alternative to good statistics is not 'no statistics,' it's bad statistics. People who argue against statistical reasoning often end up backing up their arguments with whatever numbers they have at their command, over- or under-adjusting in their eagerness to avoid anything systematic." (Bill James)
  • 8. E.g. Quasi-Geo-qualitative Analysis? "Here the different Tribes meet in Friendship and collect Stone for Pipes." "Yanktons a Band of Sioux - 1000 Souls." F. Ratzel, C. Wissler, & C. Sauer: Culture Area Research and Mapping (1850's). Maps Descriptive of London Poverty (1899): "No. 34 is occupied by the widow of a boatman. He committed suicide and left her with eleven children. Some have died, and she has five here now, two of whom go to work, and three to school. She makes sailor jackets, but is nearly blind. Struggles hard for her children…"
  • 10. What is measurement? Goal: the goal of all measurement is to arrange items on a continuum (observed or unobserved).
  • 11. "You can see a lot by observing…" (Yogi Berra)
  • 12. What is measurement - real world? Q: How can we measure something that is unobserved, or for which there is no direct measure? A: Use a statistical model to measure the relationships between the observable variables and the unobserved (or "latent") quantity.
  • 13. What is measurement - real world? Estimating unobservable quantities, e.g. "topic or theme":
        Twitter           Word1  Word2  Word3  topic
        @TheDude          1      1      0
        @WalterSobchak    0      1      1
        @TheBigLebowski   1      0      1
        …                 …      …      …
        @Donny            1      1      0
  • 14. Traditional Data, Nontraditional Data…
  • 15. Traditional Approaches to SocSci Inquiry. For example, gun control. Surveys: ask people what their position on gun control is. But whom do you ask, and how many? Your friends? Family? People in your neighborhood? This is expensive. Polls: similar to surveys, but respondents are often offered multiple choice. But how do you construct the questions? How many questions? The same issues as above apply. This is expensive. Legislature: count votes and federal/state funding. But what are we measuring? Lobbyist or American valence? Gun sales/deaths: count the number of gun sales and/or deaths. But are these normalized values? Are existing gun control laws controlled for?
  • 16. Nontraditional Approaches to SocSci Inquiry. Text is not only big, but is growing at an increasing rate. Twitter was launched March 21, 2006 and it took 3 years, 2 months and 1 day to reach 1 billion tweets. Twitter users now send one billion tweets every 2.5 days. People are highly opinionated. It's inexpensive:
        > library(twitteR)
        > guncontrol <- searchTwitter("#guncontrol", n=n, cainfo="cacert.pem")
    It's comprehensive: Dec 2012: ~30 days, ~210M tweets, ~40,334,000 users; #guncontrol: ~14,500 tweets, ~10,200 users.
  • 17. Population vs. Sample. Population: The entire group under study. Populations (N): often so large that we cannot examine the entire group. Samples are selected to represent the population. Sample (n): Samples help answer questions about the population. Nontraditional data allows n -> N.
  • 18. Twitter as a model organism:
  • 19. Good data, Bad data. What is Good Data? Is "garbage in, garbage out" a statement we ought to take seriously? (+) Data collected in targeted, rigorous ways, aka AUTHORITATIVE! E.g. Census data, surveys, polls. (-) Data tends to be narrow in scope, both geographically and temporally, and infrequently measured, if ever again. The linchpin is "when the data is available." The vast majority of data relating to emerging questions in business, politics, and social science simply does not exist. What is Bad Data? Social media is millions of conversations happening continuously and concurrently with varying degrees of decay and magnitude: lots of signal and lots of noise. (+) We can ask a variety of questions of it.
  • 20. Good data, Bad data. The opposite of good data is not bad data, it is no data. The point: Good data and Bad data do not exist; there is just Data and NO Data.
  • 21. Good data, Bad data & A model organism. Question: Is Twitter a model organism? Reality: We live in an imperfect world producing imperfect data; our job is to work with it.
  • 22. Case Study
  • 23. Big Social Data Analysis: A Case Study in Newtown. Hashtag categories: General, ProControl, AgainstControl. #PrayForNewtown: emotional hashtag. #gunfail: pro gun control. #gunrights: anti gun control. #NRA: vague, broad. #p2: refers to Progressives 2.0, the resource for progressives on social media; progressivism is a political philosophy that prioritizes diversity and empowerment through social activism. #2ndAmendment: anti gun control. #CTshooting: vague, broad. #gunsense: seems to be pro gun control. #2a: 2ndAmendment, anti gun control. #guncontrol: vague, broad. #NowIsTheTime: seems to be pro gun control. #tcot: Top Conservatives on Twitter, seems to be against gun control. #newtown: emotional hashtag.
  • 24. Big Social Data Analysis: A Case Study in Newtown
        > colnames(fulldf)
         [1] "ORIG_FILE"     "TWEET_ID"  "TIMESTAMP"     "SCREEN_NAM" "TRUE_NAME"
         [6] "GENDER"        "LOCATION"  "LONG"          "LAT"        "HASHTAGS"
        [11] "GT_COUNT"      "PT_COUNT"  "AT_COUNT"      "LANG"       "TEXT"
        [16] "NAME"          "trimmedTweets" "Topic"     "TimeStamp_Day" "IRT_score"
  • 25. Big Social Data Analysis: A Case Study in Newtown. Bag of Words - the order of words doesn't matter; we're simply interested in which words were used. "Forget it, Donny, you're out of your element!" becomes: donny, element, forget, it, out, of, your, you're. "Life does not start and stop at your convenience" becomes: and, at, convenience, does, life, not, start, stop, your.
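The bag-of-words step on this slide can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the authors' pipeline; the function name `bag_of_words` and the tokenizing regular expression are assumptions made for the example.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token.

    Word order is discarded: only which words occur (and how often) is kept.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)

bag = bag_of_words("Forget it, Donny, you're out of your element!")
print(sorted(bag))
```

Because the result is a count of tokens, two sentences with the same words in a different order produce the same bag.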
  • 26. Big Social Data Analysis: A Case Study in Newtown. Stopwords - remove words which are so common as to be uninformative (e.g. pronouns, articles). "Forget it, Donny, you're out of your element!" becomes: donny, element, forget, out. "Life does not start and stop at your convenience" becomes: convenience, does, life, start, stop.
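Stopword removal is a filter applied to the tokens. The sketch below is a Python illustration under stated assumptions: the `STOPWORDS` set is a hypothetical, deliberately tiny list chosen so that the output matches the slide, and real stopword lists are much longer.

```python
import re

# Hypothetical stopword list for this example only (not the list used in the deck).
STOPWORDS = {"it", "you're", "of", "your", "not", "and", "at"}

def remove_stopwords(text, stopwords=STOPWORDS):
    """Tokenize the text and drop every token that appears in the stopword set."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in stopwords]

kept_1 = remove_stopwords("Forget it, Donny, you're out of your element!")
kept_2 = remove_stopwords("Life does not start and stop at your convenience")
print(sorted(kept_1))
print(sorted(kept_2))
```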
  • 27. Big Social Data Analysis: A Case Study in Newtown. Stemming - the same verb with different conjugations and tenses should be represented in just one way. "I run, he runs, we enjoy running." becomes: run, run, enjoy, run. "You mark that frame an 8, and you're entering a world of pain." becomes: mark, frame, 8, enter, world, pain.
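To make the idea of stemming concrete, here is a deliberately naive suffix-stripping sketch in Python. It is not the Porter stemmer and not necessarily what the authors used; `naive_stem` and its rules are assumptions for illustration, and it will mis-stem many English words (for example, it reduces "this" to "thi").

```python
def naive_stem(word):
    """Strip a few common suffixes so that related word forms share one stem."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= 3:
            # Undo the consonant doubling left by "-ing"/"-ed": "runn" -> "run".
            if suffix != "s" and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return word

stems = [naive_stem(w) for w in ["run", "runs", "enjoy", "running"]]
print(stems)
```

A production analysis would normally rely on an established stemmer rather than hand-written rules like these.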
  • 28. Big Social Data Analysis: A Case Study in Newtown. Topic Model - tweets that discuss similar things ought to be grouped together. "Smokey, this is not 'Nam. This is bowling. There are rules." | "How come you don't roll on Saturday, Walter?" | "I don't roll on Shabbos!" | "Well, sir, it's this rug I had. It really tied the room together." | "I need to see you. I'm the one who took your rug." | "Walter, he peed on my rug!"
  • 29. Big Social Data Analysis: A Case Study in Newtown. Idea: Posit a number of latent "topics," then estimate the relationship between words-in-topics and topics-in-tweets. [Diagram: documents, aka tweets (Tweet 1, Tweet 2, … Tweet n), linked to topics (Bowling, Rug), which are linked to words/terms: bowling, 'nam, smokey, shabbos!, rug, peed, saturday, rules, roll, need, took, walter, how, come.]
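The documents and terms in this picture are usually stored as a document-term matrix of word counts, which is the input a topic model is fitted to. The Python sketch below builds only that matrix; it does not fit a topic model. The function name `document_term_matrix` and the three pre-tokenized example documents are assumptions for illustration.

```python
from collections import Counter

def document_term_matrix(docs):
    """Return (vocab, rows), where rows[i][j] counts term vocab[j] in document i."""
    counts = [Counter(doc) for doc in docs]
    vocab = sorted(set().union(*counts))
    rows = [[count[term] for term in vocab] for count in counts]
    return vocab, rows

# Hypothetical, already tokenized "tweets" built from the words on the slide.
docs = [
    ["bowling", "nam", "rules", "smokey"],
    ["roll", "saturday", "shabbos", "walter"],
    ["rug", "peed", "walter", "rug"],
]
vocab, rows = document_term_matrix(docs)
for doc_id, row in enumerate(rows, start=1):
    print(doc_id, dict(zip(vocab, row)))
```

A topic model then explains these counts with a small number of latent topics, estimating which words belong to each topic and which topics appear in each tweet.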
  • 30. Big Social Data Analysis: A Case Study in Newtown. With the Twitter data, we'll fit a topic model with 3 topics. Topic #1: This topic seems to capture discussions of gun control and gun rights as this political issue emerged in the conversation. Top words: #tcot, #tlot, #nra, #2ndamendment, #p2, gun, #gun, control, #newtown, #guncontrol, #2a, right, america, armed, obama, assault. Example tweets: "wow, this shooting shit needs to stop. #guncontrol now." | "oh good. #obamao has put #biden in charge of #guncontrol. that makes me feel all better about my rights and liberties." | "gun control is like trying to reduce drunk driving by making it harder for sober people to buy cars! #tcot #pjnet"
  • 31. Big Social Data Analysis: A Case Study in Newtown. With the Twitter data, we'll fit a topic model with 3 topics. Topic #2: This topic seems to capture general chatter, commonly used words, and spam tweets. Top words: obama, senate, #tcot, #tlot, video, house, tell, like, free, fiscal, tax, news, need, via. Example tweets: "help @newhampshirecr reach 600 followers! only 7 more to go! #nhcr #nhpolitics #tcot" | "rt @msegieda: @foxnews writes about @fracknation's premiere tonight on @axstv 9 pm et http://t.co/paafwgux #fracking #tcot #tlot #tp" | "michigan man, dog rescued after ice breaks: http://t.co/cc983uyv #tcot"
  • 32. Big Social Data Analysis: A Case Study in Newtown. With the Twitter data, we'll fit a topic model with 3 topics. Topic #3: This topic seems to capture descriptions of the tragedy as well as expressions of sympathy and sadness. Top words: #prayfornewtown, #newtown, #ctshooting, children, families, school, prayers, thoughts, victims, sad, tragedy, kids, god, little, today, rip, lanza. Example tweets: "my heart goes out to everyone affected by the shooting at #sandycook elementary. a senseless tragedy. i can't imagine your pain. #newtown" | "dear god, please protect our babies from the monsters that live among us. my heart is breaking for those in #newtown" | "thoughts are with the students of #sandyhook #newtown sad situation"
  • 33. Big Social Data Analysis: A Case Study in Newtown. With a topic model, we can extract topics that make intuitive sense. But what about the usage patterns of these topics? Are there interesting temporal, ideological, or geographical trends/patterns?
  • 34. Big Social Data Analysis: A Case Study in Newtown. [Figure: tweet count by time/date for each topic: GunControl, Mixed, Sympathy.] Annotations: 12/14: Newtown shootings. 12/16: Obama openly wept as he addressed the nation in the hours after the attack, and stated that now was the time for "meaningful action" on gun violence. 12/19: "In the coming weeks, I will use whatever power this office holds to engage my fellow citizens, from law enforcement to mental health professionals to parents and educators, in an effort aimed at preventing more tragedies like this," Obama said. "Because what choice do we have? We can't accept events like this as routine." 1/19: Obama presents gun control agenda; includes 23 executive orders.
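A time series like this one comes from counting tweets per day and per topic. The Python sketch below shows that tally under stated assumptions: the keys `TimeStamp_Day` and `Topic` are taken from the column names on slide 24, while the date format and the four example rows are hypothetical and are not values from the deck.

```python
from collections import Counter

def daily_topic_counts(rows):
    """Count tweets for each (day, topic) pair."""
    return Counter((row["TimeStamp_Day"], row["Topic"]) for row in rows)

# Hypothetical rows for illustration only; these are not data from the study.
rows = [
    {"TimeStamp_Day": "2012-12-14", "Topic": "Sympathy"},
    {"TimeStamp_Day": "2012-12-14", "Topic": "Sympathy"},
    {"TimeStamp_Day": "2012-12-14", "Topic": "GunControl"},
    {"TimeStamp_Day": "2012-12-16", "Topic": "GunControl"},
]
counts = daily_topic_counts(rows)
for (day, topic), n in sorted(counts.items()):
    print(day, topic, n)
```

Plotting one line per topic over the sorted days gives a chart of the same shape as the one on this slide.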
  • 35. Big Social Data Analysis: A Case Study in Newtown
  • 36. Big Social Data Analysis: A Case Study in Newtown. Everyone
  • 37. Big Social Data Analysis: A Case Study in Newtown. Everyone by Topic: GunControl 2,733 (34.7%); Mixed 2,637 (33.5%); Sympathy 2,505 (31.8%).
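The percentages on this slide follow directly from the three counts. A short Python check, where the variable names are illustrative and the counts are the ones shown on the slide:

```python
# Tweet counts per topic, as reported on the slide.
topic_counts = {"GunControl": 2733, "Mixed": 2637, "Sympathy": 2505}

total = sum(topic_counts.values())
shares = {topic: round(100 * n / total, 1) for topic, n in topic_counts.items()}

print(total)
print(shares)
```

The total, 7,875 tweets, also equals the sum of the five color-class counts on the next slide (2,179 + 434 + 887 + 161 + 4,214).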
  • 38. Big Social Data Analysis: A Case Study in Newtown. Red: 2,179; Pink: 434; Purple: 887; Light Blue: 161; Blue: 4,214.
  • 39. [Figure: Topics (GunControl, Mixed, Sympathy) by Geography (Red, Pink, Purple, Light Blue, Blue). Values shown: 10%, 9.2%, 8.5%, 18.3%, 17.2%, 18%.]
  • 40. Spatially Explicit Theory. Proximate casualty hypothesis (Gartner, Segura, and Wilkening 1997). Time and space provide new insight into the multiple processes underlying opinion change in today's complex information environment. A case study of the "proximate casualties" hypothesis: the idea that popular support for American wars is undermined at the individual level more by the deaths of American personnel from nearby areas than by the deaths of those from far away. http://jcr.sagepub.com/content/41/5/669.abstract
  • 42. [Figure: Topics (GunControl, Mixed, Sympathy) by Geography: East North Central, East South Central, Middle Atlantic, Mountain, New England, Pacific, South Atlantic, West North Central, West South Central.]
  • 43. Summary. Counting, counting, counting: Why do we count? How do we count? Contrasting quantitative counting vs. qualitative counting. What is measurement? What are latent constructs? Traditional vs. Nontraditional approaches to SocSci Inquiry. Population vs. Sample. Model Organisms. Good data, Bad data. Case Study in Newtown using Twitter and topic modeling to explore temporal, ideological, and geographical elements.
  • 44. Thank you… Questions? Richard Heimann: @rheimann | https://twitter.com/rheimann | rheimann@umbc.edu | Data Tactics Big Data Insights [blog]: http://datatactics.blogspot.com | Keegan Hines: @keeghin | https://twitter.com/keeghin