Big Data Technology 
and the Social Sciences: 
A Lecture at Mannheim University 
Abe Usher CCHP, CISSP 
Chief Technology Officer, HumanGeo
2 
What’s In It For You? 
Theory 
• Definitions and overview 
• Where data are being generated 
Practice 
• Google’s three secret techniques* for unlocking insights from data 
• The kitchen model 
• Recommended resources to build data science skills 
Presentation slides: 
http://www.slideshare.net/abeusher/big-data-and-the-social-sciences 
*Not specifically endorsed by Google. Also, not really a secret.
3 
Background 
HumanGeo is focused on digital Human Geography: 
• Understanding the location attributes of individuals and groups 
• And the social attributes of locations 
• Through ‘Big Data’ analysis of billions of geolocated data elements
4 
Big Data Wake-Up Call 
Berkeley University Research http://goo.gl/zjSUr1 
By 2016 the rate of data growth surpasses the rate of Moore’s Law
5 
Defining Big Data 
http://knowyourmeme.com/memes/you-keep-using-that-word-i-do-not-think-it-means-what-you-think-it-means
6 
Big Data Definition 
Boring Traditional definition 
“High volume, velocity and variety 
information assets that demand 
cost-effective, innovative forms of 
information processing for 
enhanced insight and decision 
making.”
7 
Big Data Definition 
Abe’s definition:
9 
The Original “Big Data” 
1880 US Census 
• 50 million people 
• Data included: age, gender, number of insane people in household* 
• Took 7 years to tabulate 
• 1890 Census estimated at 13 years to complete 
1890 US Census 
• 63 million people 
• Additional data: citizenship and military service 
• New technology: Hollerith Tabulating System 
• Took 6 weeks to tabulate (76x faster) 
Takeaway 
• Better technology and methodology led to 76x speedup 
*Credit to Ken Krugler for this factoid: http://www.censusrecords.com/content/1880_census
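As a quick check of the speedup arithmetic, the sketch below uses only the two durations stated on the slide (7 years and 6 weeks) and assumes 52 weeks per year. With those inputs the ratio comes out at roughly 61x, so the 76x figure presumably rests on a different baseline or day count than the rounded durations shown here.

```python
# Durations as stated on the slide, treated as elapsed calendar time (an assumption).
WEEKS_PER_YEAR = 52

def speedup(old_years: float, new_weeks: float) -> float:
    """Return how many times faster the new tabulation was than the old one."""
    return (old_years * WEEKS_PER_YEAR) / new_weeks

census_1880_years = 7   # slide: "Took 7 years to tabulate"
census_1890_weeks = 6   # slide: "Took 6 weeks to tabulate"

ratio = speedup(census_1880_years, census_1890_weeks)
print(f"Speedup from the slide's durations: about {ratio:.0f}x")
```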
10 
Data Generation 
Where are data created? 
• Website interaction logs 
• Social media 
• Cyber events 
• Smartphones 
What is the volume? 
• 3B phone calls in USA 
• 700M Facebook posts 
• 500M tweets per day 
• 50B WhatsApp messages per day 
Takeaway 
• Social media, telecommunication, and instant messaging generate an increasingly high volume of data
11 
Traditional Model of Interpreting Observations 
Tracy Marrow (aka “Ice-T”) 
How can you identify a legitimate hip-hop artist (versus someone who just gets up and rhymes)? 
“Game knows game, baby.” 
“If you have expert knowledge, then you are capable of answering complex questions by interpreting domain-specific information.” [paraphrased] 
http://www.npr.org/2005/08/30/4824690/original-gangster-rapper-and-actor-ice-t
14 
Trust Models for complex data 
• In the film Superman III, August Gorman carried out a plot to grab fractions of a penny from a corporate payroll system. http://goo.gl/vAScel 
IMDB: 4.9/10 
Rotten Tomatoes: 26/100
15 
Trust Models for complex data 
• In the film Office Space, Peter Gibbons hatches a plot to write a computer virus that grabs fractions of a penny from a corporate retirement account. http://goo.gl/rDg1U 
• Known in security circles as a salami attack. 
IMDB: 7.9/10 
Rotten Tomatoes: 79/100 
Takeaway point: Little bits of value (information) provide deep insights in the aggregate
16 
New Models of Interpreting (Big) Data 
1. Aggregation 
2. Visualization 
3. Correlation 
Takeaways 
• Expert-based knowledge is no longer sufficient. 
• Simple mathematical methods create value from captured data
17 
Aggregation (Counting) 
William Thomson, 1st Baron Kelvin 
“When you can measure what you are speaking about, and express it in numbers, you know something about it.” 
Takeaway 
• Aggregation via counting things is the most common way to exploit Big Data
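Counting can be sketched in a few lines. The example below is a minimal illustration, not the method behind any particular product: the search log is invented, and `count_terms` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical search log: one search term per event (invented for illustration).
search_log = [
    "big data", "britney spears", "big data", "diet coke",
    "big data", "britney spears", "diet pepsi",
]

def count_terms(events):
    """Aggregate raw events into a term -> count table."""
    return Counter(events)

term_counts = count_terms(search_log)

# most_common() returns (term, count) pairs sorted by count, highest first.
for term, count in term_counts.most_common():
    print(f"{term}: {count}")
```

A `Counter` keeps the aggregation step to a single call, which is usually enough for a first look before moving to a database or a cluster.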
Aggregation: A Tale of Two Products 
The book “Fearless” is much more popular than the 1990 movie “Navy SEALs.” 
It also has a more favorable distribution of reviews. 
The distribution we’re looking for looks like the #1 hand: responses concentrated in the most positive category, with very few responses that were unfavorable. 
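One way to compare review distributions is to turn raw star ratings into shares per category. The sketch below assumes a 1 to 5 star scale; the two rating lists are invented for illustration and are not the actual reviews of either product, and `rating_shares` is a hypothetical helper.

```python
from collections import Counter

def rating_shares(ratings):
    """Return the share of reviews at each star level (1-5)."""
    counts = Counter(ratings)
    total = len(ratings)
    return {stars: counts.get(stars, 0) / total for stars in range(1, 6)}

# Hypothetical star ratings, invented for illustration (not real review data).
product_a = [5, 5, 5, 5, 5, 5, 4, 4, 3, 1]
product_b = [5, 4, 3, 3, 2, 2, 1, 1, 1, 1]

for name, ratings in [("A", product_a), ("B", product_b)]:
    shares = rating_shares(ratings)
    print(name, {stars: f"{share:.0%}" for stars, share in shares.items()})
```

Here product A has the favorable shape: most of its reviews sit in the 5-star category and few fall in the lowest ones.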
Aggregation & Visualization: 
Counting with Google Trends
Aggregation & Visualization: 
Bing Search vs. Google Search
Aggregation: 
Diet Pepsi vs. Diet Coke
Aggregation & Visualization: 
Big Data vs. Britney Spears
Geospatial Visualization Example: 
Social Drift in DC 
Takeaway 
• Visualization provides a powerful mechanism for Exploratory Data Analysis 
25 
Correlation: 
Canadian Flu Research 
Gunther Eysenbach 
• Professor @ University of Toronto 
• Focused on eHealth 
• Google Ads user 
Infodemiology 
• Tracked flu-related searches in 2004-2005 
• 54,507 Ad impressions in Canada 
• High R^2 correlation to actual flu activity 
http://gunther-eysenbach.blogspot.com/ 
Infodemiology paper: http://goo.gl/aeUZtA 
Takeaway 
• Human behavior in response to Google Ads related to the flu was highly correlated with “officially reported” cases of the flu.
NYT: http://goo.gl/mNyAi7 
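The kind of correlation described in the takeaway above can be sketched with a plain Pearson coefficient. The weekly numbers below are invented for illustration and are not the study's data; `pearson_r`, `ad_clicks`, and `reported_cases` are hypothetical names.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly series, invented for illustration (not the study's data).
ad_clicks = [12, 18, 25, 40, 61, 55, 30]
reported_cases = [3, 5, 8, 13, 20, 17, 9]

r = pearson_r(ad_clicks, reported_cases)
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
```

For a simple two-variable comparison like this, R^2 is the square of r, so a high r in either direction gives a high R^2.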
26 
Correlation: 
Google Flu Trends 
“Google Flu Trends provides near real-time estimates of flu activity for a number of countries and regions around the world based on aggregated search queries.” 
Process 
• Map searches to regions 
• Quantify “normal” 
• Detect “anomalies” 
NPR: http://goo.gl/Iv7A87
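The "quantify normal, detect anomalies" steps can be sketched with a simple z-score rule. This is an assumed stand-in, not Google's actual model: it presumes searches have already been mapped to one region, treats the baseline mean as "normal", and flags values more than two standard deviations above it. All counts are invented.

```python
from statistics import mean, stdev

def find_anomalies(baseline, recent, threshold=2.0):
    """Return (index, value, z) for recent values far above the baseline.

    "Normal" is the baseline mean; an anomaly is any recent value more than
    `threshold` sample standard deviations above it.
    """
    normal = mean(baseline)
    spread = stdev(baseline)
    flagged = []
    for i, value in enumerate(recent):
        z = (value - normal) / spread
        if z > threshold:
            flagged.append((i, value, round(z, 2)))
    return flagged

# Hypothetical weekly counts of flu-related searches for one region.
region_baseline = [100, 104, 98, 101, 97, 103, 99, 102]
region_recent = [101, 105, 140, 180]

print(find_anomalies(region_baseline, region_recent))
```

With these numbers only the last two weeks are flagged; the threshold is a tuning choice that trades missed outbreaks against false alarms.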
27 
Correlation: 
Box Office Hit Prediction 
“Use of socially generated ‘big data’ to access information about collective states of the minds in human societies has become a new paradigm in the emerging field of computational social science.” 
Simple factors 
• number of total page views 
• number of total edits made 
• number of users editing 
• number of revisions in the article's revision history 
Early Prediction of Movie Box Office Success: http://goo.gl/BWf7H1 
Counts of Wikipedia factors correlate to Box Office sales
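To illustrate how a count can feed a prediction, here is a one-variable least squares fit. It is a deliberate simplification using a single factor, and the page-view and revenue numbers are invented; `fit_line` is a hypothetical helper, not code from the cited paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x

# Hypothetical pre-release page views (thousands) and opening revenue ($M).
page_views = [10, 20, 30, 40, 50]
revenue = [5, 9, 13, 17, 21]

slope, intercept = fit_line(page_views, revenue)
predicted = slope * 60 + intercept
print(f"predicted revenue for 60k views: {predicted:.1f}")
```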
28 
Big Data: 
Significance for Social Sciences 
1. Proxy variables. 
Digital exhaust collected for purposes other than surveys often creates ‘proxy variables’ that provide complementary insights. 
2. Aggregation insights. 
Combining many small observations leads to insights that we can trust. 
3. Data linking. 
It is possible to ‘link’ or synchronize records between digital exhaust and instrumented surveys by selecting a common dimension (e.g. location). 
The future of social science will involve combining “fuzzy Big Data insights” with instrumented survey results
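Data linking on a common dimension can be sketched as a join on a shared key. The records, field names, and `link_records` helper below are all hypothetical and use location as the linking dimension.

```python
# Hypothetical records, invented for illustration.
digital_exhaust = [
    {"location": "Mannheim", "geotagged_posts": 1200},
    {"location": "Heidelberg", "geotagged_posts": 800},
    {"location": "Stuttgart", "geotagged_posts": 2500},
]
survey_results = [
    {"location": "Mannheim", "respondents": 150},
    {"location": "Stuttgart", "respondents": 310},
]

def link_records(left, right, key):
    """Inner-join two lists of dicts on a shared key (the common dimension)."""
    right_by_key = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            linked.append({**row, **match})
    return linked

for record in link_records(digital_exhaust, survey_results, "location"):
    print(record)
```

Only locations present in both sources survive the join, which is a reminder that linked data covers the overlap of the two collections rather than either one in full.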
Correlation Does Not Equal Causation 
http://xkcd.com/552/
The kitchen model of value creation 
Chef: Your Staff 
Ingredients: Your Data 
Utensils: Technology 
Recipes: Techniques
31 
Take Action: 
Experiment yourself 
Exploratory Data Analysis lifecycle: 
• collect - Twitter API, Datasift.com 
• clean - OpenRefine 
• analyze - Python or R 
• visualize - Google Earth 
Related data: 
https://s3.amazonaws.com/devbackup/germany.txt.gz 
Related code: https://github.com/abeusher
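The collect, clean, and analyze stages above can be sketched as three small functions chained together. The slides do not describe the format of the related data file, so this sketch assumes it is plain text with one record per line; the function names are hypothetical and the visualize stage is left out.

```python
import gzip
from collections import Counter

def collect(path):
    """Collect: yield raw text lines from a gzip-compressed file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line

def clean(lines):
    """Clean: strip whitespace, lowercase, and drop empty lines."""
    for line in lines:
        text = line.strip().lower()
        if text:
            yield text

def analyze(texts, top_n=10):
    """Analyze: count the most frequent whitespace-separated tokens."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    return counts.most_common(top_n)
```

Assuming a local copy of the file saved as `germany.txt.gz`, the stages chain as `analyze(clean(collect("germany.txt.gz")))`. Because each stage is a generator, the file is streamed rather than loaded into memory at once.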
32 
Take Action: Explore 
Google Trends http://goo.gl/8eJZg 
Google Ngram http://goo.gl/4U09fa 
Google Correlate http://goo.gl/nEhe8D 
Bing Keyword Research http://goo.gl/q2V88g
33 
Contact information 
Abe Usher 
Email: abe.usher@gmail.com 
Twitter: @abeusher 
LinkedIn: http://goo.gl/DUxZOP 
Presentations: http://goo.gl/bCa3Qt

Big Data and the Social Sciences

  • 1. Big Data Technology and the Social Sciences: A Lecture at Mannheim University Abe Usher CCHP, CISSP Chief Technology Officer, HumanGeo
  • 2. 2 What’s In It For You? Theory • Definitions and overview •Where data are being generated Practice • Google’s three secret techniques* for unlocking insights from data •The kitchen model •Recommended resources to build data science skills Presentation slides: http://www.slideshare.net/abeusher/big-data-and-the-social-sciences *Not specifically endorsed by Google. Also, not really a secret.
  • 3. 3 Background HumanGeo is focused on digital Human Geography:  Understanding the location attributes of individuals and groups  And the social attributes of locations  Through ‘Big Data’ analysis of billions geolocated data elements
  • 4. Big Data Wake-Up Call Berkeley University Research: http://goo.gl/zjSUr1 By 2016 the rate of data growth surpasses the rate of Moore’s Law
  • 5. Defining Big Data http://knowyourmeme.com/memes/you-keep-using-that-word-i-do-not-think-it-means-what-you-think-it-means
  • 6. Big Data Definition Boring traditional definition: “High volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”
  • 7. Big Data Definition Abe’s definition:
  • 8. The Original “Big Data” 1880 US Census • 50 million people • Data included: age, gender, number of insane people in household* • Took 7 years to tabulate • 1890 Census estimated at 13 years to complete *Credit to Ken Krugler for this factoid: http://www.censusrecords.com/content/1880_census
  • 9. The Original “Big Data” 1880 US Census • 50 million people • Data included: age, gender, number of insane people in household* • Took 7 years to tabulate • 1890 Census estimated at 13 years to complete 1890 • 63 million people • Additional data: citizenship and military service • New technology: Hollerith Tabulating System • Took 6 weeks to tabulate (76x faster) Takeaway • Better technology and methodology led to 76x speedup *Credit to Ken Krugler for this factoid: http://www.censusrecords.com/content/1880_census
  • 10. Data Generation Where are data created? • Website interaction logs • Social Media • Cyber events • Smartphones What is the volume? • 3B phone calls in USA • 700M Facebook posts • 500M tweets per day • 50B WhatsApp messages per day Takeaway • Social media, telecommunication, and instant messaging generate an increasingly high volume of data
  • 11. Traditional Model of Interpreting Observations Tracy Marrow (aka “Ice-T”) How can you identify a legitimate hip-hop artist (versus someone who just gets up and rhymes)? http://www.npr.org/2005/08/30/4824690/original-gangster-rapper-and-actor-ice-t
  • 12. Traditional Model of Interpreting Observations Tracy Marrow (aka “Ice-T”) How can you identify a legitimate hip-hop artist (versus someone who just gets up and rhymes)? “Game knows game, baby.”
  • 13. Traditional Model of Interpreting Observations Tracy Marrow (aka “Ice-T”) How can you identify a legitimate hip-hop artist (versus someone who just gets up and rhymes)? “If you have expert knowledge, then you are capable of answering complex questions by interpreting domain-specific information.” [paraphrased]
  • 14. Trust Models for Complex Data • August Gorman carried out a plot to grab fractions of a penny from a corporate payroll system. http://goo.gl/vAScel IMDB: 4.9/10 Rotten Tomatoes: 26/100
  • 15. Trust Models for Complex Data • Peter Gibbons hatches a plot to write a computer virus that grabs fractions of a penny from a corporate retirement account. http://goo.gl/rDg1U • Known in security circles as a salami attack. IMDB: 7.9/10 Rotten Tomatoes: 79/100 Takeaway point: Little bits of value (information) provide deep insights in the aggregate
  • 16. New Models of Interpreting (Big) Data 1. Aggregation 2. Visualization 3. Correlation Takeaways • Expert-based knowledge is no longer sufficient. • Simple mathematical methods create value from captured data
  • 17. Aggregation (Counting) William Thomson, 1st Baron Kelvin: “When you can measure what you are speaking about, and express it in numbers, you know something about it.” Takeaway • Aggregation via counting things is the most common way to exploit Big Data
  • 18. Aggregation: A Tale of Two Products The book “Fearless” is much more popular than the 80s movie “Navy Seals.” It also has a more favorable distribution of reviews.
  • 19. Aggregation: A Tale of Two Products The distribution we’re looking for looks like the #1 hand: responses concentrated in the most positive category, with very few responses that were unfavorable.
  • 20. Aggregation & Visualization: Counting with Google Trends
  • 21. Aggregation & Visualization: Bing Search vs. Google Search
  • 22. Aggregation: Diet Pepsi vs. Diet Coke
  • 23. Aggregation & Visualization: Big Data vs. Britney Spears
  • 24. Geospatial Visualization Example: Social Drift in DC Takeaway • Visualization provides a powerful mechanism for Exploratory Data Analysis
  • 25. Correlation: Canadian Flu Research Gunther Eysenbach • Professor @ University of Toronto • Focused on eHealth • Google Ads user Infodemiology • 2004-2005 tracked flu-related searches • 54,507 Ad impressions in Canada • High R^2 correlation to actual flu activity http://gunther-eysenbach.blogspot.com/ Infodemiology paper: http://goo.gl/aeUZtA Takeaway • Human behavior in response to Google Ads related to the flu was highly correlated with “officially reported” cases of the flu.
  • 26. Correlation: Google Flu Trends “Google Flu Trends provides near real-time estimates of flu activity for a number of countries and regions around the world based on aggregated search queries.” Process • Map searches to regions • Quantify “normal” • Detect “anomalies” NYT: http://goo.gl/mNyAi7 NPR: http://goo.gl/Iv7A87
  • 27. Correlation: Box Office Hit Prediction “Use of socially generated ‘big data’ to access information about collective states of the minds in human societies has become a new paradigm in the emerging field of computational social science.” Simple factors • number of total page views • number of total edits made • number of users editing • number of revisions in the article's revision history Early Prediction of Movie Box Office Success: http://goo.gl/BWf7H1 Takeaway • Counts of Wikipedia factors correlate with box office sales
  • 28. Big Data: Significance for Social Sciences 1. Proxy variables. Digital exhaust collected for purposes other than surveys often creates ‘proxy variables’ that provide complementary insights. 2. Aggregation insights. Combining many small observations leads to insights that we can trust. 3. Data linking. It is possible to ‘link’ or synchronize records between digital exhaust and instrumented surveys by selecting a common dimension (e.g. location). Takeaway • The future of social science will involve combining “fuzzy Big Data insights” with instrumented survey results
  • 29. Correlation Does Not Equal Causation http://xkcd.com/552/
  • 30. The kitchen model of value creation • Chef: Your Staff • Ingredients: Your Data • Utensils: Technology • Recipes: Techniques
  • 31. Take Action: Experiment yourself Exploratory Data Analysis lifecycle: • collect - Twitter API, Datasift.com • clean - OpenRefine • analyze - Python or R • visualize - Google Earth Related data: https://s3.amazonaws.com/devbackup/germany.txt.gz Related code: https://github.com/abeusher
  • 32. Take Action: Explore Google Trends http://goo.gl/8eJZg Google Ngram http://goo.gl/4U09fa Google Correlate http://goo.gl/nEhe8D Bing Keyword Research http://goo.gl/q2V88g
  • 33. Contact information Abe Usher Email: abe.usher@gmail.com Twitter: @abeusher LinkedIn: http://goo.gl/DUxZOP Presentations: http://goo.gl/bCa3Qt
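
The three-step process on the Google Flu Trends slide (map searches to regions, quantify "normal", detect "anomalies") can be sketched in a few lines of Python. This is a minimal illustration, not Google's actual method: the weekly counts are invented, and the function name `find_anomalies` and the z-score threshold of 2.0 are assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (region, week, flu-related search count) records.
# All numbers are invented for illustration only.
SEARCH_LOG = [
    ("Ontario", 1, 100), ("Ontario", 2, 104), ("Ontario", 3, 98),
    ("Ontario", 4, 101), ("Ontario", 5, 97), ("Ontario", 6, 180),
    ("Quebec", 1, 80), ("Quebec", 2, 82), ("Quebec", 3, 79),
    ("Quebec", 4, 81), ("Quebec", 5, 78), ("Quebec", 6, 81),
]

def find_anomalies(records, threshold=2.0):
    """Return (region, week) pairs whose latest count is unusually high."""
    # Step 1: map searches to regions (aggregation by a common key).
    by_region = defaultdict(list)
    for region, week, count in records:
        by_region[region].append((week, count))

    anomalies = []
    for region, series in by_region.items():
        series.sort()  # order by week
        # Step 2: quantify "normal" from every week except the latest one.
        history = [count for _, count in series[:-1]]
        baseline, spread = mean(history), stdev(history)
        # Step 3: flag the latest week if it sits far above the baseline.
        latest_week, latest_count = series[-1]
        if spread > 0 and (latest_count - baseline) / spread > threshold:
            anomalies.append((region, latest_week))
    return anomalies

if __name__ == "__main__":
    print(find_anomalies(SEARCH_LOG))
```

With these made-up counts only the final Ontario week stands out from its own history. A real system would need a seasonal baseline, since flu searches rise every winter regardless of outbreaks.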

Editor's Notes

  1. Aggregation is often the first and most important step in synthesizing facts and trends from a large pool of data. Correlation is useful in identifying related spatial features, but beware of spatial autocorrelation! Once a relationship has been quantified in a correlation step, that numeric relationship can be applied for coarse forecasting (anticipatory analysis).
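
As a rough sketch of the last point, the Python below quantifies a relationship with a Pearson correlation and then reuses a least-squares line for a coarse forecast. The two data series are invented for illustration (they are not Eysenbach's figures), and the names `pearson_r`, `fit_line`, `ad_clicks`, and `flu_cases` are hypothetical. It also assumes the observations are independent, which spatially autocorrelated data would violate.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Invented weekly figures: clicks on a flu-related ad vs. reported flu cases.
ad_clicks = [10, 20, 30, 40, 50]
flu_cases = [27, 44, 66, 84, 104]

r = pearson_r(ad_clicks, flu_cases)
slope, intercept = fit_line(ad_clicks, flu_cases)

# Coarse forecast: apply the fitted relationship to a new click count.
forecast = slope * 60 + intercept

if __name__ == "__main__":
    print(f"R^2 = {r ** 2:.4f}, forecast for 60 clicks = {forecast:.1f} cases")
```

A high R^2 here only says the two series move together; as the xkcd slide notes, it says nothing about cause.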