Drinking from the fire hose?
The pitfalls & potential of Big Data
Josh Cowls, Oxford Internet Institute
with contributions from Eric Meyer, Ralph Schroeder and
Linnet Taylor
t2i Lab, Chalmers, 27th March 2014
Overview
• Background
• Definitions
• Innovations and implications
• Learning to drink from the fire hose
The Oxford Internet Institute
• Department of the University of Oxford
• MO: ‘Understanding life online’
• Multi-disciplinary mix (social sciences plus physical and medical sciences,
and humanities)
• 45 researchers (and growing)
• 50 students (MSc Social Science of Internet; PhD programme)
• Generating big data on social, political and economic behaviour from
social media
www.oii.ox.ac.uk
Accessing and Using Big Data to Advance Social
Science Knowledge
• Funded by the Alfred P. Sloan Foundation
• 2012 – 2014
• Data sources:
• 120 interviews, mainly with social scientists but some
interviewees from business, government
• Reports, workshops, publications
• No representative sample, but some patterns of
disciplinary and skills background and career trajectory
NB where unattributed, quotes used in this presentation are excerpted from
interviews conducted as part of this project.
Big Data: our definition
Big data are data that are
unprecedented in scale and scope in
relation to a given phenomenon.
They are often streams of data (rather than fixed
datasets), accumulating large volumes, often at high
velocity.
Big Data: other definitions
• ‘Transactional’ (Margetts et al.)
• ‘Things that one can do at a large scale that
cannot be done at a smaller one’ (Mayer-Schönberger
and Cukier)
• The ‘3 Vs’: volume, velocity, variety – but also
veracity, visualisability, viscosity? (Gartner)
... what Big Data isn’t
• A generalisable, quantifiable ‘amount’ of data
• A race to the top (Mutually Assured Distraction)
• The same for every discipline, field or sector
A ‘working’ definition
• The Big Data phenomenon might be less about
what the dataset is and more about how we
work with it
• (Even if this is indistinguishable in practice)
Shifts in mindset
From Mayer-Schönberger and Cukier:
• “The ability to analyse vast amounts of data
about a topic rather than be forced to settle for
smaller sets”
• “A willingness to embrace data’s real-world
messiness rather than privilege exactitude”
• “A growing respect for correlations rather than a
continuing quest for elusive causality”
Implications for research
Whither the sample?
“the sample survey[’s] glory years ... are in the past”
Savage and Burrows, 2007
Implications for research
Whither the sample?
“sampling is like an analog photographic print. It looks good
from a distance, but as you stare closer, zooming in on a
particular detail, it gets blurry ... Often, the really interesting
things in life are found in places that samples fail to fully
catch”
Mayer-Schönberger and Cukier 2012
Implications for research
More or mess?
“social media is really, really fascinating, and the reason is
because it ... falls into this category of there’s something
there but we don’t know what it is. So you can measure
public opinion on Twitter and clearly that’s indicative of
something, but we don’t quite know what it’s representative
of”
Brandon Stewart, Harvard University Department of
Government
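The representativeness worry in this quote can be illustrated with a toy simulation. This is only a sketch with entirely made-up numbers: a hypothetical population in which exactly half hold some opinion, and a hypothetical platform on which holders of that opinion are assumed to be twice as likely to post. None of these figures come from the interviews.

```python
import random


def simulate(population_size=100_000, true_support=0.5, seed=1):
    """Compare a large self-selected sample with a small random survey."""
    rng = random.Random(seed)
    # True/False flags: does this person hold the opinion?
    population = [rng.random() < true_support for _ in range(population_size)]
    # Assumption: supporters post with probability 0.2, everyone else with 0.1.
    platform = [s for s in population if rng.random() < (0.2 if s else 0.1)]
    # A simple random sample of 1,000 people from the same population.
    survey = rng.sample(population, 1_000)
    platform_share = sum(platform) / len(platform)
    survey_share = sum(survey) / len(survey)
    return platform_share, survey_share, len(platform)


platform_share, survey_share, n_platform = simulate()
print(f"platform sample: n={n_platform}, share={platform_share:.3f}")
print(f"random survey:   n=1000, share={survey_share:.3f}")
```

Under these assumptions the platform sample is many times larger than the survey, yet its estimate sits further from the true 50%: it measures something, but not the population as a whole.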
Implications for research
More or mess?
“the problem with the hashtag stuff [is that] we have
wonderful case studies but we don’t know what they sit in
essentially, what the framework is, if that’s 1% or 10% or
100% of the current conversation in Australia or whatever”
Axel Bruns, Queensland University of Technology
Implications for research
More or mess?
“the big problem that we haven’t cracked is that if
someone tweets a sentiment it’s not necessarily what
they’re feeling, it can be for a variety of reasons, so it doesn’t
really reflect what they feel necessarily”
Mike Thelwall, University of Wolverhampton
Implications for research
Do we care about causes?
“Big Data is all about correlation; it’s not about causation,
which means that you don’t need to have a theory
beforehand. You just start looking for correlation … so you
don’t have any idea about the structure of the data, you just
find a funny correlation.”
Sara Esposti, Open University Business School
Implications for research
Do we care about causes?
“a central concern of social science is, we don’t just want to
find statistical associations, we actually want to uncover the
underlying causal processes by which social systems work ...
The data themselves don’t tell you about cause and effect,
there’s actually a very complex often, complex inferential
process you have to go through in order to extract from the
data the things that you really care about”
David Jensen, University of Massachusetts
Implications for research
Do we care about causes?
“I’ve been talking to some computer scientists who are
rising stars, they’re really doing well, and they acknowledge
that the way in which the field works, novelty is the key
issue. And so there’s always an incentive or a pressure to
keep on doing new stuff with new data, even though they
might have wanted to go into more depth into something.”
Sandra Gonzalez-Bailon, Annenberg School of
Communication, University of Pennsylvania
The challenge
How can we extract meaning from Big Data – learn
to drink from the fire hose?
Drinking from the fire hose
• Understanding the data
• Collaborating
• Mixing methods
Drinking from the fire hose: understanding the data
The rise of the information society has given us
myriad new forms of data and accompanying ways
of analysing it.
The challenging part is abstracting meaning about
society in general from data created and harvested
online.
Drinking from the fire hose: understanding the data
Example: it’s hard to predict elections using Twitter
“[Of] 14 different attempts to predict elections
based on Twitter data ... Only half of them were
successful ... All of this looks close to mere chance”
Gayo-Avello 2012
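A rough way to see why this "looks close to mere chance" is to ask how often pure guessing would do at least as well. The sketch below assumes that "half of 14" means 7 successful predictions and that each prediction is an independent coin flip between two outcomes. Both are simplifying assumptions made here for illustration, not claims from Gayo-Avello's paper.

```python
from math import comb


def prob_at_least(successes: int, attempts: int, p: float = 0.5) -> float:
    """Binomial probability of at least `successes` hits in `attempts` tries."""
    return sum(
        comb(attempts, k) * p**k * (1 - p) ** (attempts - k)
        for k in range(successes, attempts + 1)
    )


# Assumption: 7 of the 14 attempts count as successful.
chance_of_seven_or_more = prob_at_least(7, 14)
print(f"P(at least 7 of 14 right by guessing) = {chance_of_seven_or_more:.3f}")
```

Under these assumptions a guesser matches or beats that record roughly 60% of the time, so the track record alone gives no evidence of predictive power.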
Drinking from the fire hose: understanding the data
Example: Facebook isn’t going anywhere, and
neither is Princeton
Cannarella and Spechler 2014; Develin 2014
Drinking from the fire hose: understanding the data
But it’s much simpler, conceptually speaking, to
analyse online phenomena on their own terms
Yasseri, Hale & Margetts 2013
Drinking from the fire hose: understanding the data
But it’s much simpler, conceptually speaking, to
analyse online phenomena on their own terms
Hale, Yasseri, Cowls, Meyer,
Schroeder & Margetts (submitted)
Drinking from the fire hose: understanding the data
Of course, online data can still provide insights into
offline life, but these must be well-grounded.
e.g. Seth Stephens-Davidowitz, ‘The Cost of Racial
Animus on a Black Candidate: Evidence Using
Google Data’
• Google accounts for >50% of search engine market (less
concern over representativeness)
• Google searches are private and anonymous (less
concern over reliability)
• This method uncovers a social phenomenon, racism,
which would be harder to detect in pre-Internet
approaches e.g. interviews or surveys
Drinking from the fire hose: understanding the data
Beware false prophets
XKCD
Drinking from the fire hose: understanding the data
Beware false prophets: analyses using thousands of
variables can generate millions or billions of
possible relationships – not all (or most) will be valid
or meaningful
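The arithmetic behind "millions or billions" can be sketched by counting the candidate relationships directly. The variable counts and the 5% significance threshold below are illustrative assumptions, not figures from the talk.

```python
from math import comb


def candidate_relationships(n_variables: int, order: int = 2) -> int:
    """Number of distinct groups of `order` variables that could be related."""
    return comb(n_variables, order)


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected 'significant' results if every tested relationship is pure noise."""
    return n_tests * alpha


for n in (1_000, 3_000):
    pairs = candidate_relationships(n)
    triples = candidate_relationships(n, order=3)
    print(f"{n} variables: {pairs:,} pairs, {triples:,} triples, "
          f"about {expected_false_positives(pairs):,.0f} spurious pairwise hits")
```

With 3,000 variables there are about 4.5 million pairs and about 4.5 billion triples. Even if none of the pairs were truly related, testing them all at the 5% level would be expected to flag over 200,000 of them.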
Drinking from the fire hose: understanding the data
Beware false prophets
“if you look at the data long enough you’ll find predictive
signals that are in fact completely spurious...for about, I think
a 20 or 25 year period, the US stock market was perfectly
correlated with the level of butter production in Bangladesh
… if you look at hundreds and hundreds of these indicators,
whether it’s the level of Bangladesh butter production or the
number of cars in New York City or whatever it is, eventually
you’ll find something that just by pure chance matches what
you’re looking for.”
Mike Cafarella, University of Michigan
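The effect described in this quote is easy to reproduce with synthetic data. The sketch below generates one random "target" series and 500 unrelated random "indicators", each a 25-step random walk, and reports the strongest correlation found. The series are simulated noise, not real market or production data; the choice of random walks, 500 indicators and 25 time points is an assumption made here for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def random_walk(rng, length):
    """Cumulative sum of independent Gaussian steps."""
    level, path = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def best_spurious_match(n_indicators=500, years=25, seed=0):
    """Return |r| for the first indicator and the largest |r| over all of them."""
    rng = random.Random(seed)
    target = random_walk(rng, years)
    scores = [
        abs(pearson(target, random_walk(rng, years)))
        for _ in range(n_indicators)
    ]
    return scores[0], max(scores)


first, best = best_spurious_match()
print(f"first indicator tried: |r| = {first:.2f}")
print(f"best of 500 indicators: |r| = {best:.2f}")
```

Every indicator here is independent of the target by construction, so any strong match is spurious. Searching across hundreds of candidates reliably turns up one that looks impressive.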
Drinking from the fire hose: collaborating
Big data research often necessitates a wide variety
of skills and perspectives. Team sizes in academic
research have been growing for decades.
Drinking from the fire hose: collaborating
This trend is likely to persist as big data research
becomes more common
“the best research will often merge in collaboration
between computer scientists who will have access to the
tools and the background to further develop and apply
those, and with social scientists who will have, sort of, good
pressing social questions that we can get insight into with
the data that is now available.”
Scott Hale, Oxford Internet Institute
Drinking from the fire hose: collaborating
This trend is likely to persist as big data research
becomes more common
“I can find someone to optimise an algorithm, I can pay
someone to build a website but what I want is someone that
is going to be thinking the human side through every step of
the way, and when you build an algorithm and when you
write a line of code you ask, does this make sense in terms of
the phenomena that I am trying to model or trying to
interpret.”
Josh Introne, Michigan State University
Drinking from the fire hose: mixing methods
While Big Data is necessarily quantitative, it can be
used in conjunction with other methods.
“For me, I think if I only look at the numbers I don’t get the
whole picture … if we look at, for example, Twitter data, you
can see some tendencies, but if you want to answer the right
question then I think it’s necessary to do more qualitative
studies … So I’m doing interviews with political parties, I’m
also doing interviews with journalists, in order to talk about
how they are using social media as journalistic tools.”
Bente Kalsnes, University of Oslo
Drinking from the fire hose: mixing methods
This means correlations can point the way for
deeper causal explanatory research.
“So you start off with the patterns and then what you
should be doing is saying ‘Well, here’s some possible
reasons’, and then when you’ve found some relationships
which really deserve more study then you would go off and
do a more detailed qualitative assessment as to whether this
was true or not.”
Richard Webber, King’s College London
Conclusion: learning to drink from the fire hose
The major question around Big Data is less about what the
data looks like and more about what we do with it.
The Big Data approach seems to challenge basic tenets
of academic research, undermining precision, validity
and explanatory power.
However, with a greater understanding of the nature
of data, a collaborative approach and a willingness to
employ multiple methods, we’ll be better equipped to
drink from the Big Data fire hose.

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Big Data Insights from Online Behavior

  • 1. Drinking from the fire hose? The pitfalls & potential of Big Data Josh Cowls, Oxford Internet Institute with contributions from Eric Meyer, Ralph Schroeder and Linnet Taylor t2i Lab, Chalmers, 27th March 2014
  • 2. Overview • Background • Definitions • Innovations and implications • Learning to drink from the fire hose
  • 3. The Oxford Internet Institute • Department of the University of Oxford • MO: ‘Understanding life online’ • Multi-disciplinary mix (social sciences plus physical and medical sciences, and humanities) • 45 researchers (and growing) • 50 students (MSc Social Science of Internet; PhD programme) • Generating big data on social, political and economic behaviour from social media www.oii.ox.ac.uk
  • 4. • Funded by the Alfred P. Sloan Foundation • 2012 – 2014 • Data sources: • 120 interviews, mainly with social scientists but some interviewees from business, government • Reports, workshops, publications • No representative sample, but some patterns of disciplinary and skills background and career trajectory NB where unattributed, quotes used in this presentation are excerpted from interviews conducted as part of this project. Accessing and Using Big Data to Advance Social Science Knowledge
  • 5. Big Data: our definition Big data are data that are unprecedented in scale and scope in relation to a given phenomenon. They are often streams of data (rather than fixed datasets), accumulating large volumes, often at high velocity.
  • 6. Big Data: other definitions • ‘Transactional’ (Margetts et al.) • ‘Things that one can do at a large scale that cannot be done at a smaller one’ (Mayer-Schönberger and Cukier) • The ‘3 Vs’: volume, velocity, variety – but also veracity, visualisability, viscosity? (Gartner)
  • 7. ... what Big Data isn’t • A generalisable, quantifiable ‘amount’ of data • A race to the top (Mutually Assured Distraction) • The same for every discipline, field or sector
  • 8. A ‘working’ definition • The Big Data phenomenon might be less about what the dataset is and more about how we work with it • (Even if this is indistinguishable in practice)
  • 9. Shifts in mindset From Mayer-Schönberger and Cukier: • “The ability to analyse vast amounts of data about a topic rather than be forced to settle for smaller sets” • “A willingness to embrace data’s real-world messiness rather than privilege exactitude” • “A growing respect for correlations rather than a continuing quest for elusive causality”
  • 10. Implications for research Whither the sample? “the sample survey[‘s] glory years ... are in the past” Savage and Burrows, 2007
  • 11. Implications for research Whither the sample? “sampling is like an analog photographic print. It looks good from a distance, but as you stare closer, zooming in on a particular detail, it gets blurry ... Often, the really interesting things in life are found in places that samples fail to fully catch” Mayer-Schönberger and Cukier 2012
  • 12. Implications for research More or mess? “social media is really, really fascinating, and the reason is because it ... falls into this category of there’s something there but we don’t know what it is. So you can measure public opinion on Twitter and clearly that’s indicative of something, but we don’t quite know what it’s representative of” Brandon Stewart, Harvard University Department of Government
  • 13. Implications for research More or mess? “the problem with the hashtag stuff [is that] we have wonderful case studies but we don’t know what they sit in essentially, what the framework is, if that’s 1% or 10% or 100% of the current conversation in Australia or whatever” Axel Bruns, Queensland University of Technology
  • 14. Implications for research More or mess? “the big problem that we haven’t cracked is that if someone tweets a sentiment it’s not necessarily what they’re feeling, it can be for a variety of reasons, so it doesn’t really reflect what they feel necessarily” Mike Thelwall, University of Wolverhampton
  • 15. Implications for research Do we care about causes? “Big Data is all about correlation; it’s not about causation, which means that you don’t need to have a theory beforehand. You just start looking for correlation … so you don’t have any idea about the structure of the data, you just find a funny correlation.” Sara Esposti, Open University Business School
  • 16. Implications for research Do we care about causes? “a central concern of social science is, we don’t just want to find statistical associations, we actually want to uncover the underlying causal processes by which social systems work ... The data themselves don’t tell you about cause and effect, there’s actually a very complex often, complex inferential process you have to go through in order to extract from the data the things that you really care about” David Jensen, University of Massachusetts
  • 17. Implications for research Do we care about causes? “I’ve been talking to some computer scientists who are rising stars, they’re really doing well, and they acknowledge that the way in which the field works, novelty is the key issue. And so there’s always an incentive or a pressure to keep on doing new stuff with new data, even though they might have wanted to go into more depth into something.” Sandra Gonzalez-Bailon, Annenberg School of Communication, University of Pennsylvania
  • 18. The challenge How can we extract meaning from Big Data – learn to drink from the fire hose?
  • 19. Drinking from the fire hose • Understanding the data • Collaborating • Mixing methods
  • 20. Drinking from the fire hose: understanding the data The rise of the information society has given us myriad new forms of data and accompanying ways of analysing it. The challenging part is abstracting meaning about society in general from data created and harvested online.
  • 21. Drinking from the fire hose: understanding the data Example: it’s hard to predict elections using Twitter “[Of] 14 different attempts to predict elections based on Twitter data ... Only half of them were successful ... All of this looks close to mere chance” Gayo-Avello 2012
  • 22. Drinking from the fire hose: understanding the data Example: Facebook isn’t going anywhere, and neither is Princeton Cannarella and Spechler 2014 Develin 2014
  • 23. Drinking from the fire hose: understanding the data But it’s much simpler, conceptually speaking, to analyse online phenomena on their own terms Yasseri, Hale & Margetts 2013
  • 24. Drinking from the fire hose: understanding the data But it’s much simpler, conceptually speaking, to analyse online phenomena on their own terms Hale, Yasseri, Cowls, Meyer, Schroeder & Margetts (submitted)
  • 25. Drinking from the fire hose: understanding the data Of course, online data can still provide insights into offline life, but these must be well-grounded. e.g. Seth Stephens-Davidowitz, ‘The Cost of Racial Animus on a Black Candidate: Evidence Using Google Data’ • Google accounts for >50% of search engine market (less concern over representativeness) • Google searches are private and anonymous (less concern over reliability) • This method uncovers a social phenomenon, racism, which would be harder to detect in pre-Internet approaches e.g. interviews or surveys
  • 26. Drinking from the fire hose: understanding the data Beware false prophets XKCD
  • 27. Drinking from the fire hose: understanding the data Beware false prophets: analyses using thousands of variables can generate millions or billions of possible relationships – not all (or most) will be valid or meaningful
  • 28. Drinking from the fire hose: understanding the data Beware false prophets “if you look at the data long enough you’ll find predictive signals that are in fact completely spurious ... for about, I think a 20 or 25 year period, the US stock market was perfectly correlated with the level of butter production in Bangladesh … if you look at hundreds and hundreds of these indicators, whether it’s the level of Bangladesh butter production or the number of cars in New York City or whatever it is, eventually you’ll find something that just by pure chance matches what you’re looking for.” Mike Cafarella, University of Michigan
  • 29. Drinking from the fire hose: collaborating Big data research often necessitates a wide variety of skills and perspectives. Teams in academic research have been growing for decades.
  • 30. Drinking from the fire hose: collaborating This trend is likely to persist as big data research becomes more common “the best research will often merge in collaboration between computer scientists who will have access to the tools and the background to further develop and apply those, and with social scientists who will have, sort of, good pressing social questions that we can get insight into with the data that is now available.” Scott Hale, Oxford Internet Institute
  • 31. Drinking from the fire hose: collaborating This trend is likely to persist as big data research becomes more common “I can find someone to optimise an algorithm, I can pay someone to build a website but what I want is someone that is going to be thinking the human side through every step of the way, and when you build an algorithm and when you write a line of code you ask, does this make sense in terms of the phenomena that I am trying to model or trying to interpret.” Josh Introne, Michigan State University
  • 32. Drinking from the fire hose: mixing methods While Big Data is necessarily quantitative, it can be used in conjunction with other methods. “For me, I think if I only look at the numbers I don’t get the whole picture … if we look at, for example, Twitter data, you can see some tendencies, but if you want to answer the right question then I think it’s necessary to do more qualitative studies … So I’m doing interviews with political parties, I’m also doing interviews with journalists, in order to talk about how they are using social media as journalistic tools.” Bente Kalsnes, University of Oslo
  • 33. Drinking from the fire hose: mixing methods This means correlations can point the way for deeper causal explanatory research. “So you start off with the patterns and then what you should be doing is saying ‘Well, here’s some possible reasons’, and then when you’ve found some relationships which really deserve more study then you would go off and do a more detailed qualitative assessment as to whether this was true or not.” Richard Webber, King’s College London
  • 34. Conclusion: learning to drink from the fire hose The major question around Big Data is less about what the data looks like and more about what we do with it. The Big Data approach seems to challenge basic tenets of academic research, undermining precision, validity and explanatory power. However, with a greater understanding of the nature of data, a collaborative approach and a willingness to employ multiple methods, we’ll be better equipped to drink from the Big Data fire hose.
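
The ‘Beware false prophets’ slides (27 and 28) make a point that a few lines of arithmetic and simulation can illustrate. The sketch below is not from the talk: the function names, the use of random walks as stand-ins for unrelated indicators, and the series length of 25 (echoing the "20 or 25 year period" in the quote) are all assumptions chosen for illustration. It counts the pairwise relationships among n variables, then shows how the strongest correlation found between one target series and a pool of unrelated series tends to climb as the pool grows.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def random_walk(rng, length):
    """Cumulative sum of random steps: a series with no link to any other."""
    level, series = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        series.append(level)
    return series


def best_spurious_match(n_indicators, length, seed=0):
    """Strongest |correlation| between one target and many unrelated series."""
    rng = random.Random(seed)
    target = random_walk(rng, length)
    return max(
        abs(pearson(target, random_walk(rng, length)))
        for _ in range(n_indicators)
    )


# Distinct pairwise relationships among n variables: n choose 2.
print(math.comb(5_000, 2))   # 12497500 (millions)
print(math.comb(50_000, 2))  # 1249975000 (billions)

# With a fixed seed, each larger pool contains the smaller ones, so the
# best match found can only stay the same or get stronger.
for n in (1, 10, 100, 1000):
    print(n, round(best_spurious_match(n, length=25), 3))
```

None of the simulated indicators has any connection to the target, so every strong match the search turns up is spurious by construction. That is the situation the Cafarella quote describes: the more indicators are searched, the better the best chance match looks.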