This document summarizes a project investigating the use of big data to advance social science knowledge. It introduces the project leaders and discusses data sources and scope. It then focuses on defining big data, discussing how digital data represents real-world objects and phenomena, and the opportunities and limits this presents. Challenges of using big data to gauge public opinion are also examined, such as issues of representativeness, reliability, and replicability. The document concludes by listing project papers on this topic.
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Accessing Big Data to Advance Social Sciences
1. Accessing and Using Big Data to
Advance Social Science Knowledge
Eric Meyer, Ralph Schroeder, Linnet
Taylor and Josh Cowls
2. Overview
• Project Introduction (Eric)
– Papers
– Conferences and outreach
– Data sources
• Big data theory, definition, limits to
science and its uses (Ralph)
• Case study: using big data to gauge public
opinion (Josh)
3. Accessing and Using Big Data to Advance Social
Science Knowledge
• October 2012 – August 2014
• Funded by the Alfred P. Sloan Foundation
• Data sources
• 120+ interviews, mainly with social scientists
• Workshops and conferences
• No representative sample, but some patterns of
disciplinary and skills background and career
trajectory
4. Project scope to date
• Economics
• Business (Nemode project with Greg, Monica)
• Wikipedia
• Social Media
• Development
• Political Science
• Causation and Theory
• Policy
• (20+ blog posts on a range of other issues)
5. Outcomes and outreach
• A series of workshops hosted at the Oxford Internet Institute:
– 2013, March. Big Data: Rewards and Risks for the Social Sciences.
Oxford Internet Institute, Oxford.
http://www.oii.ox.ac.uk/events/?id=557
– 2013, March. Qualitative methods to study data practices: An
international comparative workshop. Oxford Internet Institute, Oxford
– 2013, January. Towards a Sociology of Data. Oxford Internet Institute,
Oxford. http://www.oii.ox.ac.uk/events/?id=560
– 2014, May. Big data and social change in the developing world,
Rockefeller Foundation Bellagio Centre conference
– 2014, August. American Political Science Association Panel on ‘Big
Data and Political Science’
• MSc option course ‘Big Data and Society’
6. Definition
• ‘Big data’
– the advance of knowledge via a leap in the scale
and scope in relation to a given object or
phenomenon
• ‘Data’
– Belongs to the object
– ‘taking…before interpreting’ (Ian Hacking)
• the view that ‘all data are of their nature interpreted’ is
misleading: ‘data are made, but as a good first
approximation, the making and taking come before
interpreting’
– The most atomizable useful unit of analysis
7. Digital Objects and their Referents
Digital Object
(Examples:
Twitter, Tesco
Loyalty card
information
Real World
(People /
Physical
Objects)
Represent / Manipulate
9. Uses and Limits
• Big data research uses (academic, commercial, government) are limited to
the exploitation of suitable objects, and the objects which ‘give off’ digital
data, and the phenomena they lay bare, are limited
• The knowledge produced is aimed at ‘sorting people’ and advancing
‘representing and intervening’ (but without ‘manipulating’, except where
this is warranted by practical economic and political objectives)
• Difference commercial versus academic world is that knowledge provides
competitive and practical advantage as against advancing (high-consensus
rapid-discovery) knowledge
– The limits in both cases are the objects (to which the data ‘belong’), and that need to
have available digitally manipulable data points
• How available these objects are differs, but also…
– Causation and theoretical embedding matters for academic social science
– For commercial (and non-academic uses), ‘predicting’ consumer choices and other
behaviours, for limited purposes and without increasing scientific knowledge, is good
enough
• There are many objects, for non-academics and scientists to humanities
scholars (physical, human, cultural), but they are not infinite
• This availability, not skills or other issues, determines the future of big data
research
10. (Big) data definition enables
pinpointing impacts and threats
• ‘Google Plus may not be much of a competitor to Facebook as
a social network, but…some analysts…say that Google
understands more about people’s social activity than
Facebook does.’
– New York Times, 15.2. 2014, p. A1 ‘The Plus in Google Plus? It’s Mostly for Google’.
• Facebook Likes: ‘Predicting users’ individual attributes and
preferences can beused to improve numerous products and
services. For instance, digital systems and devices (such as
online stores or cars) could be designed to adjust their
behavior to best fit each user’s inferred profile…online
insurance…advertisements might emphasize security when
facing emotionally unstable (neurotic) users but stress
potential threats when dealing with emotionally stable ones’
– ‘Private traits and attributes are predictable from digital records of human behavior.’ Kosinski M,
Stillwell D, Graepel T.,Proc Natl Acad Sci 2013 Apr 9;110(15):5802-5.
• More powerful knowledge will enable better services, and
more manipulation
11. Ethical and Social Issues in Big Data Research
• Objects with ‘total’ knowledge (universes)
– Danger is inferring behaviour not of individuals, but of classes of
people
• Asymmetry of knower and the subjects of knowledge is
greater than elsewhere
• Based not on individuals’ but on aggregate behaviour
– Hence only utilitarian, not Kantian justification?
• Why does prediction or uncovering laws of behaviour ‘grate’?
• Benefits: greater scientific power and more specific details
• Relation to smaller data? ‘Creep’
• Solution: ethical = greater researcher and public awareness,
regulatory (would apply to academic researchers?) = prevent
legal and specific harms
12. Outlook and Implications
• There is an overlap between real world research and
the world of academic research which is closer than
elsewhere
– because this is the research front in both
– because they share common objects
• For research
– Develop theoretical frame in which to embed big data (for
social media), including power/function, relation to
traditional media, and role in society
• For society
– Awareness of how research can generate transparency and
manipulability
• Big Brother?
– Yes, but also Brave New World of Omniscience, with Social
Science as Handmaiden
14. The traditional model
• The notion of public opinion
was enlivened by the coffee
houses of 17th Century Britain
• Inferential statistics provided a
rigorous, replicable basis for
reporting public opinion, based
on a random sample and MOE
• Remained expensive, random
sample difficult to construct,
response rates dwindling
15. Using big data approaches
• n=all: beyond the sample
• Cheaper (after initial investment)
• More granularity, more insight?
Cost
Utility
Traditional inferential model Big data model
17. The challenge of representativeness
≠
Amber Boydstun: anyone who does a Twitter study
has to really work hard, I've noticed, to justify why
we should care about Twitter because Twitter is not
representative of the United States population or
the world population, right. And it's not. And even
if it was representative, or even if we don't care that
it's not representative, it's really hard to figure out in
any given study whether you're getting an over-
sampling of those users who are just more active
than other users.
Data from Dutton, W.H. and Blank, G., with Groselj, D.
(2013) Cultures of the Internet: The Internet in Britain.
Oxford Internet Survey 2013. Oxford Internet Institute,
University of Oxford.
18. The challenge of reliability
Mike Thelwall: really the big problem that
we haven’t cracked is that if someone
tweets a sentiment it’s not necessarily what
they’re feeling, it can be for a variety of
reasons, so it doesn’t really reflect directly
what they feel necessarily … so it’s quite a
stretch to say that if someone tweets, “I’m
happy” that they’re actually happy, to give
a simple example
• Difficult to establish the meaning of latent messages
• Platform specific behaviours (e.g. hashtags, likes) are not
always understood
• Political discourse often laced with sarcasm
19. The challenge of replicability
• Social data is often proprietary – getting access can be
difficult, expensive or impossible
• Sometimes access is limited to output – analysis takes
inside black box
• Challenges basic Popperian assumption of falsifiability
Nick Anstead: there are all these
companies that do all this wonderful stuff,
but actually as an academic researcher,
using them is expensive … what do you
actually get from working with these
companies? Do you get raw data sets that
you go and do stuff with yourself? More
commonly, I would suggest, what you
probably get is access to, sort of, a black
box tool.
20. … and the implications aren’t just
academic
Shelton, T et al, ‘Mapping the data shadows of
Hurricane Sandy: Uncovering the sociospatial
dimensions of ‘big data’. Geoforum Volume 52,
March 2014, pps 167-79
21. Project Papers
Schroeder, Ralph (Forthcoming). ‘Big Data: Towards a More Scientific Social Science and Humanities’ in Mark Graham and William H
Dutton (eds.), Society and the Internet: How Networks of Information are Changing our Lives. Forthcoming.
Schroeder, Ralph, & Taylor, Linnet (Forthcoming). ‘Is bigger better? The emergence of big data as a tool for international development
policy.’ GeoJournal.
Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet (2013, August). ‘Big Data in the Study of Twitter, Facebook and Wikipedia: On the Uses
and Disadvantages of Scientificity for Social Research.’ Paper presented at the proceedings of the Annual Meeting of the American
Sociological Association. (being submitted)
Schroeder, Ralph, & Taylor, Linnet. ‘Big Data and Wikipedia Research: Social Science Knowledge across Disciplinary Divides’. Submitted to
Information, Communication and Society.
Taylor, Linnet. ‘No place to hide? The ethics and analytics of tracking mobility using African mobile phone data. Submitted to Population,
Space and Place.
Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet. ‘Big Data in the Social Sciences: Towards a New Research Paradigm?’ (being
submitted).
Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet (2013, November). ‘The Boundaries of Big Data.’ Paper presented at SIG-SI Symposium,
ASIST 2013, November 1-6, 2013, Montreal, Quebec, Canada.
Schroeder, Ralph and Cowls, Josh. ‘Answering Questions and Questioning Answers in the Era of Big Data.’ In preparation.
Taylor, Linnet, Meyer, Eric T., & Schroeder, Ralph. ‘Bigger and better, or more of the same? Emerging practices and perspectives on big
data analysis in economics”. Forthcoming in Big Data & Society.
Cowls, Josh. ‘The Crowd in the Cloud?’, forthcoming presentation and IPP 2014’
Cowls, Josh ‘Big Data and Policy Implementation’, in preparation.
Schroeder, Ralph ‘Big Data and Policy Implications’, in preparation.