Big Data for the Social Sciences


Expert talk at JISC Digifest, March 2014, Birmingham, UK

Published in: Social Media

  • EPSRC: Under ‘Big Data’ we are considering both very large and also complex data, including dynamic and heterogeneous data from all the various sources including sensors, social media, industry etc.
  • Our research is underpinned by our extensive data holdings, which range in scale from the micro- ... to the macro ... and are derived from a wide range of sources, including ... What makes these data so special is that many of them are very long-term (our earliest data date back to the 19th century) and unique, providing an irreplaceable resource that helps provide early warnings of environmental change and places such changes in the correct historical context for decision makers…
  • Sloan Digital Sky Survey, the most ambitious astronomical survey ever undertaken, comprises 40 terabytes of information, while Steven Spielberg’s Survivors of the Shoah Visual History project comprises 200 terabytes. 100 terabytes; 500 GB
  • ESRC was allocated 64m and much of this is being used to set up the ESRC Big Data Network. The ESRC’s Big Data Network will support the development of a network of innovative investments which will strengthen the UK’s competitive advantage in Big Data for the social sciences. The core aim of this network is to facilitate access to different types of data and thereby stimulate innovative research and develop new methods to undertake that research. Note that the diagram is only illustrative in terms of how the UKDS and ADS will work across (that is still under discussion), and only illustrative in the number of Business and Local Government Data Research. This network has been divided into three phases. In Phase 1 of the Big Data Network the ESRC has invested in the development of the Administrative Data Research Network (ADRN), which will provide access to de-identified administrative data collected by government departments for research use; this is the focus of this meeting and all your grants. A few words about Phase 2 and 3 before we pass to Vanessa to talk about the ADRN some more. Phase 2 is currently being commissioned and will deal primarily with business data and/or local government data. Phase 3, further details of which will be released in the late autumn/winter, will focus primarily on third sector data and social media data. It is expected that there will be opportunities for interaction across all elements of the ESRC Big Data Network and that they will all work together around the wider objectives of facilitating access to different forms of data and of ensuring maximum impact is generated from the use of that data for the mutual benefit of data owners and researchers, and, through the research facilitated by the Network, benefit society and the economy more generally.
  • Thanks to Simon Hettrick for additional input to this slide.
  • ESRC Cities Expert Group
  • Big Data for the Social Sciences

    1. 1. Big Data for the Social Sciences David De Roure, Strategic Adviser for Data Resources @dder
    2. 2. Big Data doesn’t respect disciplinary boundaries Digital Social Research
    3. 3.
    4. 4. Mandy Chessell
    5. 5. The Big Picture More people More machines Big Data Big Compute Conventional Computation “Big Social” Social Networks e-infrastructure online R&D Big Data Production & Analytics deeply about society
    6. 6. RCUK and Big Data ▶ “Big data is a term for a collection of datasets so large and complex that it is beyond the ability of typical database software tools to capture, store, manage, and analyse them. ‘Big’ is not defined as being larger than a certain number of ‘bytes’ because as technology advances over time, the size of datasets that qualify as big data will also increase” (RCUK) ▶ But why do we want it? New forms of data enable us to 1. Answer existing research questions in new ways 2. Ask entirely new research questions
    7. 7. NERC Big Data diverse as our science • From micro- to macro-scale • Many sources: • Monitoring campaigns • Field sites & sensors • State-of-the-art laboratories • Ships & aircraft • Remote Sensing & EO • Regulator networks • Volunteers/citizen science • Model output • Long-term and unique! 10µm
    8. 8. 100 TB Big data: time-based media including film, tv, cctv footage - retail data - geospatial data - email and social media - images and associated metadata - performance data including raw data of recordings, choreography, performance structure - open government data - music - large-scale digital scans -
    9. 9. Research benefits of new data ▶ Undertaking research on pressing policy-related issues without the need for new data collection • Food consumption, social background and obesity • Energy consumption, housing type and climatic conditions • Rural location, private/public transport alternatives and incomes • School attainment, higher education participation, subject choices, student debt and later incomes ▶ New data such as social media enable us to ask big questions, about big populations, and in real time – this is transformative
    10. 10. Big Data Network
    11. 11. Phase 1 and 2
    12. 12. E-infrastructure Leadership
    13. 13. Mandy Chessell
    14. 14. First
    15. 15. Interdisciplinary and “in the wild” * * “in it” versus “on it”
    16. 16. Nigel Shadbolt et al
    17. 17. Real life is and must be full of all kinds of social constraint – the very processes from which society arises. Computers can help if we use them to create abstract social machines on the Web: processes in which the people do the creative work and the machine does the administration... The stage is set for an evolutionary growth of new social engines. The ability to create new forms of social process would be given to the world at large, and development would be rapid. Berners-Lee, Weaving the Web, 1999 (pp. 172–175) The Order of Social Machines
    18. 18. Some Social Machines SOCIAM: The Theory and Practice of Social Machines is funded by the UK Engineering and Physical Sciences Research Council (EPSRC) under grant number EP/J017728/1 and comprises the Universities of Southampton, Oxford and Edinburgh. See
    19. 19. Edwards, P. N., et al. (2013) Knowledge Infrastructures: Intellectual Frameworks and Research Challenges. Ann Arbor: Deep Blue.
    20. 20. Web as lens Web as artefact Web Observatories
    21. 21. Big data elephant versus sense-making network? The challenge is to foster the co-constituted socio-technical system on the right, i.e. a computationally-enabled sense-making network of expertise, data, models and narratives. Iain Buchan
    22. 22. Join the W3C Community Group Jun Zhao
    23. 23. Pip
    24. 24. Take homes ▶ New forms of data enable us to answer old questions in new ways and to answer entirely new questions ▶ There are multiple shifts occurring: – Volumes of data – Realtime analytics – Computational infrastructure – Dataflows vs datasets (and curation infrastructure) – Correlation vs causation – Increasing automation – Machine-to-Machine in Internet of Things
    25. 25. @dder Slide and image credits: Fiona Armstrong, Christine Borgman, Iain Buchan, Mandy Chessell, Neil Chue Hong, Nigel Shadbolt, Pip Willcox, Jun Zhao, Guardian newspaper
    26. 26. @dder