Big Data: Some Initial Reflections



Slides from presentation to AHRC internal staff seminar, April 2014



Usage Rights

CC Attribution-ShareAlike License

    Presentation Transcript

    • Big Data: Some Initial Reflections. Professor Andrew Prescott, Theme Leader Fellow, AHRC Digital Transformations Strategic Theme
    • The Met Office currently generates about 20 TB of data each day. ‘The problems which confront the meteorologist today will be faced by the humanities scholar within ten years’
    • Large Hadron Collider:
      • 600 million ‘collision events’ per second
      • One million jobs run by servers each day, with over 10 GB of data per second transferred at peak times
      • Approx. 20 petabytes of data produced annually
      • Over 70 universities involved in processing the data
    • http://www.flickr.com/photos/ibm_research_zurich/6777192080/in/set-72157629212636619
    • Whole-brain imaging of neurone activity in a zebrafish, made using light sheet microscopy by Misha Ahrens and neuroscientists at the Howard Hughes Medical Institute. Each image comprises over 1 terabyte of data. Link: http://www.youtube.com/watch?feature=player_embedded&v=KE9mVEimQVU
    • Some working definitions of big data
      • Big data exceeds the capacity of existing desktop machines and networks: you need help to deal with it
      • Data that is so large that existing methods of analysis simply don’t work: you have to change your methodology (probably to something quantitative)
      • Gartner definition: “Big data” is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
    • Examples of everyday big data of research value
      • Retail data generated by supermarkets
      • Online retail data: Amazon
      • Transport information: Oyster card
      • Hospital data
      • Data from utility companies
      • Social media
    • Visualisation of languages used in tweets in London in Summer 2012: Centre for Advanced Spatial Analysis, UCL: http://mappinglondon.co.uk/2012/londons-twitter-tongues/
    • Wolfram Alpha analytics of my Facebook friends
    • Analytics of my friend network
    • Does Big Data Yet Exist for the Humanities?
    • Letter of Gladstone to Disraeli, 1878: British Library, Add. MS. 44457, f. 166 The political and literary papers of Gladstone preserved in the British Library comprise 762 volumes containing approx. 160,000 documents.
    • George W. Bush Presidential Library: 200 million e-mails 4 million photographs
    • A Thousand Words: Advanced Visualisation in the Humanities Texas Advanced Computing Center Link: http://www.youtube.com/watch?v=kvOuJ2RwBTA
    • ‘Big data’ has already been an issue for linguists for many years
    • Another familiar example of big data in the humanities: censuses
    • Moving images and sound present some of the most challenging big data issues for arts and humanities
    • Archives and library catalogues as big data: Visible Archive browser: visiblearchive.blogspot.com
    • Visualisation by Jon Orwant of Google of Library of Congress subject categorisations of books published between 1600 and 2010: winedarksea.org
    • Commons Explorer: experimental interface to allow exploration of large quantities of images in Flickr Commons: http://mtchl.net/cex/
    • The Anglo-American Legal Tradition: web site holding seven million images of medieval legal records in the National Archives: www.aalt.law.uh.edu
    • Fabio Lattanzi Antinori, The Obelisk (2012): Open Data Institute: http://www.theodi.org/culture/obelisk-2012
    • Asia Trend Map: predicting popularity of games, manga and anime: www.asiatrendmap.jp
    • Some Big Data Issues
      • Research has historically been hypothesis-driven; is more data-driven research required?
      • How valid are predictive and probabilistic techniques in arts and humanities research?
      • Data quality issues: do we lose a sense of the context and stratigraphy of the data?
      • Danger of thinking that data = truth
    • Digital Transformations theme and Big Data
      • Theme seeks to promote new research methods: using digital tools and materials to develop completely new types of scholarship
      • Additional funding of £4m has been allocated to work on big data
      • Following this workshop, a call for big data projects will be issued
      • Smaller projects (up to £100k)
      • Larger projects (up to £600k)