Why Big Data Will Survive the Hype - and Change the Way We Work


This deck accompanied my presentation on big data at the Digital Analytics Association NYC Symposium on 12.4.13.

Published in: Technology
  • “Big data” has no strict definition. The term merely refers to the process (or capability) of analyzing datasets so large that they previously couldn’t fit into computer memory. This is where we got Google MapReduce and Hadoop. The technology companies that pioneered these techniques were thus able to extract unique new value from huge troves of data, the kind that many “offline” companies across a wide range of sectors had kept for years.
  • Today, up to a third of Amazon’s online revenue is derived from its personalization and recommendations engine. Case studies: You can cite any number of case studies about how innovative companies have extracted new value from large, previously unremarkable datasets. In all of these cases, what we see is that data has become the newest natural resource, and it’s being exploited to create new markets.
  • Interestingly, guess how many companies have a line item on their balance sheets for “data”? None. Facebook is one of the single best examples of this mismatch between traditional systems of financial value and new ones. Intangible assets made up 40% of the value of public companies in the 1980s and 75% of their value in the 2010s.
  • As human societies consume, generate and process more data, our political, legal and conceptual models must change along with them. While it took hundreds of years for mass literacy and printed information to change Western civilization, we are now living in an era where the amount of data, and access to it, are completely unprecedented. It will change how we think about the nature of information itself. 8M books were printed from 1453 to 1503. Hollerith shrank tabulating times for the U.S. Census from 8 years to <1.
  • Interestingly, guess how many of the companies listed here have a line item on their balance sheets for “data”? None.
  • Collecting more data, more often, frequently means sacrificing some level of precision. At large scale, accepting some noise – messiness – in exchange for a larger dataset can mean better predictive power. NoSQL.
  • IBM 701 Machine – punch card system. Translated 60 sentences smoothly. IBM Candide – ten years’ worth of Canadian parliamentary transcripts. Ultimately it was difficult to scale due to a lack of additional data. Google Translate uses billions of websites and its book-scanning project. In 2013, it covers more than 60 languages.
  • Whether we query an entire dataset rather than a select part of it is sometimes treated as a defining characteristic of what qualifies as “big data.” Sampling is still very useful sometimes, but always as a second-best alternative to querying an entire dataset. It is an artifact of a data-constrained environment where storage and processing power were sharply limited.
  • Up to a third of all Amazon’s sales are a result of its recommendation and personalization engines. These product-to-product correlations matter far more than understanding WHY customers who buy one product like another.
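
The note on product-to-product correlations can be made concrete with a small sketch. This is not Amazon’s method, which the notes do not describe; it only counts how often two products appear in the same order and recommends the most frequent companions, with no model of why customers buy them together. The `orders` data and the function names `co_purchase_counts` and `recommend` are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order baskets, invented purely for illustration.
orders = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"lantern", "batteries"},
    {"tent", "lantern", "batteries"},
    {"sleeping bag", "tent", "stove"},
]


def co_purchase_counts(baskets):
    """Count how often each unordered pair of products shares a basket."""
    pair_counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (a, b) ordering.
        for a, b in combinations(sorted(basket), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def recommend(product, baskets, top_n=2):
    """Return the products most often bought together with `product`."""
    scores = Counter()
    for (a, b), n in co_purchase_counts(baskets).items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    # Sort by count (descending), then by name, so ties resolve predictably.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


print(recommend("tent", orders))
```

Nothing in the sketch encodes a hypothesis about customer motives; the recommendation falls out of the counts alone, which is the sense in which the correlation itself carries the value.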

    1. (and live) (and think) The City University of New York, New York City, December 4, 2013 @BlairReeves
    2. Blair Reeves, Product Lead, IBM Digital Analytics. IBM.com/digitalmarketing. I live here: Durham, North Carolina
    3. “The Year of Big Data” Credit: Gartner Research … is every year. From now on.
    4. The Value of Data is Increasing
    5. The Value of Data … is still being decided Book Value: $13 billion Market Value: $114 billion = 1.3 billion MAUs ~500 terabytes of data added… per day $101 billion in data
    6. A Short History of Data: 300 B.C. Great Library of Alexandria (Egypt); 970 A.D. Al-Azhar University (Egypt); 1400 Cambridge University owns 122 books; 1450s Invention of the Gutenberg printing press; 1520s Martin Luther translates the Latin Bible, accelerating mass literacy; 1710 Copyright law is born; 1770s Press freedom guarantees, pamphleteering; 1890 Herman Hollerith invents machine-readable data for U.S. Census; 1969 ARPANET – first TCP/IP Protocol; 2013 Watson; ~2.8 billion global internet users (40% of world’s population)
    7. The Way We Use Data Will Change: Trade Exactitude for Size; Why Sample?; Correlation Over Causality
    8. 1 – Trade Exactitude for Size: Precision < Size; More data > Better algorithms
    9. 1 – Trade Exactitude for Size: 1954: 250 word pairs; 1990: 3 million word pairs; 2006: >100 billion word pairs (and counting)
    10. 2 – Why Sample? • Sampling relies on randomness • Difficult to drill down into subcategories • Requires careful pre-planning
    11. 2 – Why Sample? • Sumo wrestlers • Google Flu • Non-linear relationships (social media)
    12. 3 – Correlation Over Causality: When does knowing “why” matter? Data rather than hypotheses. Correlations are value.
    13. 3 – Correlation Over Causality: A/B Testing; Attribution
    14. “Everything is obvious once you know the answer.” - Duncan Watts
    15. Thanks! BReeves@us.ibm.com @BlairReeves IBM.com/digitalmarketing IBMBigDataHub.com
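
The point in the “Why Sample?” slides that samples are difficult to drill down into can be illustrated with a minimal sketch. The visit log below is synthetic and every name in it (`visits`, `segment_count`, the `"rare"` segment) is an assumption made up for the example; it is not data from the talk.

```python
import random

# Synthetic visit log, invented for illustration: 100,000 visits, of which
# exactly 20 (every 5,000th record) belong to a rare subcategory.
visits = [
    {"segment": "rare" if i % 5000 == 0 else "common"}
    for i in range(100_000)
]


def segment_count(records, segment):
    """Count the records that belong to the given segment."""
    return sum(1 for r in records if r["segment"] == segment)


# Draw a simple random sample of 500 visits (0.5% of the log).
# The seed is fixed only so that reruns give the same sample.
rng = random.Random(0)
sample = rng.sample(visits, 500)

full_count = segment_count(visits, "rare")
sample_count = segment_count(sample, "rare")

print("rare visits in full dataset:", full_count)
print("rare visits in 500-row sample:", sample_count)
```

With 20 rare records in 100,000, a 500-row sample is expected to contain about 0.1 of them, so most samples contain none at all. The full scan always sees all 20, which is why drilling into small subcategories favors querying the entire dataset.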