2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data


Published in: Technology, Economy & Finance
  • There are countless models that can be applied to solve any one predictive analytics problem. It is impossible to know at the outset which technique will be most effective.
  • Many are academics who want access to real-world data and problems.

    1. 1. The growing revolution in Data: A presentation to the Social Business Summit. Nicholas Gruen, Chairman, Kaggle; Chairman, Government 2.0 Taskforce. E [email_address] T @nicholasgruen. Singapore, 6th April, 2011
    2. 2. Outline
          • The changing landscape
              • and what’s behind it
          • The ecology of data
          • Finding the people to find the value in your data
              • Kaggle
          • From data inside to data outside
          • Data and gamification
              • The Gruen Tender
    3. 3. Data can turn things upside down
          • Insurance
          • Retail
          • Banking
          • Telecommunications
          • Accommodation
          • Aviation and transport
              • From stand by to advance purchase
              • load optimisation, price discrimination and risk sharing
          • Medicine
    4. 5. All That Data…
          • 10 retailers to monitor: 10 data points
          • 750 stores per retailer to monitor: 10 x 750 = 7,500 data points
          • 50 products per store to monitor: 10 x 750 x 50 = 375,000 data points
          • 52 weeks of data per year for trend analysis: 10 x 750 x 50 x 52 = 19,500,000 data points
          • 3 years of historical data for comparison: 10 x 750 x 50 x 52 x 3 = 58,500,000 data points
          • 7 types of data to monitor (POS, Inventory, Marketing, Syndicated, etc): 10 x 750 x 50 x 52 x 3 x 7 = 409,500,000 data points
          • 4 regions to segregate the data: 10 x 750 x 50 x 52 x 3 x 7 x 4 = 1,638,000,000 data points
          • 50 states to segregate the data: 10 x 750 x 50 x 52 x 3 x 7 x 4 x 50 = 81,900,000,000 data points
          • 8 categories to aggregate the data: 10 x 750 x 50 x 52 x 3 x 7 x 4 x 50 x 8 = 655,200,000,000 data points
          655 Billion+ data points involved with managing the retail sales channel
          Source: Marilyn and Terence Craig @ Strataconf
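The running totals on the "All That Data…" slide can be reproduced with a cumulative product. The sketch below is illustrative only: the factors and labels come from the slide, while the names `DIMENSIONS` and `running_totals` are made up for this example.

```python
from itertools import accumulate
from operator import mul

# (factor, label) pairs as listed on the slide, in multiplication order.
DIMENSIONS = [
    (10, "retailers to monitor"),
    (750, "stores per retailer"),
    (50, "products per store"),
    (52, "weeks of data per year"),
    (3, "years of historical data"),
    (7, "types of data"),
    (4, "regions"),
    (50, "states"),
    (8, "categories"),
]


def running_totals(dimensions):
    """Return the cumulative product after each dimension is applied."""
    return list(accumulate((factor for factor, _ in dimensions), mul))


if __name__ == "__main__":
    totals = running_totals(DIMENSIONS)
    for (factor, label), total in zip(DIMENSIONS, totals):
        print(f"{factor:>4} {label:<26} {total:>17,} data points")
```

Each extra way of slicing the data multiplies the count rather than adding to it, which is how ten retailers grow into the slide's 655 billion+ figure.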
    5. 6. http://www.dlib.org/dlib/may09/mestl/05mestl.html
    6. 7. Jefferson’s enlightenment dream
          “He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me. That ideas should freely spread from one to another over the globe, for the moral and mutual instruction of man, and improvement of his condition, seems to have been peculiarly and benevolently designed by nature, when she made them, like fire, expansible over all space, without lessening their density in any point, and like the air in which we breathe, move, and have our physical being, incapable of confinement or exclusive appropriation.”
          Thomas Jefferson to Isaac McPherson, August, 1813
    7. 8. Public goods
          Public goods: goods that no-one will supply if the government doesn’t
          “Public goods . . . present serious problems in human organisation.” Vincent and Elinor Ostrom, 1977
    8. 9. Adam Smith
          • The Wealth of Nations (1776): Private Goods
          • The Theory of Moral Sentiments (1759): The social preconditions of markets (Public Goods)
          • Language
    9. 10. Public Goods; Private Goods
          “[The public good of] Justice . . . is the main pillar that upholds the whole edifice. If it is removed, the great, the immense fabric of human society . . . must in a moment crumble into atoms.” Adam Smith
    10. 11. From potential to actual public good
    11. 12. Web 2.0: explosion of emergent public goods
          • Web 2.0 platforms are public goods:
              • Google (1998)
              • Wikipedia (2001)
              • Blogs (early 2000s)
              • Facebook (2004)
              • Twitter (2006)
          • Government didn’t build any of them
          • These platforms generate data
              • By creating a context in which it means something
              • And so inducing us to produce it
    12. 13. The economics of abundance: a new birth of ‘free’dom
          • Public goods as a problem: “Public goods . . . present serious problems in human organisation.” Vincent and Elinor Ostrom, 1977
          • Public goods as an opportunity: The freedom of ideas is the liberation of our species
    13. 14. The ecology of data (diagram labels: Data; Schema or Context; Information). An example from Web 2.0 …
    14. 15. Private goods => Public Goods: Software
          • Private Goods
              • Meeting private needs
          • Public Goods
              • Many eyeballs
              • Free code
    15. 16. Making sense of data
          • Release and the sense makers will come
          • Make sense of the data for your community
              • And you may be able to monetise it
              • Whole businesses being built on data exhaust
          • Find the people to analyse your data
              • Kaggle
    16. 26. FlightCaster
          • Predicts flight delays.
          • We use an advanced algorithm that scours data on every domestic flight for the past 10 years and matches it to real-time conditions. We help you evaluate alternative options and help connect you to the right person to make the change.
          • FlightCaster uses data from:
              • Bureau of Transportation Statistics
              • FAA Air Traffic Control System Command Center
              • FlightStats
              • National Weather Service
    17. 27. Private goods => Public Goods: Data
          • Private Goods
              • Meeting private needs
              • Linking to other websites
          • Public Goods
              • Google uses this information to rank sites
              • Everyone benefits
          Google monetises with ads
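The idea that privately motivated links can be pooled into a shared ranking can be sketched with the textbook, simplified PageRank iteration. This is not a description of Google's production ranking, which the slides do not detail. The function name `pagerank`, the damping value of 0.85, the iteration count and the four-page `EXAMPLE_LINKS` graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict mapping every page to a score; scores sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly across its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", and "c" links to "a".
EXAMPLE_LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

if __name__ == "__main__":
    for page, score in sorted(pagerank(EXAMPLE_LINKS).items()):
        print(f"{page}: {score:.3f}")
```

In this toy graph the page that attracts the most links ends up with the highest score, even though each link was created for its author's own purposes.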
    18. 28. 
          • Private Goods
              • Platform for recording data
          • Public Goods
              • PLM aggregates data and shares it back as public and private goods
          Sales of data
    19. 29. Data exhaust
    20. 30. Where’s Wally?
    21. 31. Global Competitions
          Chart: Predicting HIV viral load, Accuracy of Prediction (1 – 100%)
              • State of the art: 70%
              • 1½ weeks: 70.8%
              • Competition closes: 77%
          • Revenue or sales forecasts
          • Traffic forecasting
          • Energy demand
          • Predicting crime
          • Tax/social security fraud
          • Hospital casualty demand
          • Identifying great
              • Teachers
              • Schools
              • Hospitals
          • and their best practices
          US$500
    22. 32. “We could not be happier with the result. The Kaggle approach has set a new benchmark in Government for the development of successful predictive models, delivered quickly and very cost effectively. In particular, the flexibility of the winning predictive model will enable its application to other major transport routes to the CBD and allow for the addition of other factors such as weather and incident.” Susan Calvert, Director, Strategy and Project Delivery Unit
    23. 34. The top 3 competitors for: Forecast Eurovision Voting; Dr. Derek Gatherer, UK; Take on the Quants 1 & 2; John Blatz Baltimore; Edmund & Adrian London & USA; Jason Trigg Pennsylvania; Chih-Li Sung & Roy Tseng Penghu & Taipei; Jure Zbontar Ljubljana; Thomas Mahony Canberra; Emir Delic Australia; Glen Maher Canberra; Predict HIV; Chris Raimondi Baltimore; Claudio Perlich USA; Gzegorz Swiszcz Gera; Edmund & Adrian London & USA; Tourism Forecasting Part 1; Rajstennaj Barrabas USA; Jason Trigg Pennsylvania; Felipe Maia Uppsala University; Lee Baker Las Cruces, New Mexico; INFORMS; Cole Harris Texas; Nan Zhou Pittsburgh; Chess Ratings; Uri Blass Tel-Aviv; Giuseppe Ragusa Rome; Robert Warsaw; Tourism Forecasting Part 2; R Package Recommendation Engine; Ivan Russian Federation; Philipp Emanuel Widmann Heidelberg, DE; Dr. Christopher Hefele, New York; Chris DuBois Portland; Where’s Wally? Where’s Jeremy?; Chris Raimondi Baltimore; Felipe Maia Uppsala University; Jeremy Howard
    24. 35. Where’s Wally from?
    25. 36. What are Wally’s Qualifications?
    26. 37. Rebuilding an info-structure
          • Global CrisisCommons
          • Within 2 hours of #CCearthquake
          • Global volunteers parse 300,000 tweets.
          • “Shell 58 Barrack Rd out of petrol – only diesel”.
          • Agencies fussed, helped and obstructed.
          • Kaggle comp to triage tweets
    27. 39. New routines to generate data Real estate or other sales Indicated Service provider
    28. 40. Medical procedure Indicated Service provider
    29. 41. Litigation Indicated Service provider
    30. 42. Gruen Tenders
          • Forward looking data
          • Tailored to the specific case at hand
          • Enables innovation and data capture
          • Generate a mass of new data
          • Compares like with like
          • Minimises perverse incentives
    31. 43. E [email_address] T @nicholasgruen
    32. 44. The Public Goods of Web 2.0 SEO Google Page Rank
    33. 45. The Public Goods of Web 3.0 Ontologies followed - with tagging, for same reasons as SEO Ontologies created